How We Anonymize Your Data

Entity Recognition

Proper anonymization requires a semantic understanding of the data. Knowing the types or statistical qualities of a dataset is not enough; we must infer the meaning of the data in order to treat it appropriately. We call this process of semantic enrichment Entity Recognition.

There are two categories of data that are especially important to our algorithms: PII and numerical data.

Detecting PII

Personally Identifiable Information ("PII") can take many forms. HIPAA enumerates 18 families of attributes that are considered personal identifiers. They include names, email addresses, and phone numbers, but also dates, geographies, account numbers, biometrics, and more.

Detecting PII is a non-trivial problem, even in structured data. Privacy Dynamics combines several approaches, including those based on academic research and open-source libraries, encompassing rules-based heuristics and machine learning models. After detecting PII, our anonymizer creates a treatment plan to redact direct identifiers and treat quasi-identifiers.
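
To make the rules-based side of this concrete, here is a minimal sketch of column-level detection using regular expressions and a Luhn checksum. It is not Privacy Dynamics' implementation: the patterns, the function names (`luhn_valid`, `guess_column_type`), the labels, and the 0.9 match threshold are all assumptions chosen for illustration, and a production detector would need many more rules plus the machine learning models mentioned above.

```python
import re

# Hypothetical, deliberately simple patterns; real-world formats vary far more.
EMAIL_RE = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$")
SSN_RE = re.compile(r"^\d{3}-\d{2}-\d{4}$")
CARD_CHARS_RE = re.compile(r"[\d -]+")


def luhn_valid(value: str) -> bool:
    """Return True if `value` looks like a card number and passes the Luhn checksum."""
    if not CARD_CHARS_RE.fullmatch(value):
        return False
    digits = [int(c) for c in value if c.isdigit()]
    if not 13 <= len(digits) <= 19:
        return False
    total = 0
    # Double every second digit, counting from the rightmost digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def guess_column_type(values, threshold=0.9):
    """Label a column when at least `threshold` of its non-empty values match one rule."""
    cleaned = [str(v).strip() for v in values if v is not None and str(v).strip()]
    if not cleaned:
        return None
    rules = {
        "email": lambda v: EMAIL_RE.match(v) is not None,
        "us_ssn": lambda v: SSN_RE.match(v) is not None,
        "credit_card": luhn_valid,
    }
    for label, rule in rules.items():
        share = sum(1 for v in cleaned if rule(v)) / len(cleaned)
        if share >= threshold:
            return label
    return None
```

Scoring the whole column rather than single cells means a few malformed values do not hide an otherwise obvious email or card-number field. The checksum also helps separate real card numbers from arbitrary 16-digit strings.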

Privacy Dynamics can automatically detect the following types of direct and quasi-identifiers:

Identifier | Category
Person Name (Full, First, Middle, Last) | Direct
Email Address | Direct
Phone Number | Direct
Mailing Address | Direct
ABA Routing Number | Direct
US Bank Account Number | Direct
Credit Card Number | Direct
Crypto Address | Direct
US Passport Number | Direct
US Driver's License Number | Direct
US Individual Taxpayer Identification Number (ITIN) | Direct
US Social Security Number (SSN) | Direct
Other US Tax IDs (EIN, PTIN) | Direct
IP Address | Direct
Advertising ID | Direct
Equipment Serial Number (ESN) in Decimal or Hex | Direct
International Mobile Equipment Identity (IMEI) Number | Direct
MAC Address | Direct
Mobile Equipment Identifier (MEID) | Direct
Age | Quasi
Date of Birth | Quasi
Date of Death | Quasi
Other Dates and Times | Quasi
Gender | Quasi
Sex | Quasi
Coordinates (Full or Partial) | Quasi
City | Quasi
US State | Quasi
US ZIP Code | Quasi
Nationality | Quasi
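
As a rough illustration of how detected identifier types could feed a treatment plan, the sketch below maps each column to an action based on its category. The type labels, the action names, and the `build_treatment_plan` function are hypothetical, and the mapping covers only a handful of the identifier types listed above. Leaving unrecognized columns untouched is also an assumption, not documented behavior.

```python
DIRECT = "direct"
QUASI = "quasi"

# Illustrative subset of identifier types; the keys are hypothetical labels.
CATEGORY_BY_TYPE = {
    "email": DIRECT,
    "phone_number": DIRECT,
    "us_ssn": DIRECT,
    "date_of_birth": QUASI,
    "us_zip_code": QUASI,
}


def build_treatment_plan(detected):
    """Map each column name to an action, given its detected identifier type."""
    plan = {}
    for column, id_type in detected.items():
        category = CATEGORY_BY_TYPE.get(id_type)
        if category == DIRECT:
            # Direct identifiers are redacted outright.
            plan[column] = "redact"
        elif category == QUASI:
            # Quasi-identifiers are kept but flagged for further treatment.
            plan[column] = "treat_as_quasi_identifier"
        else:
            # Assumption: columns with no detected identifier pass through.
            plan[column] = "keep"
    return plan
```

For example, `build_treatment_plan({"contact": "email", "zip": "us_zip_code", "notes": None})` marks `contact` for redaction, flags `zip` as a quasi-identifier, and keeps `notes`.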

To handle edge cases, PII treatment for your Dataset is configurable. For more information, see Configuring a Dataset.

Classifying Numerical Variables

Fields that have numerical types (like integers or floats) can also be used to encode categories. For example, an integer status code of 1, 2, or 3 names a category rather than measuring a quantity. When anonymizing data, it is essential to know whether a numerical field is semantically discrete, continuous, or categorical.

After first attempting to detect PII in a numerical field (such as phone, account, social security, and other numbers), we profile each numerical field using a range of statistical methods. That profile is then fed into a machine-learning model that determines the probability that a field is categorical. This categorical classification is an important input into the anonymizer's treatment plan.
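
The sketch below shows what a simple statistical profile of a numerical field might contain, and how it could be turned into a score. The features, the function names (`profile_numeric`, `categorical_score`), and the scoring rule are assumptions for illustration. In particular, the hand-written score is only a stand-in for the trained machine-learning model described above and is not a calibrated probability.

```python
from collections import Counter


def profile_numeric(values):
    """Compute a few summary statistics for a numerical field, ignoring None."""
    nums = [float(v) for v in values if v is not None]
    if not nums:
        raise ValueError("field has no numerical values to profile")
    counts = Counter(nums)
    n = len(nums)
    return {
        "n": n,
        "distinct": len(counts),
        # Share of rows that carry a distinct value; near 1.0 for IDs and measurements.
        "unique_ratio": len(counts) / n,
        # Share of values with no fractional part.
        "integer_share": sum(1 for x in nums if x.is_integer()) / n,
        # Share of rows taken by the single most common value.
        "top_value_share": counts.most_common(1)[0][1] / n,
    }


def categorical_score(profile):
    """Heuristic stand-in for a model: high when values are whole numbers that repeat."""
    return profile["integer_share"] * (1.0 - profile["unique_ratio"])


status_codes = [1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3]
measurements = [0.13, 2.71, 3.14, 9.81]

print(categorical_score(profile_numeric(status_codes)))   # 0.75
print(categorical_score(profile_numeric(measurements)))   # 0.0
```

A field of repeating whole numbers scores high, while fields of unique or fractional values score near zero. A trained model would weigh many more features than these three.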
