Anonymization is hard. While some of the basic concepts are easy to understand, creating a system that can robustly, precisely, and automatically anonymize diverse data has been a subject of active research for decades. The Privacy Dynamics anonymizer builds on this research while using new machine-learning-based approaches. In this section, we share a little about how the anonymizer works.
Anonymizing your data takes six steps:
- Entity Recognition: the anonymizer needs a semantic understanding of your data. In this step, it automatically classifies your data into semantic categories (including PII like names, emails, and addresses), and detects any fields that contain categorical data.
- Risk Assessment: the anonymizer simulates linkage attacks against your raw data to develop a pre-treatment Risk Score.
- Treatment Planning: based on its findings and your configuration, the anonymizer develops a plan for treating direct and quasi-identifiers to reach your desired risk target.
- Direct Identifier Suppression: the anonymizer suppresses direct identifiers based on your configuration.
- k-member Micro-aggregation: the anonymizer forms clusters of k members and perturbs values as necessary to reduce uniqueness within each cluster, per the treatment plan.
- Analysis: the anonymizer computes dataset metrics on the treated data, including the post-treatment Risk Score and Distortion metrics.
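To make the risk assessment, suppression, and micro-aggregation steps more concrete, here is a minimal Python sketch. It is an illustration only and not the Privacy Dynamics implementation: the function names (`risk_score`, `suppress`, `micro_aggregate`) are hypothetical, the risk score is assumed to be the simple fraction of rows with a unique quasi-identifier combination, and the clustering is a basic one-column sort-and-group scheme rather than the anonymizer's actual clustering method.

```python
from collections import Counter


def risk_score(rows, quasi_identifiers):
    """Fraction of rows whose quasi-identifier combination is unique.

    Assumption: a simplified stand-in for a linkage-attack simulation.
    """
    keys = [tuple(row[q] for q in quasi_identifiers) for row in rows]
    counts = Counter(keys)
    unique = sum(1 for key in keys if counts[key] == 1)
    return unique / len(rows) if rows else 0.0


def suppress(rows, direct_identifiers):
    """Return copies of the rows with direct identifiers replaced by None."""
    return [
        {col: (None if col in direct_identifiers else val) for col, val in row.items()}
        for row in rows
    ]


def micro_aggregate(rows, column, k):
    """Group rows into clusters of at least k by sorted value of one numeric
    column, then replace each value with its cluster mean.

    If fewer than k rows exist in total, they form a single smaller cluster.
    """
    order = sorted(range(len(rows)), key=lambda i: rows[i][column])
    treated = [dict(row) for row in rows]
    n = len(order)
    start = 0
    while start < n:
        end = start + k
        if n - end < k:  # fold a too-small remainder into the last cluster
            end = n
        cluster = order[start:end]
        mean = sum(rows[i][column] for i in cluster) / len(cluster)
        for i in cluster:
            treated[i][column] = mean
        start = end
    return treated
```

As a usage example, calling `risk_score(rows, ["age", "zip"])` before and after `micro_aggregate(suppress(rows, {"name"}), "age", k)` shows how the share of unique records changes once ages are averaged within each cluster. The difference between the original and averaged values is one simple way to think about distortion.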
For more information, see the other articles in this section.