Risk Assessments

The Risk Assessment is at the core of what we do. This article explains what a Risk Assessment is and how we create one.

The Risk Score

The Risk Score is the worst-case probability that an attacker with external knowledge will be able to re-identify an individual in your Dataset. We calculate the Risk Score before and after each Run on a Dataset so that you can see the impact of our anonymization treatment and track your risk over time.

How it Works

Our risk model assumes an adversary has access to an external dataset containing direct identifiers and quasi-identifiers that overlap with your dataset. Examples include public data like property tax records or voter registration data, but also private or semi-private information like credit reporting data. Using external data in this way to re-identify individuals is called a Linkage Attack.
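To make the idea concrete, here is a minimal sketch of a Linkage Attack in Python. It is an illustration only, not our production risk model: the column names (zip, birth_year, sex), the sample records, and the linkage_attack helper are all hypothetical. The attacker joins the two datasets on the shared quasi-identifiers and treats any target row that matches exactly one external record as re-identified.

```python
from collections import defaultdict

# Hypothetical quasi-identifiers shared by both datasets (an assumption for this sketch).
QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")


def linkage_attack(target_rows, external_rows, keys=QUASI_IDENTIFIERS):
    """Return {target_row_index: name} for rows matching exactly one external record."""
    # Index the attacker's external data by its quasi-identifier values.
    index = defaultdict(list)
    for ext in external_rows:
        index[tuple(ext[k] for k in keys)].append(ext["name"])

    reidentified = {}
    for i, row in enumerate(target_rows):
        candidates = index.get(tuple(row[k] for k in keys), [])
        # A single candidate means the attacker can attach a name to this row.
        if len(candidates) == 1:
            reidentified[i] = candidates[0]
    return reidentified


# Made-up target Dataset: no names, but it still carries quasi-identifiers.
target = [
    {"zip": "02139", "birth_year": 1980, "sex": "F", "diagnosis": "asthma"},
    {"zip": "02139", "birth_year": 1975, "sex": "M", "diagnosis": "diabetes"},
    {"zip": "94110", "birth_year": 1990, "sex": "F", "diagnosis": "flu"},
]

# Made-up external data with direct identifiers (for example, a voter roll).
external = [
    {"name": "Alice", "zip": "02139", "birth_year": 1980, "sex": "F"},
    {"name": "Bob", "zip": "02139", "birth_year": 1975, "sex": "M"},
    {"name": "Carl", "zip": "02139", "birth_year": 1975, "sex": "M"},
    {"name": "Dana", "zip": "10001", "birth_year": 1990, "sex": "F"},
]

matches = linkage_attack(target, external)
print(matches)
```

In this toy data only the first target row is linked to a single name. The second row matches two external records, so the attacker cannot tell them apart, and the third row has no match at all.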

We run thousands of simulated attacks against the target Dataset (yours), making pessimistic (conservative) assumptions about the size of the attacker’s data and how closely it resembles yours. From these attacks we estimate the probability that an attacker successfully re-identifies an arbitrary individual. This probability becomes the Risk Score, which can be considered an upper bound on the re-identification risk of a Dataset.
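The simulation idea can be sketched as a small Monte Carlo estimate in Python. This is a simplified illustration under stated assumptions, not the actual scoring algorithm: the simulate_risk function, the column names, and the sample rows are hypothetical. The pessimistic assumption here is that the attacker knows the victim is in the Dataset and knows the victim's exact quasi-identifier values; the attacker then guesses uniformly among the rows that share those values.

```python
import random
from collections import defaultdict


def simulate_risk(rows, keys, trials=10_000, seed=0):
    """Estimate the chance that a simulated attacker re-identifies an arbitrary individual.

    Hypothetical helper for illustration; assumes the attacker knows the
    victim's quasi-identifier values exactly (a pessimistic assumption).
    """
    rng = random.Random(seed)

    # Group row indices that are indistinguishable on the quasi-identifiers.
    classes = defaultdict(list)
    for i, row in enumerate(rows):
        classes[tuple(row[k] for k in keys)].append(i)

    successes = 0
    for _ in range(trials):
        victim = rng.randrange(len(rows))  # arbitrary individual
        candidates = classes[tuple(rows[victim][k] for k in keys)]
        guess = rng.choice(candidates)  # attacker picks one matching row
        successes += guess == victim
    return successes / trials


# Made-up Dataset before treatment: every row is unique on (zip, age).
before = [
    {"zip": "02139", "age": 34},
    {"zip": "02138", "age": 36},
    {"zip": "94110", "age": 51},
    {"zip": "94112", "age": 58},
]

# The same rows after a hypothetical generalization step (zip prefix, age band).
after = [
    {"zip": "021**", "age": "30-39"},
    {"zip": "021**", "age": "30-39"},
    {"zip": "941**", "age": "50-59"},
    {"zip": "941**", "age": "50-59"},
]

risk_before = simulate_risk(before, keys=("zip", "age"))
risk_after = simulate_risk(after, keys=("zip", "age"))
print(f"before: {risk_before:.2f}, after: {risk_after:.2f}")
```

In this toy setup every untreated row is unique, so each simulated attack succeeds, while the generalized rows fall into groups of two and the estimate lands near one half. Comparing the two numbers mirrors how a score computed before and after a Run shows the effect of a treatment.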

Distortion Metrics