Configuring Datasets

Use Case: Analytics

Unlock insights across the organization by anonymizing data in your warehouse.

Configuring Privacy Dynamics to treat data for analytics is simple and fast. Below are our suggestions for how to get started.

Redact Direct Identifiers

When setting up your project, keep the default handling for Direct Identifiers: Redact. This drops fields containing names, email addresses, mailing addresses, and similar values from the Destination data.

Lock Primary and Foreign Keys

Unless your application uses PII (like an email address) as a table's Primary Key, you should select Lock for the Primary Keys in your project to preserve referential integrity in your Destination data.

Depending on the data, you may also want to Lock a table's Foreign Keys. Foreign Keys that are not Locked may be Anonymized, which could cause their values to be swapped with others from your dataset. This may not matter in some cases (like a product_id in an order_items table), but it may be problematic in others (like a key used to join orders to the customers who placed them).
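To make the effect of a swapped Foreign Key concrete, here is a minimal Python sketch. The orders table, the customer_id column, and the swap_foreign_keys helper are all hypothetical; the helper only imitates a value swap and is not how Privacy Dynamics anonymizes data.

```python
# Hypothetical order rows; customer_id plays the role of a Foreign Key.
orders = [
    {"order_id": 1, "customer_id": 10, "total": 50},
    {"order_id": 2, "customer_id": 11, "total": 20},
    {"order_id": 3, "customer_id": 10, "total": 30},
]


def revenue_by_customer(rows):
    """Sum order totals per customer_id, as a per-customer report would."""
    totals = {}
    for row in rows:
        key = row["customer_id"]
        totals[key] = totals.get(key, 0) + row["total"]
    return totals


def swap_foreign_keys(rows, i, j):
    """Return a copy of rows with the customer_id of rows i and j exchanged.

    This is an illustrative stand-in for a swap, not the product's algorithm.
    """
    out = [dict(r) for r in rows]
    out[i]["customer_id"], out[j]["customer_id"] = (
        out[j]["customer_id"],
        out[i]["customer_id"],
    )
    return out


locked = revenue_by_customer(orders)                             # {10: 80, 11: 20}
swapped = revenue_by_customer(swap_foreign_keys(orders, 1, 2))   # {10: 70, 11: 30}
```

In this sketch the grand total is unchanged (100 in both cases), but the per-customer breakdown shifts. That is the kind of analysis for which Locking the Foreign Key matters.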

Lock Response Variables

If you are preparing data for a specific analysis, select Lock for the response variables. This will eliminate distortion in your headline numbers, at the cost of concentrating distortion in your input features.

Keep k Low

You can preserve more of your data's utility by using a lower value of k for micro-aggregation.

By default, k is 2, which is the lowest value allowed. In some situations, such as datasets containing sensitive attributes, k should be higher to reduce risk, but there is always a trade-off between privacy and utility in the Destination data. Datasets with more records or fewer quasi-identifiers can handle a higher k; conversely, short and wide datasets (few records, many quasi-identifiers) may be quite sensitive to k. Take care to strike the right balance, and set k as low as your risk requirements allow to maximize analytical utility.
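The trade-off can be illustrated with a toy, single-column version of micro-aggregation in Python. This is only a sketch under stated assumptions: the microaggregate and total_distortion functions and the ages data are hypothetical, and the fixed-size grouping shown here is a textbook simplification, not the algorithm Privacy Dynamics uses.

```python
from statistics import mean


def microaggregate(values, k):
    """Replace each value with the mean of its group of at least k neighbors.

    Values are sorted, split into consecutive groups of size k, and the last
    group absorbs any remainder so that no group is smaller than k.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    if len(values) < k:
        raise ValueError("need at least k values")
    order = sorted(range(len(values)), key=lambda i: values[i])
    n_groups = len(values) // k
    result = [0.0] * len(values)
    for g in range(n_groups):
        start = g * k
        end = len(values) if g == n_groups - 1 else start + k
        members = order[start:end]
        group_mean = mean(values[i] for i in members)
        for i in members:
            result[i] = group_mean
    return result


def total_distortion(original, treated):
    """Sum of absolute differences between original and treated values."""
    return sum(abs(a - b) for a, b in zip(original, treated))


ages = [23, 25, 31, 34, 47, 52]   # hypothetical quasi-identifier column
low_k = microaggregate(ages, 2)   # [24, 24, 32.5, 32.5, 49.5, 49.5]
high_k = microaggregate(ages, 3)  # two groups of three, each replaced by its mean
```

On this sample, total_distortion(ages, low_k) is smaller than total_distortion(ages, high_k): larger groups hide each record among more neighbors but move values further from the originals.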

Further Reading

Use Case: Dev Environments