Configuring Datasets

Use Case: Development and Preview Environments

Enable environment parity by using anonymized Production data in Dev, Preview, and QA environments.

Configuring Privacy Dynamics to treat data for lower environments is simple and fast. Below are our suggestions for how to get started.

Replace Direct Identifiers with Fake Data

When setting up your project, change the default handling of Direct Identifiers to Realistic. This replaces names, email addresses, mailing addresses, and other direct identifiers with randomly generated, format-preserving values.

Figure: The Configure Treatment modal with Realistic selected.
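To make "format-preserving" concrete, here is a minimal Python sketch. It is not how Privacy Dynamics implements the Realistic treatment; the function name fake_like and the sample row are hypothetical. It only illustrates the idea that a replacement value keeps the length, character classes, and separators of the original, so downstream validation and UI layouts still behave as they would with Production data.

```python
import random
import string


def fake_like(value: str, rng: random.Random) -> str:
    """Replace letters and digits with random ones, keeping case, length, and separators.

    Hypothetical helper for illustration only; a real treatment would
    typically draw from realistic names, domains, and addresses instead.
    """
    out = []
    for ch in value:
        if ch.isdigit():
            out.append(rng.choice(string.digits))
        elif ch.isupper():
            out.append(rng.choice(string.ascii_uppercase))
        elif ch.islower():
            out.append(rng.choice(string.ascii_lowercase))
        else:
            out.append(ch)  # keep separators such as '@', '.', '-', and spaces
    return "".join(out)


rng = random.Random(0)  # seeded so the sketch is repeatable
row = {"name": "Ada Lovelace", "email": "ada@example.com", "phone": "555-010-1234"}
fake_row = {column: fake_like(value, rng) for column, value in row.items()}
print(fake_row)
```

The fake email still has an "@" and a dot in the same positions, and the fake phone number still matches a ddd-ddd-dddd pattern, which is what lets application code in a lower environment accept the treated values.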

Lock Primary and Foreign Keys

Unless your application uses PII (like an email address) as a table's Primary Key, you should select Lock for the Primary Keys in your project to preserve referential integrity in your Destination data.

Depending on the data, you may also want to Lock a table's Foreign Keys. If Foreign Keys are not Locked, they may be Anonymized, which could cause their values to be swapped with others from your dataset. This may not matter in some cases (like a product_id in an order_items table), but it may be problematic in others.
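The effect of swapping Foreign Key values can be sketched in a few lines of Python. The tables, the swap_column helper, and the fixed rotation used here are all hypothetical and chosen only to keep the example deterministic; they do not reproduce the product's anonymization logic. The point is that a swapped key is still a valid key, but it no longer points at the same parent row.

```python
# Hypothetical parent and child tables.
customers = {1: "Acme", 2: "Globex", 3: "Initech"}
orders = [
    {"order_id": 100, "customer_id": 1},
    {"order_id": 101, "customer_id": 2},
    {"order_id": 102, "customer_id": 3},
]


def swap_column(rows, column, source_index):
    """Return copies of rows where row i takes the column value of row source_index[i]."""
    values = [r[column] for r in rows]
    return [{**r, column: values[source_index[i]]} for i, r in enumerate(rows)]


def owners(rows):
    """Join each order to its customer name through the foreign key."""
    return {r["order_id"]: customers[r["customer_id"]] for r in rows}


locked = [dict(r) for r in orders]                        # keys passed through unchanged
swapped = swap_column(orders, "customer_id", [1, 2, 0])   # keys exchanged between rows

print(owners(locked))   # same relationships as the source
print(owners(swapped))  # every key still resolves, but to a different customer
```

If your tests or demos depend on a specific order belonging to a specific customer, this is the case where Locking the Foreign Key matters. If only the overall distribution matters, swapped keys may be acceptable.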

Increase k

You can further reduce the re-identification risk in your data by using a higher value of k for micro-aggregation.

By default, k is 2. Choosing a larger number (for example, 5 or 25) further reduces the risk of re-identification in your Destination data. A larger k increases distortion, but for most development data the lower risk and faster treatment matter more than precision. Larger datasets (with more rows) can tolerate a higher k.
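The trade-off between k and distortion can be seen in a small Python sketch of micro-aggregation on a single numeric column. This is a simplified, assumed variant (sort the values, form groups of at least k, replace each value with its group mean); the algorithm Privacy Dynamics actually uses is not described here and may differ. The function names and the sample ages are hypothetical.

```python
from statistics import mean


def microaggregate(values, k):
    """Sort values, form groups of at least k, and replace each value with its group mean."""
    if k < 1 or len(values) < k:
        raise ValueError("need 1 <= k <= len(values)")
    order = sorted(range(len(values)), key=lambda i: values[i])
    groups = [order[i:i + k] for i in range(0, len(order), k)]
    if len(groups) > 1 and len(groups[-1]) < k:
        tail = groups.pop()      # fold a short final group into the previous one
        groups[-1].extend(tail)
    result = [0.0] * len(values)
    for group in groups:
        group_mean = mean(values[i] for i in group)
        for i in group:
            result[i] = group_mean
    return result


def distortion(original, treated):
    """Mean absolute difference between original and treated values."""
    return mean(abs(a - b) for a, b in zip(original, treated))


ages = [23, 25, 31, 34, 38, 41, 47, 52, 58, 63]
for k in (2, 5):
    treated = microaggregate(ages, k)
    print(k, treated, distortion(ages, treated))
```

With k = 2 each value shares its treated value with one neighbor and moves only slightly. With k = 5 each treated value is shared by five rows, so any single row is harder to single out, but the values move further from the originals. On a column with many rows, neighbors are closer together, which is one way to see why larger datasets can tolerate a higher k.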

Further Reading

Use Case: Analytics