Anonymized Data for Development and Testing Environments

John Craft

10.18.22 · 3 min read

Software development practices have matured dramatically over the last several years. Even so, using realistic data for development and testing while complying with privacy laws remains out of reach for most organizations.

Production data in lower environments: worth the risk?

It can be tempting to give developers a dump of production data, but doing so carries enormous risks. Nearly all production data contains PII. Copying it into development environments magnifies the risk of a breach, and it can conflict with the data minimization requirements found in data privacy laws (like GDPR and CCPA) and standards (like SOC 2).

On the other hand, providing your development teams with continuous, realistic, and trustworthy data allows them to test against difficult edge cases, bugs, and ever-changing application demands. Such an environment is an extremely valuable test bench, so many teams try to build their own, either by manufacturing fake data or by writing scripts to scrub PII from production data. But anonymization is hard, and many home-grown systems fail to meet the accepted standards for de-identification. And after the initial build, maintaining such a system and keeping it in sync with the production schema is a non-trivial chore that can hurt schedules and team morale.
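To make the difficulty concrete, here is a minimal sketch of how a simple scrubbing script can fall short. The sample records, the field names, and the `scrub` and `unique_rows` helpers are all hypothetical and invented for this illustration. The script drops direct identifiers (name and email) but leaves birth date and zip code untouched, then counts how many rows are still unique on that combination.

```python
from collections import Counter

# Hypothetical sample rows; every value here is made up for illustration.
records = [
    {"name": "Ana Ruiz", "email": "ana@example.com", "birth_date": "1984-03-12", "zip_code": "98101"},
    {"name": "Ben Cho", "email": "ben@example.com", "birth_date": "1984-03-12", "zip_code": "98101"},
    {"name": "Cal Diaz", "email": "cal@example.com", "birth_date": "1991-07-30", "zip_code": "98052"},
    {"name": "Dee Egan", "email": "dee@example.com", "birth_date": "1975-11-02", "zip_code": "98033"},
]

DIRECT_IDENTIFIERS = ("name", "email")
QUASI_IDENTIFIERS = ("birth_date", "zip_code")


def scrub(row):
    """Naive scrub: drop direct identifiers and keep every other field as-is."""
    return {k: v for k, v in row.items() if k not in DIRECT_IDENTIFIERS}


def unique_rows(rows, keys=QUASI_IDENTIFIERS):
    """Return the rows whose quasi-identifier tuple appears exactly once."""
    counts = Counter(tuple(r[k] for k in keys) for r in rows)
    return [r for r in rows if counts[tuple(r[k] for k in keys)] == 1]


scrubbed = [scrub(r) for r in records]
at_risk = unique_rows(scrubbed)
print(f"{len(at_risk)} of {len(scrubbed)} scrubbed rows are still unique")
```

In this toy data set, two of the four scrubbed rows remain unique on birth date plus zip code, so anyone who knows those two facts about a person could still single out their record even though names and emails are gone.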

Introducing Development Data with Privacy Dynamics

Today, we’re excited to announce Privacy Dynamics for Development and Testing Environments. All you need to do is point Privacy Dynamics at your production database and it will generate an anonymized copy that will meet your organization’s data privacy and security requirements. You can get started with Privacy Dynamics with a free trial today at privacydynamics.io.

Getting Started, Step by Step

The steps below walk through setting up Privacy Dynamics for development and test environments. The only requirement is that the source data is accessible over the Internet (if that isn’t the case, get in touch with us to learn more about running Privacy Dynamics in your cloud environment).

First, you’ll need to establish connections to both the source and destination databases. Privacy Dynamics will read from the source database, process the data, and write anonymized data to the destination database you choose. You may want to use a read replica of your production database as the source so that the reads don’t add load to your primary. Privacy Dynamics does not keep a copy of your data. In this example, we’ll use Postgres for both the source and the destination.

Create connections for the source and destination databases.

Modal window showing the creation of a data connection to Postgres

Start the project wizard by clicking the “Anonymize” button on the top. Select the origin and destination connections and corresponding schemas. The screen will show a list of all the tables in the database. Click the “All” link in the “Treat” column. This will tell Privacy Dynamics to anonymize each table.

On the “Configure Treatment” step, choose the default option of treating all identifiers. This is the safest option for reducing the chance of reidentification and will handle both direct and indirect identifiers. Under the “Direct Identifiers” section, you can choose to replace direct identifiers with realistic data. This will maintain format consistency so your tests still pass, without disclosing any private information. Privacy Dynamics will also anonymize indirect identifiers like birth date and zip code by performing a proprietary k-member micro-aggregation process that prevents disclosing unique tuples of quasi-identifiers.
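The Privacy Dynamics process is proprietary, so the following is not its implementation. It is only a generic, simplified sketch of the micro-aggregation idea: put records into groups of at least k with similar quasi-identifiers, then give every record in a group the same shared values. The `microaggregate` function, the field names, the sort-based grouping, and the 5-character zip code format are assumptions made for this illustration.

```python
from collections import Counter
from os.path import commonprefix


def microaggregate(rows, k):
    """Group rows into clusters of at least k and share quasi-identifier values.

    Illustrative only: rows are grouped by simple sort order, whereas real
    k-member clustering chooses groups to minimize information loss.
    """
    if len(rows) < k:
        raise ValueError("need at least k rows to form one group")
    ordered = sorted(rows, key=lambda r: (r["birth_year"], r["zip_code"]))
    groups = [ordered[i:i + k] for i in range(0, len(ordered), k)]
    # Fold a short trailing group into the previous one so every group has >= k rows.
    if len(groups) > 1 and len(groups[-1]) < k:
        tail = groups.pop()
        groups[-1].extend(tail)
    treated = []
    for group in groups:
        years = [r["birth_year"] for r in group]
        shared_year = round(sum(years) / len(years))
        # Keep only the zip prefix the whole group shares; mask the rest.
        prefix = commonprefix([r["zip_code"] for r in group])
        shared_zip = prefix.ljust(5, "*")  # assumes 5-character zip codes
        for r in group:
            treated.append({**r, "birth_year": shared_year, "zip_code": shared_zip})
    return treated


# Hypothetical rows; "plan" stands in for a non-identifying column that is kept as-is.
rows = [
    {"birth_year": 1984, "zip_code": "98101", "plan": "pro"},
    {"birth_year": 1985, "zip_code": "98104", "plan": "free"},
    {"birth_year": 1990, "zip_code": "98052", "plan": "pro"},
    {"birth_year": 1991, "zip_code": "98033", "plan": "free"},
    {"birth_year": 1975, "zip_code": "98109", "plan": "team"},
]

treated = microaggregate(rows, k=2)
counts = Counter((r["birth_year"], r["zip_code"]) for r in treated)
print("smallest group size:", min(counts.values()))
```

After treatment, every combination of birth year and zip code is shared by at least k records, so no record is unique on those fields, while other columns such as `plan` are left unchanged.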

Modal window showing configuration options for treating data

On the “Create Project” step, set up the anonymization process to run on a schedule and give it a meaningful name. Selecting “Automatically create datasets” will tell your project to detect new tables added to the origin database and automatically output anonymized versions to the destination database, maintaining parity between your production and development database.

Modal window showing the creation of a project with the Automatically create datasets option checked

The anonymization process won’t take long. Other synthetic data and de-identification solutions can take hours or days to process your data, but we can treat millions of records in minutes. Once it finishes, you can use pg_dump and psql to copy the anonymized database to your local machine.

# -C adds CREATE DATABASE to the dump, so connect psql to an existing database (here: postgres)
pg_dump -C -h remotehost -U remoteuser dbname | psql -h localhost -U localuser postgres
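If you would rather script this copy, for example as part of a developer setup task, the same pipeline can be driven from Python. This is a minimal sketch under stated assumptions: the `build_copy_commands` and `copy_database` helpers are hypothetical names, and the host, user, and database values are placeholders. Because `pg_dump -C` emits a `CREATE DATABASE` statement, the sketch connects `psql` to an existing maintenance database (`postgres` by default) rather than to the database being created.

```python
import subprocess


def build_copy_commands(remote_host, remote_user, local_user, dbname,
                        local_host="localhost", connect_db="postgres"):
    """Return the pg_dump and psql argument lists for the copy pipeline."""
    # -C makes the dump create the database and reconnect to it on restore.
    dump_cmd = ["pg_dump", "-C", "-h", remote_host, "-U", remote_user, dbname]
    # psql only needs some existing database to connect to at first.
    restore_cmd = ["psql", "-h", local_host, "-U", local_user, connect_db]
    return dump_cmd, restore_cmd


def copy_database(dump_cmd, restore_cmd):
    """Stream pg_dump output directly into psql, like a shell pipe."""
    dump = subprocess.Popen(dump_cmd, stdout=subprocess.PIPE)
    try:
        restore = subprocess.run(restore_cmd, stdin=dump.stdout)
    finally:
        dump.stdout.close()
    dump_rc = dump.wait()
    if dump_rc != 0 or restore.returncode != 0:
        raise RuntimeError(
            f"copy failed: pg_dump exited {dump_rc}, psql exited {restore.returncode}"
        )


# Example call (commented out so nothing runs on import; values are placeholders):
# copy_database(*build_copy_commands("remotehost", "remoteuser", "localuser", "dbname"))
```

Both tools will prompt for passwords unless credentials are supplied another way, such as a `~/.pgpass` file or the `PGPASSWORD` environment variable.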

Real Data, Ready for Development

Sign up for a 14-day free trial and start anonymizing data for your development and testing environments today. For questions and technical support, email us at support@privacydynamics.io, or book a demo using this link.