Development Preview Environments with Argo CD, Neon and Privacy Dynamics

Brett Westover

02.16.24 · 8 min read

Ephemeral test spaces connected to anonymized data give developers a powerful tool for debugging, fine-tuning, and improving apps.

When developers need to verify that new features work without affecting production or other ongoing development, isolated preview environments that mimic the main application's behavior and content can be super powerful.

Previews facilitate early detection of bugs and integration issues, as changes can be examined and vetted in a real-world setting. Moreover, ephemeral environments support continuous integration and delivery pipelines by enabling automated and on-demand environment provisioning. This flexibility enhances developer productivity, encourages experimentation without risk, and accelerates the overall development cycle, ensuring that updates and applications are robust and reliable before they reach end-users.

Let’s explore how to set up development preview environments using Argo CD, Neon, and Privacy Dynamics. We'll also discuss the benefits of each tool and how they work together to provide a seamless environment for developers.

Argo CD explained

Argo CD is an open-source continuous delivery tool that automates deployment and lifecycle management for cloud-native applications on Kubernetes. Argo CD integrates with Git repositories and deploys applications declaratively, which makes it an ideal tool for creating ephemeral environments: it keeps the deployment on the cluster in sync with the desired state defined by the YAML manifests in the Git repository. With Argo CD, developers simply push their code changes to the designated Git branch, and Argo CD updates the corresponding environment to match the configuration defined in the repository.

For dev teams, Argo CD can be configured so that every proposed change is deployed in its own preview environment. This workflow helps spot mistakes early, makes collaborating with peers easier, and reduces the effort required to spin up test environments. Argo CD handles the configuration differences between dev and prod. But what about the application data? That's where Neon comes in.

What is Neon?

Neon is a serverless PostgreSQL hosting platform. Neon lets developers create copies, or "branches," of a Postgres database, each isolated from all the others. Because branching works much like Git version control, a concept familiar to most developers, Neon makes it easy to create ephemeral data instances for feature development, testing, or experimentation. Each branch starts as an exact copy of its parent's data at the moment it is created, so these private, isolated branches can be used in preview environments for thorough testing of database interactions, schema migrations, and more, with zero risk to the actual production data and no impact on operational stability.
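To make branching a little more concrete, here is a minimal Python sketch that builds, but does not send, a request to Neon's v2 REST API asking for a child branch with its own read-write compute endpoint. It assumes the `POST /projects/{project_id}/branches` endpoint with bearer-token authentication from Neon's API reference; the API key, project ID, parent branch ID, and branch name are placeholders.

```python
import json
import urllib.request

# Assumed base URL of Neon's v2 REST API.
NEON_API_BASE = "https://console.neon.tech/api/v2"


def build_create_branch_request(
    api_key: str, project_id: str, parent_branch_id: str, branch_name: str
) -> urllib.request.Request:
    """Build (but do not send) a request that creates a Neon child branch.

    The new branch starts as a copy of the parent branch's data and gets its
    own read-write compute endpoint so an application can connect to it.
    """
    body = {
        "branch": {"name": branch_name, "parent_id": parent_branch_id},
        "endpoints": [{"type": "read_write"}],
    }
    return urllib.request.Request(
        url=f"{NEON_API_BASE}/projects/{project_id}/branches",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
    )


if __name__ == "__main__":
    # Placeholder values; a real call needs a Neon API key and real IDs.
    req = build_create_branch_request(
        "NEON_API_KEY", "my-project-id", "br-parent-id", "preview-pr-42"
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Passing the built request to `urllib.request.urlopen` would actually send it; check Neon's API reference for the exact shape of the response, including where the new branch's connection details appear.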

One caveat: a duplicated branch of the production database is amazingly useful for testing and development but also very risky, particularly when the production data is chock full of Personally Identifiable Information (PII): specific, traceable information about users, whether patients, customers, partners, or any other identity. Such sensitive data must be protected, but the typical development preview environment lacks the strong protections needed to ensure privacy and compliance. Which brings us to the third leg of the ephemeral preview environment stool: Privacy Dynamics.

Why Privacy Dynamics?

Privacy Dynamics unlocks production data by creating and maintaining an anonymized, PII-free replica of the production data environment, an application of the privacy best practice known as Data Minimization.

Screenshot: Select data source in the anonymization wizard

Using Privacy Dynamics, Argo CD, and Neon together is a great option for teams that want preview environments backed by safe, discrete, de-identified clones of the production database, in a way that's efficient, private, secure, and easy to deploy.

Example: Building a preview environment using Argo CD, Neon, and Privacy Dynamics

To show how this works, let's experiment using the anonymize-ecommerce-demo app. This is an API backend for an e-commerce application with users that have one or more shipping addresses. They can order products that ship to one of the addresses associated with their account. The relationship between orders, products, users, and addresses is critical for the app to function, so if we're going to remove sensitive PII, we'll need to preserve these relationships.

Screenshot: Modal window showing the creation of a data connection to Postgres

Prerequisites

The README has more details on how to set up this project, but in summary, we’ll need:

An Argo CD installation.
A database host. This method will work with any PostgreSQL host the app can reach (e.g., RDS or CloudSQL), and the database can also run as a container within the cluster. For this demo, we'll use Neon's serverless Postgres hosting. The sample data for our demo is in the repository.
An ApplicationSet. This instructs Argo CD to watch the repository for pull requests and to deploy a copy of the app in a dedicated namespace for each pull request. There is an example ApplicationSet YAML file in the repository.

Screenshot: Argo CD ApplicationSet YAML
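The repository ships its own ApplicationSet. As a rough illustration of the idea, the Python sketch below builds a comparable manifest around Argo CD's pull request generator and prints it as JSON, which `kubectl apply -f -` also accepts. The owner, repo, chart path, image repository, and the `image.repository` Helm parameter are hypothetical, and a private repository would also need a `tokenRef` pointing at GitHub credentials.

```python
import json


def preview_applicationset(owner: str, repo: str, chart_path: str, image_repo: str) -> dict:
    """Build an Argo CD ApplicationSet that deploys one app per open pull request.

    Argo CD's pull request generator fills in {{number}} and {{head_sha}}
    for every open PR it finds in the GitHub repository.
    """
    # Doubled braces inside an f-string produce a literal "{{number}}".
    app_name = f"{repo}-pr-{{{{number}}}}"
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "ApplicationSet",
        "metadata": {"name": f"{repo}-previews", "namespace": "argocd"},
        "spec": {
            "generators": [
                {
                    "pullRequest": {
                        "github": {"owner": owner, "repo": repo},
                        "requeueAfterSeconds": 300,  # how often to poll GitHub for PRs
                    }
                }
            ],
            "template": {
                "metadata": {"name": app_name},
                "spec": {
                    "project": "default",
                    "source": {
                        "repoURL": f"https://github.com/{owner}/{repo}.git",
                        "targetRevision": "{{head_sha}}",  # plain string, not an f-string
                        "path": chart_path,
                        "helm": {
                            "parameters": [
                                {"name": "image.repository", "value": image_repo}
                            ]
                        },
                    },
                    "destination": {
                        "server": "https://kubernetes.default.svc",
                        "namespace": app_name,  # dedicated namespace per PR
                    },
                    "syncPolicy": {"syncOptions": ["CreateNamespace=true"]},
                },
            },
        },
    }


if __name__ == "__main__":
    manifest = preview_applicationset(
        "example-org",
        "anonymize-ecommerce-demo",
        "deploy/chart",
        "ghcr.io/example-org/ecommerce-api",
    )
    print(json.dumps(manifest, indent=2))
```

Using the PR number in both the app name and the namespace keeps each preview isolated and makes it easy to find the app that belongs to a given pull request.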

Where trouble is lurking

A closer look at the production data for this app reveals the following tables:

users. Includes the user’s real full name and email address, as well as their date of birth and gender.
addresses. These are shipping addresses (Street, City, State, and Zip code). They might be business addresses, personal home addresses, or some other associated location.
orders. Records what products were ordered, the quantity, the price, and which address_id to ship it to.
products. A list of available products and associated pricing.

It's clear right from the outset that both the “users” and “addresses” tables have information that can identify users. This obvious PII will need to be addressed before this data can safely be used in a preview/test development environment. Let's work on that.
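To show what treating this PII can look like without breaking the relationships the app relies on, here is a toy Python sketch. It is not Privacy Dynamics' algorithm: it simply replaces a user's name and email with fake but well-formed values derived from a secret and the user's ID, leaving the `id` primary key untouched so joins to `addresses` and `orders` keep working. The `User` fields, name lists, and example values are hypothetical.

```python
import hashlib
from dataclasses import dataclass, replace

# Hypothetical pools of fake values; real tools draw from far larger ones.
FIRST_NAMES = ["Alex", "Jordan", "Sam", "Taylor", "Morgan", "Casey"]
LAST_NAMES = ["Rivera", "Chen", "Okafor", "Novak", "Silva", "Haddad"]


@dataclass(frozen=True)
class User:
    id: int              # primary key referenced by addresses and orders
    full_name: str       # direct identifier
    email: str           # direct identifier
    date_of_birth: str   # indirect identifier, handled by a separate step
    gender: str          # indirect identifier, handled by a separate step


def _pick(options: list, seed: str) -> str:
    """Deterministically choose an option using a SHA-256 hash of the seed."""
    digest = hashlib.sha256(seed.encode("utf-8")).digest()
    return options[digest[0] % len(options)]


def mask_direct_identifiers(user: User, secret: str) -> User:
    """Swap name and email for fake, well-formed values; keep the primary key.

    The fake values depend only on the secret and the user's id, never on the
    real name or email, and the same input always yields the same output.
    """
    seed = f"{secret}:{user.id}"
    first = _pick(FIRST_NAMES, seed + ":first")
    last = _pick(LAST_NAMES, seed + ":last")
    return replace(
        user,
        full_name=f"{first} {last}",
        email=f"{first.lower()}.{last.lower()}.{user.id}@example.com",
    )


if __name__ == "__main__":
    real = User(7, "Jane Doe", "jane.doe@example.org", "1990-04-12", "F")
    print(mask_direct_identifiers(real, secret="rotate-me"))
```

Date of birth and gender pass through unchanged in this sketch; as indirect identifiers, they call for a different kind of treatment.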

The true meaning of protection

To set up an anonymized copy of the production database, we'll create two connections in Privacy Dynamics: one to the source database and another to a destination database. In Neon, the destination is a branch of the main production database that will hold the anonymized copy.

With both connections created, we'll create a project in Privacy Dynamics with the following attributes:

Link the assets. Choose the source and destination connections we created above.
Identify the targets. Select all the tables to be treated by the Privacy Dynamics de-identification algorithm (we want to be able to use the entire database after all).
Select treatment options. Privacy Dynamics will automatically detect the PII in “users” and “addresses.” Now we must decide how to treat that information. For this project, we'll opt to replace direct identifiers with realistic fake values that match the format of the original data but otherwise have no relationship to the actual users: names look like names, addresses look like addresses, you get the idea.
Tie up loose ends. As an added precaution, we'll also anonymize any indirect identifiers at risk. These are more generic fields that, combined with each other or with external information, might allow an attacker to re-identify individuals in the dataset. Examples include the City, State, and Zip Code in the "addresses" table, or the date of birth and gender in the "users" table. Privacy Dynamics analyzes these columns and ensures that, in combination, they don't single out a unique record. For example, if only one address is in a particular zip code, Privacy Dynamics will move it into a neighboring zip code to group it with another record. If only one 99-year-old male is in the database, the data will be modified so the record is no longer unique. This micro-aggregation takes a surgical approach that minimally changes the data while significantly decreasing re-identification risk.
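The uniqueness check behind this step can be sketched in a few lines of Python: count how many rows share each combination of indirect identifiers and flag combinations that occur fewer than k times (with k=2, exactly the unique records). The `generalize_zip` helper uses a cruder technique, plain generalization, only to show a flagged group disappearing; it is not the micro-aggregation Privacy Dynamics performs. Column names and sample rows are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Mapping, Sequence, Tuple


def risky_groups(
    rows: Iterable[Mapping[str, object]], quasi_identifiers: Sequence[str], k: int = 2
) -> Dict[Tuple[object, ...], int]:
    """Return quasi-identifier combinations shared by fewer than k rows.

    With k=2 this flags exactly the records that are unique on those columns.
    """
    counts = Counter(tuple(row[col] for col in quasi_identifiers) for row in rows)
    return {combo: n for combo, n in counts.items() if n < k}


def generalize_zip(row: Mapping[str, object], keep_digits: int = 3) -> Dict[str, object]:
    """Crude generalization (not micro-aggregation): mask trailing ZIP digits."""
    zip_code = str(row["zip"])
    masked = zip_code[:keep_digits] + "*" * (len(zip_code) - keep_digits)
    return {**row, "zip": masked}


if __name__ == "__main__":
    addresses: List[Dict[str, object]] = [
        {"city": "Springfield", "state": "IL", "zip": "62701"},
        {"city": "Springfield", "state": "IL", "zip": "62701"},
        {"city": "Springfield", "state": "IL", "zip": "62704"},
    ]
    qi = ["city", "state", "zip"]
    print(risky_groups(addresses, qi))  # {('Springfield', 'IL', '62704'): 1}
    print(risky_groups([generalize_zip(r) for r in addresses], qi))  # {}
```

Generalization like this blurs every row in the column, which is why a targeted approach that only nudges the records at risk preserves more of the data's usefulness.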

From Code Change to Preview

With safely anonymized data in hand, we're ready to use it in our preview environments with Argo CD. If we were using a hosted database like RDS or CloudSQL, we would make sure the database was either public or at least reachable from the network where Argo CD deploys the preview environments. Since we're using Neon for this demo, the preview environments can connect using Neon's public connection strings.
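A Postgres connection string is just a URI. A minimal Python helper like the one below can assemble one for a preview environment, percent-encoding the credentials so characters such as `@` or `/` in a password don't break parsing. It defaults to `sslmode=require`, since Neon requires encrypted connections; the host, user, password, and database names are placeholders.

```python
from urllib.parse import quote


def postgres_url(user: str, password: str, host: str, dbname: str,
                 sslmode: str = "require") -> str:
    """Compose a postgresql:// connection URI with percent-encoded parts."""
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}/{quote(dbname, safe='')}?sslmode={sslmode}"
    )


if __name__ == "__main__":
    # Placeholder values for a Neon preview branch endpoint.
    print(postgres_url("app_user", "p@ss/word",
                       "ep-example-123456.us-east-2.aws.neon.tech", "ecommerce"))
    # postgresql://app_user:p%40ss%2Fword@ep-example-123456.us-east-2.aws.neon.tech/ecommerce?sslmode=require
```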

To get from proposed coding changes to a running preview with realistic data, we'll combine GitHub Actions, Neon, and Argo CD.

Opening a Pull Request (PR) triggers the aforementioned ApplicationSet to deploy the app. At this point, though, the deployed app doesn't yet include the updated code or know how to access the de-identified database.

The GitHub Actions workflow then drives the rest of the process, with the following steps:

Build the code change into a new container image and push it to a registry.
Create a new database branch in Neon, using the anonymized branch as its parent.
Leverage the Argo CD CLI to set both the container image tag and the database access URL as parameters for the app created and deployed by the ApplicationSet when the PR was opened.
Run the integration tests.
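These workflow steps can be sketched as a small Python driver. It assumes the Neon branch has already been created and its connection string is passed in, then builds the Docker, Argo CD CLI, and test commands in order and, by default, only prints them. The Helm value names `image.tag` and `database.url`, the `make integration-test` target, and every name in the example are hypothetical; the repository's actual GitHub Actions workflow may differ.

```python
import shlex
import subprocess
from typing import List


def preview_commands(git_sha: str, image_repo: str, app_name: str,
                     database_url: str) -> List[List[str]]:
    """Return the image, Argo CD, and test commands for one preview, in order.

    The Neon branch is assumed to exist already; database_url is its
    connection string.
    """
    image = f"{image_repo}:{git_sha}"
    return [
        # Build the code change into an image and push it to the registry.
        ["docker", "build", "-t", image, "."],
        ["docker", "push", image],
        # Point the PR's Argo CD app at the new image and database branch.
        # The parameter names are hypothetical; they must match the chart.
        ["argocd", "app", "set", app_name,
         "-p", f"image.tag={git_sha}",
         "-p", f"database.url={database_url}"],
        ["argocd", "app", "sync", app_name],
        ["argocd", "app", "wait", app_name, "--health"],
        # Run the integration tests (hypothetical entry point).
        ["make", "integration-test"],
    ]


def run(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; also execute it unless dry_run is set."""
    for cmd in commands:
        print("+", shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failure


if __name__ == "__main__":
    run(preview_commands(
        git_sha="3f2c1ab",
        image_repo="ghcr.io/example-org/ecommerce-api",
        app_name="anonymize-ecommerce-demo-pr-42",
        database_url="postgresql://user:password@host/ecommerce?sslmode=require",
    ))
```

Note that the printed commands include the database URL and its password; in a real workflow, keep that value in a GitHub Actions secret and avoid echoing it to logs.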

With a fully loaded, safely anonymized database and a full running copy of the app, we can now run much richer tests and reproduce issues more easily. The tests depend on realistic data and can reliably verify assumptions that the code is making, or ensure that previous bugs in the data model don't recur. Since we're working with a running app, we can see the impact of changes and easily share performance metrics and test results with team members for discussion and collaboration.
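As an example of a test that depends on realistic data, the hypothetical check below looks for orders whose shipping address is missing or belongs to a different user, a relationship the anonymization must preserve. It assumes `orders` has `user_id` and `address_id` columns and `addresses` has a `user_id` column. To stay self-contained, it runs against an in-memory SQLite database with toy rows; the SQL itself is standard and should also run against a Postgres preview branch.

```python
import sqlite3
from typing import List

# Orders whose shipping address is missing or belongs to a different user.
MISMATCHED_ORDERS_SQL = """
SELECT o.id
FROM orders AS o
LEFT JOIN addresses AS a ON a.id = o.address_id
WHERE a.id IS NULL OR a.user_id <> o.user_id
ORDER BY o.id
"""


def mismatched_orders(conn) -> List[int]:
    """Return IDs of orders that violate the order/address/user relationship."""
    return [row[0] for row in conn.execute(MISMATCHED_ORDERS_SQL)]


def demo_connection() -> sqlite3.Connection:
    """In-memory stand-in for the preview database, with toy rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE addresses (id INTEGER PRIMARY KEY, user_id INTEGER);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, address_id INTEGER);
        INSERT INTO addresses VALUES (1, 10), (2, 20);
        INSERT INTO orders VALUES (100, 10, 1), (101, 10, 2), (102, 20, 3);
    """)
    return conn


if __name__ == "__main__":
    # Order 101 ships to user 20's address; order 102 points at no address.
    print(mismatched_orders(demo_connection()))  # [101, 102]
```

In a test suite, asserting that this query returns an empty list against the preview branch guards against bugs in the data model recurring.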

Preview environments and automated testing with Argo CD, Neon, and GitHub Actions unlock huge benefits on their own. Adding realistic, representative data, taken straight from the production environment and thoroughly de-identified with Privacy Dynamics, delivers the benefits of "developing in production" without the obvious risks to privacy and data integrity. In short, it's awesome!

Real Data in Development With Privacy Dynamics

We've demonstrated how to de-identify and clone production data and spin up ephemeral app development preview instances in a familiar Git-style version control environment. Using realistic data in your development process can help dev teams reduce bugs and ship high-quality software faster. To learn more about Privacy Dynamics and how we can help you unlock production data while reducing privacy risk, give us a shout, or sign up for a free trial at https://signup.privacydynamics.io.