Understanding Projects and Datasets

Introduction to Projects

A Project is a collection of resources associated with your data. It is a logical grouping of data in a source system that can be operated on together. Specifically, a Project contains:

  • Connections: Projects have either a single connection or a connection each for the Source and Destination.
  • Datasets: Projects contain many Datasets that all use the same Connections. More on Datasets below.
  • Runs: Projects maintain a history of previous Runs (treatments).
  • Schedules: Projects optionally have a schedule of future Runs.

After a Project is run, we generate a Report that summarizes important information from that Run, including the risk profile of all of the Datasets.
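As a rough mental model, the relationship between these resources can be sketched with a few Python dataclasses. This is only an illustration: the class and field names below are hypothetical and are not part of any Privacy Dynamics API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Dataset:
    source_table: str                        # table in the Source system
    destination_table: Optional[str] = None  # known only after a Run


@dataclass
class Project:
    name: str
    source_connection: str
    # None means the single connection serves as both Source and Destination.
    destination_connection: Optional[str] = None
    datasets: List[Dataset] = field(default_factory=list)
    runs: List[str] = field(default_factory=list)  # history of past Runs
    schedule: Optional[str] = None                 # optional schedule of future Runs

    def connections(self) -> List[str]:
        """Return the one or two connections this Project uses."""
        if self.destination_connection is None:
            return [self.source_connection]
        return [self.source_connection, self.destination_connection]


# Hypothetical example: one connection shared by every Dataset in the Project.
project = Project(name="crm", source_connection="warehouse")
project.datasets.append(Dataset(source_table="customers"))
```

Every Dataset in the sketch shares the Project's connections, which mirrors the rule that all Datasets in a Project use the same Connections.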

Creating a Project

To create a new Project, you can follow the Quickstart guide, or select the +PROJECT or Anonymize buttons in the UI.

Creating a Schedule for a Project

In the final step of setting up a Project, you can decide whether or not to add a Schedule to your Project. You can choose between simple hourly, daily, or weekly schedules; a custom schedule built in the UI; or a schedule defined by a crontab.
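For readers unfamiliar with crontab syntax, the sketch below shows a few expressions in the standard five-field format (minute, hour, day of month, month, day of week) and a small helper that names the fields. It assumes the standard five-field format; the exact dialect and time zone that the product accepts are not stated here, so check the UI before relying on these values. The helper and the example names are hypothetical.

```python
# Illustrative five-field crontab expressions:
# minute  hour  day-of-month  month  day-of-week
EXAMPLE_SCHEDULES = {
    "hourly": "0 * * * *",       # at minute 0 of every hour
    "daily": "0 2 * * *",        # every day at 02:00
    "weekly": "0 2 * * 1",       # every Monday at 02:00
    "weekdays": "30 6 * * 1-5",  # Monday to Friday at 06:30
}

FIELD_NAMES = ("minute", "hour", "day_of_month", "month", "day_of_week")


def split_crontab(expression: str) -> dict:
    """Split a five-field crontab expression into named fields."""
    parts = expression.split()
    if len(parts) != len(FIELD_NAMES):
        raise ValueError(f"expected 5 fields, got {len(parts)}: {expression!r}")
    return dict(zip(FIELD_NAMES, parts))


weekly = split_crontab(EXAMPLE_SCHEDULES["weekly"])
```

The helper only checks the number of fields; it does not validate the range of each value.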

Deleting a Project

Projects can be deleted from the individual Project page in three steps:

  1. Select Data in the main navigation.
  2. Select the name of the Project from the list in the main pane.
  3. In the top-right corner, select ..., then Delete.

Introduction to Datasets

A Dataset is a reference to a table in your Source system, along with the treatment plan and configuration you have specified. After a Project is run, a Dataset also includes a reference to the Destination table and a detailed report of the treatment applied by the anonymizer, including Risk and Distortion metrics.

Configuring a Dataset

When you create a Project and configure a Dataset for the first time, Privacy Dynamics automatically classifies the direct and quasi-identifiers in your data and treats the data accordingly. If you need to fine-tune this behavior, you can assign one of the following treatments to individual fields:

  • Lock instructs the anonymizer not to treat this field. It will be passed through as-is to the Destination table.
  • Redact instructs the anonymizer to completely suppress this field by dropping the column in the Destination table.
  • Mask instructs the anonymizer to suppress this field by replacing the values of this field with a constant mask expression.
  • Realistic instructs the anonymizer to replace the values of this field with random but format-preserving surrogates.

If new fields are added to the Source data, they will automatically be treated and added to the Destination on the next Run of the Dataset.
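To make the difference between the four field treatments above concrete, here is a minimal sketch that applies them to rows held as Python dictionaries. It is not the Privacy Dynamics anonymizer: the function names, the mask string, and the character-by-character surrogate strategy are all assumptions chosen for illustration. For simplicity, the sketch requires an explicit treatment for every column, whereas the product classifies and treats unconfigured fields automatically.

```python
import random
import string

MASK = "*****"  # hypothetical constant mask expression


def realistic(value: str, rng: random.Random) -> str:
    """Replace each ASCII letter or digit with a random one of the same kind."""
    out = []
    for ch in value:
        if ch in string.digits:
            out.append(rng.choice(string.digits))
        elif ch in string.ascii_uppercase:
            out.append(rng.choice(string.ascii_uppercase))
        elif ch in string.ascii_lowercase:
            out.append(rng.choice(string.ascii_lowercase))
        else:
            out.append(ch)  # keep punctuation and spacing so the format survives
    return "".join(out)


def apply_treatments(rows, treatments, seed=0):
    """Apply a per-column treatment to each row (a dict of column -> value)."""
    rng = random.Random(seed)
    treated = []
    for row in rows:
        new_row = {}
        for column, value in row.items():
            treatment = treatments[column]
            if treatment == "lock":
                new_row[column] = value  # passed through as-is
            elif treatment == "redact":
                continue  # column is dropped from the output
            elif treatment == "mask":
                new_row[column] = MASK
            elif treatment == "realistic":
                new_row[column] = realistic(str(value), rng)
            else:
                raise ValueError(f"unknown treatment: {treatment!r}")
        treated.append(new_row)
    return treated


rows = [{"id": 1, "email": "a@b.com", "ssn": "123-45-6789", "phone": "555-0100"}]
treatments = {"id": "lock", "email": "mask", "ssn": "redact", "phone": "realistic"}
result = apply_treatments(rows, treatments)
```

In `result`, the `id` value is unchanged, `email` holds the mask string, the `ssn` column is absent, and `phone` holds random digits with the hyphen in its original position.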

For more detailed information on configuring Datasets, see the next section.

Editing an Existing Dataset

You can update the configuration of a Dataset after it has been created. From the Dataset page, select EDIT SETTINGS in the top-right corner. In the modal that appears, choose the treatment for individual fields (for example, Lock or Redact), then select SAVE to store the new configuration, or SAVE & RUN to also treat the data with it immediately.

Understanding the Dataset Report

After a Run has completed, the Dataset page displays a Report on the Dataset and its anonymization treatment, including:

  • Risk: The results of our Risk Assessment. The Risk score can be viewed at the top of the report, and a more detailed risk profile can be viewed by selecting the Risk tab.
  • Distortion: Distortion metrics measure the amount of change in the Dataset that was introduced by the anonymization treatment. Generally, Datasets with less distortion have higher utility and can be used for a wider range of applications.

We store the Report after each Run of your Dataset. If you would like to compare how the metrics for your Dataset have changed over time, you can view an earlier Report by selecting the date of a previous Run in the Report drop-down.
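To build intuition for what a Distortion metric captures, the sketch below computes one very simple measure of change: the fraction of cells whose value differs between the original and treated rows. This is an assumed toy metric for illustration only; it is not the metric shown in the Report, and the function name is hypothetical.

```python
def cell_change_rate(original, treated):
    """Fraction of original cells whose value differs after treatment.

    Rows are dicts of column -> value, compared position by position.
    A column missing from the treated row counts as changed.
    """
    if len(original) != len(treated):
        raise ValueError("row counts differ")
    total = 0
    changed = 0
    for before, after in zip(original, treated):
        for column, value in before.items():
            total += 1
            if column not in after or after[column] != value:
                changed += 1
    return changed / total if total else 0.0


original = [{"age": 34, "zip": "98101"}, {"age": 35, "zip": "98102"}]
treated = [{"age": 34, "zip": "981**"}, {"age": 35, "zip": "981**"}]
rate = cell_change_rate(original, treated)  # 2 of 4 cells changed
```

A lower value means the treated data stays closer to the original, which matches the idea that less distortion generally leaves more utility.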

Viewing the Run History

You can view the Runs for all of your Projects by selecting Data in the main navigation, and then selecting the Runs tab.
