Anonymization for GDPR Compliance: Can It Be Done?

Graham Thompson

01.19.24 · 8 min read

What is the GDPR standard for data anonymization?

Let's get this out of the way right up front. There is no General Data Protection Regulation (GDPR) standard for anonymization. No matter what you may have heard, "anonymisation" (that's Euro for "anonymization") appears nowhere in the massive 11-chapter, 99-article mandate. The term "anonymous" appears in the GDPR exactly three times in a single paragraph (Recital 26) in the context of exempting "personal data rendered anonymous in such a manner that the data subject is not or no longer identifiable."

"This Regulation does not therefore concern the processing of such anonymous information," the GDPR flatly states.

That's it. No further definitions of anonymous personal data. No standard for achieving such anonymity in a dataset. Just the blanket statement that if data "does not relate to an identified or identifiable natural person," it's all fine and good by EU regulators.

Does this mean formal data anonymization techniques, as privacy and security professionals normally understand them, have no place in GDPR compliance efforts? Hardly. Most legal experts agree that the EU regulators' bare-bones, imprecise language leaves one relevant aspect of the data protection requirements quite clear: for information to be deemed anonymous, and therefore GDPR-exempt, it must be impossible to reverse the manipulation in a way that allows the identification of any individual in the dataset.

Easy, right?

That fairly simple reading of the regs leaves defenders with a pretty clear mission regarding GDPR compliance and exemption. If a data anonymization technique can be undone, data handlers are on the hook for complying with the full letter of the regulation. If data custodians hope to treat information as exempt, which greatly reduces the cost and complexity of moving, storing, and using the data, the original dataset must be rigorously anonymized so that identification of private individuals is, and will always remain, impossible.

Let's be clear: anonymization is not necessary for the protection of data under the GDPR. Anonymization is simply one of several ways to take data out of the scope of the regulation, which is the goal of many data handlers charged with managing, moving, and analyzing EU-centric data.

The GDPR in Brief

The General Data Protection Regulation (GDPR) was developed over a four-year period by the European Union (EU) with the primary goal of harmonizing disparate data privacy laws across Europe. Adopted in 2016 and enforced since May 2018, the GDPR replaced the 1995 Data Protection Directive, marking a significant shift in the data privacy landscape across Europe.

The GDPR applies to all companies processing the personal data of individuals in the 27-country EU, regardless of the company's location. As such, the GDPR has a broad global impact. Its scope is extensive, touching upon aspects such as personal consent, breach notification, trans-border data transfer, and, as we've seen, some measure of data anonymization.

The ultimate goal of the GDPR was to give individuals more control over their personal data while holding companies accountable for protecting sensitive personal information, with hefty fines for non-compliance.

Anonymity and GDPR: What It Isn't

EU regulators may not have a clear standard for defining appropriate data anonymization, but they definitely have precedent for what failed anonymity looks like. In March of 2019, the Danish Data Protection Agency, Datatilsynet, fined taxi company Taxa 4X35 the equivalent of $180,000 for failing to properly anonymize personal data in the transaction records of some 8.8 million taxi rides.

The fine was levied on the taxi company because, while it removed passenger names and addresses from its data as part of its claimed data anonymization efforts, it retained customer phone numbers. The numbers made IDing the passengers trivial, the regulators found.

“We opted for a fine in this case. This is due to the fact that there are very large amounts of personal data which have been stored without an objective purpose. One of the basic principles in the field of data protection is that you only store the information you need—and when you do not need it anymore, it must be deleted immediately,” said the Danish DPA’s director Cristina Angela Gulisano while announcing the legal action.

This example clarifies what the legal experts conclude is the Union’s core definition of anonymization. From the GDPR perspective, effectively anonymized data must be:

Irreversible
Done in a manner that makes it impossible (or extremely impractical) to identify the individual by any method of analyzing the remaining data

The European Data Protection Board (EDPB), the successor of the Article 29 Working Party (WP29, sometimes called the G29), is responsible for ensuring the consistent application of the GDPR. As such, its guidance on anonymization is a good starting point for thinking about anonymity in the context of the regulation. From the EDPB's perspective, good anonymization starts by asking the following questions of any collection of data that includes personal info:

Individualization: Is it possible to identify a specific individual?
Correlation: Can separate datasets that include the same individual be linked?
Inference: Can information about an individual be inferred from the available data?

A negative answer to all of the above would make the data a priori anonymous and, therefore, GDPR-exempt. Fail any of the three tests, and a rigorous analysis of the information is required to determine the risk of individual identification.
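As a rough illustration of the first test (individualization), the sketch below counts how many records share the same combination of values. The field names, the sample rides, and the `smallest_group_size` helper are all hypothetical, made up for this example; it is not an official EDPB test, and it says nothing about correlation or inference.

```python
from collections import Counter

def smallest_group_size(records, quasi_identifiers):
    """Return the size of the smallest group of records that share the
    same values for the given columns. A result of 1 means at least one
    record is unique and can be singled out."""
    groups = Counter(
        tuple(record[col] for col in quasi_identifiers) for record in records
    )
    return min(groups.values())

# Hypothetical ride records: names are gone, but phone numbers remain.
rides = [
    {"phone": "555-0101", "pickup_zone": "North", "hour": 8},
    {"phone": "555-0102", "pickup_zone": "North", "hour": 8},
    {"phone": "555-0103", "pickup_zone": "South", "hour": 17},
    {"phone": "555-0104", "pickup_zone": "South", "hour": 17},
]

# With the phone number included, every record is unique.
k_with_phone = smallest_group_size(rides, ["phone", "pickup_zone", "hour"])
# Without it, each record shares its values with one other ride.
k_without_phone = smallest_group_size(rides, ["pickup_zone", "hour"])
print(k_with_phone, k_without_phone)
```

A smallest group of one is a strong hint that the individualization test fails, which is essentially what happened with the retained phone numbers in the Taxa 4X35 case. A larger number is only a starting point for the rigorous analysis, not a pass.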

Staying GDPR Compliant With Appropriate Anonymization

Taking what we know about the GDPR worldview of anonymized data and mapping it to the most popular techniques available, we can begin to craft a strategy for protecting private information in a way that meshes with the EU regulations. If the goal is regulatory exemption, the key question for each anonymization technique is how well it withstands efforts to reverse its particular brand of obfuscation.

Some of the common data anonymization methodologies and their suitability for GDPR include:

Data Pseudonymization

Pseudonymization involves replacing private identifiers with false identifiers (or pseudonyms), while still maintaining a specific identifier that allows access to the original data. While pseudonymized data can't identify an individual without additional information, it retains its usefulness for statistical analysis and testing.

Although pseudonymization is not enough to fully exempt data from the GDPR, the EU regulators do go into some detail on it. The regulation suggests the technique is a good option for safeguarding private data, but only if the pseudonym mapping, which can be used to reveal the real individuals in the data, is kept separately from the obscured data itself.

That said, because it is ultimately reversible, pseudonymized data is still to be treated as protected data in the eyes of the GDPR.
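To make the "separate mapping" idea concrete, here is a minimal sketch in Python. The `pseudonymize` helper, the `user_` prefix, and the sample customers are assumptions for illustration only; a real deployment would also need secure storage and access controls for the lookup table.

```python
import secrets

def pseudonymize(records, id_field):
    """Replace id_field in each record with a random pseudonym.

    Returns the pseudonymized records and the lookup table as two
    separate objects, so the table can be stored apart from the data.
    """
    lookup = {}  # real identifier -> pseudonym
    output = []
    for record in records:
        real_id = record[id_field]
        if real_id not in lookup:
            lookup[real_id] = "user_" + secrets.token_hex(8)
        # Copy the record, swapping the identifier for its pseudonym.
        output.append({**record, id_field: lookup[real_id]})
    return output, lookup

# Hypothetical customer records keyed by email address.
customers = [
    {"email": "ana@example.com", "plan": "pro"},
    {"email": "ben@example.com", "plan": "free"},
    {"email": "ana@example.com", "plan": "pro"},
]

pseudonymized, lookup = pseudonymize(customers, "email")
```

The same person always gets the same pseudonym, so the data stays useful for analysis. Anyone holding `lookup` can reverse the process, which is exactly why the result still counts as personal data.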

Data Generalization

Data generalization protects sensitive information by reducing its precision. For instance, instead of storing a precise address, data generalization might replace it with a more general location, such as the city or state. This method reduces the risk of identity exposure while preserving the data's overall integrity.

Data generalization offers a decent anonymization alternative, using generalized fields for variables like age or location that make tracing the information back to an individual more difficult. The resilience of generalized data to reverse identification is, however, only as strong as the methodology used to employ it and the protocols around access and storage of the original data.
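A minimal sketch of generalization might look like the following. The field names, the ten-year band width, and both helper functions are assumptions chosen for the example; how coarse the bands need to be depends on the dataset.

```python
def generalize_age(age, band_width=10):
    """Map an exact age to a band such as '30-39'."""
    low = (age // band_width) * band_width
    return f"{low}-{low + band_width - 1}"

def generalize_record(record):
    """Keep only coarse versions of the identifying fields."""
    return {
        "age_band": generalize_age(record["age"]),
        "city": record["city"],  # the street address is dropped entirely
        "purchase_total": record["purchase_total"],
    }

# Hypothetical customer record.
customer = {
    "age": 34,
    "street": "12 Harbor Lane",
    "city": "Copenhagen",
    "purchase_total": 129.50,
}

generalized = generalize_record(customer)
print(generalized)
```

Note that the original record still exists somewhere. As the paragraph above says, the protection is only as good as the controls around that original data.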

Data Perturbation

Data perturbation modifies the original data set by applying specific algorithmic transformations such as rounding, swapping, or the addition of data "noise." This technique retains the statistical properties of the data, ensuring its usefulness for research or analysis while making it challenging to identify individuals.

Like generalization, the strength of data perturbation is only as robust as the algorithm used to implement it. Because it offers multiple approaches to manipulating the original data (rounding, swapping, noise, etc.), the technique ranks slightly higher on the GDPR suitability scale.
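The three perturbation approaches named above can be sketched in a few lines of Python. The helper names, the salary figures, the noise scale, and the fixed seed are all assumptions for illustration; none of them comes from the GDPR or from a specific product.

```python
import random

def add_noise(values, scale, rng):
    """Add zero-mean Gaussian noise to each value."""
    return [v + rng.gauss(0, scale) for v in values]

def round_to(value, base):
    """Round a value to the nearest multiple of base."""
    return base * round(value / base)

def swap_column(records, field, rng):
    """Shuffle one field across records, so each value is kept but is
    no longer tied to its original row."""
    values = [r[field] for r in records]
    rng.shuffle(values)
    return [{**r, field: v} for r, v in zip(records, values)]

rng = random.Random(42)  # fixed seed only so the sketch is repeatable
salaries = [52340, 61875, 48210, 75990]

noisy = add_noise(salaries, scale=500, rng=rng)
rounded = [round_to(s, 1000) for s in salaries]

records = [{"id": i, "salary": s} for i, s in enumerate(salaries)]
swapped = swap_column(records, "salary", rng)
```

Swapping keeps the column's overall distribution exactly, while noise and rounding keep it approximately. How much distortion is enough to resist re-identification is a judgment call that this sketch does not answer.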

Data Masking

Data masking protects sensitive, personally identifiable information by obscuring or altering specific data elements within a database, such as SSNs, credit card numbers, or addresses. The goal is to maintain data privacy while keeping the information useful for data analytics or testing. Done correctly, masking ensures the original data can't be re-engineered or re-identified, offering robust protection.

Proper data masking checks all of the requirement boxes for keeping GDPR-regulated data compliant or exempt. Masking, scrambling, or shuffling variables and data points ensures the data remains useful for data sharing or analysis without compromising individual privacy.
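One simple form of masking, character substitution, can be sketched as follows. The `mask_keep_last` helper and the sample values are hypothetical, and this is only one of many masking approaches.

```python
def mask_keep_last(value, visible=4, mask_char="X"):
    """Mask every digit except the last `visible` ones.

    Separators such as dashes are kept so the format stays realistic.
    """
    total_digits = sum(ch.isdigit() for ch in value)
    to_mask = max(total_digits - visible, 0)
    masked = []
    for ch in value:
        if ch.isdigit() and to_mask > 0:
            masked.append(mask_char)
            to_mask -= 1
        else:
            masked.append(ch)
    return "".join(masked)

# Hypothetical values, not real account or ID numbers.
masked_card = mask_keep_last("4111-1111-1111-1234")
masked_ssn = mask_keep_last("123-45-6789")
fully_masked_ssn = mask_keep_last("123-45-6789", visible=0)
print(masked_card, masked_ssn, fully_masked_ssn)
```

Leaving the last four digits visible is a common convenience, but those digits can still help narrow down an individual when combined with other fields. For data that is meant to fall outside the GDPR, masking the whole value (`visible=0`) is the safer assumption.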

Conclusion: Robust Anonymization Can Work for GDPR Compliance

Data anonymization techniques may not be standardized in the GDPR, but they can still play a crucial role in protecting sensitive data while ensuring it remains useful for analysis, development, testing, and research. Security experts must be thoroughly conversant with these anonymization techniques, recognizing their importance in today's data-driven world.

Data masking, data pseudonymization, data generalization, and data perturbation (including data swapping) represent key tools in the toolbox of any security expert dedicated to protecting data under the auspices of the stringent GDPR. The appropriate application of these methods can help to ensure the privacy and security of sensitive data while fulfilling all of the organization's ethical and legal obligations.

It is vital, therefore, to stay abreast of the latest advancements in data anonymization. As technologies and threats continue to evolve, so too must the techniques for protecting sensitive, private data at home and abroad.

The Role of Privacy Dynamics

Privacy Dynamics is helping companies achieve the delicate balance between empowering developers and ensuring stringent data security. Our tools are designed to streamline and strengthen the process of protecting sensitive data in development environments.

One of the key offerings from Privacy Dynamics is our data anonymization solution. This enables companies to use realistic data in development and testing environments without exposing sensitive information. By replacing actual data with anonymized versions, developers can work with data that maintains the integrity of the original dataset while ensuring that personal information is kept secure. This approach is particularly beneficial for organizations that must comply with stringent data protection regulations like GDPR, as it helps maintain privacy without hindering development.

Privacy Dynamics also provides data masking capabilities essential for organizations handling sensitive customer or business information. Data masking ensures that while the structure of the data remains intact for development purposes, the actual content is obscured to prevent unauthorized access or exposure. This tool is handy in scenarios where developers need to work with data that resembles real-world datasets but does not require access to the actual sensitive data.

Additionally, our solutions are designed with ease of integration in mind. They can seamlessly integrate with existing data storage and management systems, reducing the burden on IT teams and minimizing disruption to existing workflows. This ease of integration is crucial for organizations looking to implement robust data security measures without compromising efficiency and productivity.

GDPR compliance ratings by technique:

Data Pseudonymization: Fair
Data Generalization: Fair to Good
Data Perturbation: Good
Data Masking: Excellent