TL;DR
- Anonymisation is irreversible. If no reasonable means can re-link the data to a person, it is anonymous and falls outside GDPR (Recital 26).
- Pseudonymisation is reversible. Identifiers are replaced, but a separate key can restore them. It is still personal data (Recital 26; the term itself is defined in Article 4(5)), just lower risk.
- Tokenization replaces a value with a token and keeps the mapping in a vault. It is a common way to implement pseudonymisation, not a third category.
- Data masking hides values for display or test data. It may be reversible or not, depending on how it is done.
- For LLM pipelines, pseudonymisation is the realistic target: tokenize before the model, reveal real values only inside your trust boundary.
01 What is anonymisation?
A note on spelling: you will see "anonymization" and "pseudonymization" (American), "anonymisation" and "pseudonymisation" (British), and sometimes "pseudo anonymization" as an informal name for pseudonymisation. The spellings are interchangeable. This article uses the British spelling.
Anonymisation is the irreversible removal of the link between data and a person. When it is done properly, no one can re-identify the individual using any means reasonably likely to be used, judged by objective factors such as other data sets that could be combined with it, the technology available, and the cost and time identification would take. Once data is genuinely anonymous, it is no longer personal data, and GDPR Recital 26 says the regulation does not apply to it.
The bar is high. Stripping a name is not enough if a zip code, a date of birth, and a diagnosis still single someone out. Real anonymisation usually means aggregating, generalising, or adding noise until the individual disappears into a group: "a patient in their 30s," not "Jane Doe, born 12 March 1991." That loss of detail is the point, and also the catch, because aggregated data is often too coarse to be useful for the task you had in mind.
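To make "generalising until the individual disappears into a group" concrete, here is a minimal Python sketch on made-up records. The `generalise` and `smallest_group` helpers, the toy data, and the 2025 reference year are all illustrative assumptions. It measures k-anonymity, which is the size of the smallest group of records sharing the same values. k-anonymity is one common yardstick, not by itself proof that data is anonymous under Recital 26.

```python
from collections import Counter

# Hypothetical toy records: (name, zip code, date of birth, diagnosis)
records = [
    ("Jane Doe", "94107", "1991-03-12", "asthma"),
    ("John Roe", "94110", "1994-07-01", "asthma"),
    ("Ann Poe", "94103", "1989-11-23", "diabetes"),
    ("Bo Lee", "94109", "1986-02-14", "diabetes"),
]

def generalise(record, reference_year=2025):
    """Drop the name, truncate the zip code, and bucket age into a decade."""
    _name, zip_code, dob, diagnosis = record
    age = reference_year - int(dob[:4])  # approximate age in years (assumption)
    decade = f"{(age // 10) * 10}s"
    return (zip_code[:3] + "**", decade, diagnosis)

def smallest_group(rows):
    """k-anonymity check: size of the smallest group of identical rows."""
    return min(Counter(rows).values())

# Quasi-identifiers before generalisation (name already removed)
raw = [(zip_code, dob, diagnosis) for _name, zip_code, dob, diagnosis in records]
generalised = [generalise(r) for r in records]

print(smallest_group(raw))          # 1: every record is unique
print(smallest_group(generalised))  # 2: every row shares its values with another
```

Stripping the name alone leaves k = 1, so each row still points at one person. After truncating the zip code and bucketing age, every row hides in a group of at least two, and the trade-off is visible: the exact age and location are gone.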
02 What is pseudonymisation?
Pseudonymisation replaces identifying values with substitutes, while keeping a separate key that can re-link them. GDPR Article 4(5) defines it as processing personal data so it "can no longer be attributed to a specific data subject without the use of additional information," where that additional information "is kept separately" under technical and organisational controls.
The key word is reversible. Because the original values can be recovered from the separately held key, pseudonymised data is still personal data and stays fully in scope of GDPR. It does not buy you an exemption. What it does buy you is lower risk, and the regulation rewards that: pseudonymisation is named in Article 4(5) and called out in Article 25 as a concrete measure for data protection by design. A leaked pseudonymised data set is far less damaging than a leaked plaintext one, because the attacker still needs the key.
03 Where do tokenization and data masking fit?
Tokenization is a technique, not a separate legal category. You swap
a sensitive value for a token, a stand-in with no exploitable meaning, and store the
token-to-value mapping in a separate, access-controlled vault. Map "Jane Doe" to
[PERSON_1] and hold the pairing in a vault, and you have implemented
pseudonymisation: the link still exists, it just lives somewhere protected. Payment
systems have done this with card numbers for years.
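A minimal Python sketch of the vault pattern, assuming a toy in-memory `TokenVault` class (hypothetical, not any product's API). The token is random, so it reveals nothing about the card number, and the only way back is a lookup in the vault. A real deployment would keep the mapping in a separate, access-controlled store rather than a Python dict.

```python
import secrets

class TokenVault:
    """Toy token vault: random tokens go out, real values stay in the mapping."""

    def __init__(self):
        self._token_to_value = {}
        self._value_to_token = {}

    def tokenize(self, value: str) -> str:
        # Reuse the existing token so the same value always maps to the same token.
        if value in self._value_to_token:
            return self._value_to_token[value]
        token = "tok_" + secrets.token_hex(8)  # random; derived from nothing in value
        self._token_to_value[token] = value
        self._value_to_token[value] = token
        return token

    def detokenize(self, token: str) -> str:
        # Only code with access to the vault can reverse a token.
        return self._token_to_value[token]

vault = TokenVault()
card_token = vault.tokenize("4242424242424242")
print(card_token)                    # e.g. tok_ followed by 16 random hex characters
print(vault.detokenize(card_token))  # 4242424242424242
```

Because the vault can still re-link the token to the card, this is pseudonymisation, not anonymisation.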
Data masking obscures values, usually for display or for non-production
environments. Showing a card as **** **** **** 4242 is masking for display.
Generating fake-but-realistic records for a test database is masking too. Masking can be
irreversible (closer to anonymisation) or reversible (closer to pseudonymisation),
depending on whether the original can be recovered. The reversibility test is what
actually decides which GDPR bucket you land in, not the label you put on the tool.
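A short Python sketch of display masking, using a hypothetical `mask_card` helper. The hidden digits are discarded rather than stored, so two different cards can produce the same masked output. That is what makes this particular mask irreversible.

```python
def mask_card(card_number: str, visible: int = 4) -> str:
    """Display masking: keep the last `visible` digits, replace the rest with '*'.

    The hidden digits are not stored anywhere, so the original cannot be
    recovered from the output alone.
    """
    digits = card_number.replace(" ", "")
    masked = "*" * (len(digits) - visible) + digits[-visible:]
    # Regroup into blocks of four for display.
    return " ".join(masked[i:i + 4] for i in range(0, len(masked), 4))

print(mask_card("4242 4242 4242 4242"))  # **** **** **** 4242
print(mask_card("4000 0566 5566 4242"))  # **** **** **** 4242 (same output)
```

Irreversible masking of one field does not make the whole record anonymous: a masked card shown next to a customer's name is still personal data.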
04 Comparison table
| Property | Anonymisation | Pseudonymisation | Tokenization | Data masking |
|---|---|---|---|---|
| Reversible? | No, by design | Yes, with the key | Yes, with the vault | Depends |
| Still personal data? | No | Yes | Yes | If reversible, yes |
| In scope of GDPR? | No (Recital 26) | Yes (Article 4(5)) | Yes | If reversible, yes |
| What it is | Legal end-state | Legal category | Technique | Technique |
| Typical use | Stats, open data sets | Risk reduction, processing | Payments, PII vaults, LLM prompts | Display, test data |
05 Two concrete examples
Pseudonymisation / tokenization. Take "Jane Doe emailed from jane@acme.com about invoice 88213." Replace the identifiers and keep the mapping:
- Shared text: [PERSON_1] emailed from [EMAIL_1] about invoice [INVOICE_1].
- Vault: PERSON_1 = Jane Doe, EMAIL_1 = jane@acme.com, INVOICE_1 = 88213
The vault makes this reversible, so it is still personal data. But the text you can now share, log, or send to a model carries no real identifiers.
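The same example as a minimal Python sketch. It assumes the identifiers have already been found: the `detected` list is hard-coded here, where a real pipeline would get it from a PII detector. The `pseudonymise` and `reveal` helpers are hypothetical names for illustration, not Anonde's implementation.

```python
import re

text = "Jane Doe emailed from jane@acme.com about invoice 88213."

# Hypothetical detector output, hard-coded for the sketch: (entity type, value)
detected = [("PERSON", "Jane Doe"), ("EMAIL", "jane@acme.com"), ("INVOICE", "88213")]

def pseudonymise(text, detected):
    """Replace each detected value with a numbered placeholder; return text and vault."""
    vault = {}
    counters = {}
    for entity_type, value in detected:
        counters[entity_type] = counters.get(entity_type, 0) + 1
        placeholder = f"{entity_type}_{counters[entity_type]}"
        vault[placeholder] = value
        text = text.replace(value, f"[{placeholder}]")
    return text, vault

def reveal(text, vault):
    """Swap [PLACEHOLDER] tokens back to real values; unknown tokens are left as-is."""
    return re.sub(r"\[([A-Z]+_\d+)\]", lambda m: vault.get(m.group(1), m.group(0)), text)

shared, vault = pseudonymise(text, detected)
print(shared)  # [PERSON_1] emailed from [EMAIL_1] about invoice [INVOICE_1].
print(vault)   # {'PERSON_1': 'Jane Doe', 'EMAIL_1': 'jane@acme.com', 'INVOICE_1': '88213'}
print(reveal(shared, vault) == text)  # True
```

The plain `str.replace` is a simplification. It would misbehave if a detected value also occurred inside another value or a placeholder (a detected value of "1" would match inside "[PERSON_1]"), which a production tool has to handle, for example by working on character offsets.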
Anonymisation. Take the same record and aggregate it past the point of return: "one customer contacted support about a billing query this quarter." No name, no email, no invoice, no key anywhere that rebuilds the original. If no reasonable means can get you back to Jane Doe, that statement is anonymous and outside GDPR. Notice how much you threw away to get there.
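A Python sketch of that aggregation step on a few made-up tickets (the data and field layout are assumptions). Only the quarter and category survive, and the counts carry no names, emails, or invoice numbers. No mapping back to them is kept anywhere.

```python
from collections import Counter

# Hypothetical support tickets: (name, email, invoice, category, quarter)
tickets = [
    ("Jane Doe", "jane@acme.com", "88213", "billing", "2025-Q1"),
    ("Sam Roe", "sam@example.com", "88214", "shipping", "2025-Q1"),
    ("Li Wei", "li@example.com", "88215", "shipping", "2025-Q1"),
]

# Keep only (quarter, category) and count. Identifiers are dropped, not mapped.
summary = Counter((quarter, category) for _name, _email, _inv, category, quarter in tickets)

for (quarter, category), count in sorted(summary.items()):
    print(f"{quarter}: {count} customer contact(s) about {category}")
```

Whether a count of one is safe to release depends on what else a reader could know. Statistical publications often suppress or merge small counts for exactly that reason.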
06 Why this matters for LLM pipelines
When you send a prompt to a hosted model, you are handing text to a processor you do not control. Anonymising it would protect you, but it usually guts the prompt: a model cannot draft a reply to "one customer" or reason about "a patient in their 30s." The detail the model needs is exactly the detail that identifies people.
Pseudonymisation threads that needle. Replace identifiers with stable tokens before the prompt leaves your infrastructure, let the model work on the tokens, then restore the real values from the vault inside your own trust boundary when the answer comes back. The model sees structure and context without the identifiers; you keep the ability to re-link. That is the realistic target for almost every production LLM use case, and it maps onto the same minimisation logic that GDPR and the EU AI Act push for.
Stable tokens matter here. If "Jane Doe" becomes the same [PERSON_1] every
time she appears, the model can still track that this person is the same person across a
conversation, which is what keeps the output coherent.
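A minimal Python sketch of stable tokens across a two-message conversation. It assumes a hypothetical `Pseudonymiser` class and a stubbed `fake_model` in place of a real hosted LLM call; neither is Anonde's API. One mapping is kept per conversation, so "Jane Doe" gets the same token in both messages and a new person gets the next number. The reply is revealed only after it comes back.

```python
import re

class Pseudonymiser:
    """Keeps one mapping for a whole conversation so tokens stay stable."""

    def __init__(self):
        self.value_to_token = {}
        self.token_to_value = {}
        self.counters = {}

    def token_for(self, entity_type, value):
        # Same value -> same token, every time it appears in the conversation.
        if value not in self.value_to_token:
            n = self.counters.get(entity_type, 0) + 1
            self.counters[entity_type] = n
            token = f"[{entity_type}_{n}]"
            self.value_to_token[value] = token
            self.token_to_value[token] = value
        return self.value_to_token[value]

    def pseudonymise(self, text, detected):
        for entity_type, value in detected:
            text = text.replace(value, self.token_for(entity_type, value))
        return text

    def reveal(self, text):
        # Runs inside your trust boundary, after the model's answer comes back.
        return re.sub(r"\[[A-Z]+_\d+\]",
                      lambda m: self.token_to_value.get(m.group(0), m.group(0)), text)

def fake_model(prompt):
    # Hypothetical stand-in for a hosted LLM call; it only receives tokenized text.
    return "Draft: Hi [PERSON_1], thanks for following up on your invoice."

p = Pseudonymiser()
turn1 = p.pseudonymise("Jane Doe asked about a refund.", [("PERSON", "Jane Doe")])
turn2 = p.pseudonymise("Jane Doe followed up; Bob Smith is cc'd.",
                       [("PERSON", "Jane Doe"), ("PERSON", "Bob Smith")])
reply = p.reveal(fake_model(turn1 + "\n" + turn2))
print(turn1)  # [PERSON_1] asked about a refund.
print(turn2)  # [PERSON_1] followed up; [PERSON_2] is cc'd.
print(reply)  # Draft: Hi Jane Doe, thanks for following up on your invoice.
```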
07 Where Anonde fits
This is Anonde's view as an engineering tool, not legal advice. Anonde is an open-source, self-hosted PII boundary. It replaces personal data and secrets in text, JSON, PDFs, and logs with stable placeholder tokens before they reach an LLM, holds the mapping inside your infrastructure, and reveals the real values only within your own trust boundary. In the terms above, that is tokenization used to implement pseudonymisation at the model boundary. For the broader set of techniques, see our guide to data anonymization.
It is a technical control, one measure among several. It does not make you "compliant," and because pseudonymised data is still personal data, it does not move your prompts outside GDPR. What it does is keep identifiers out of the riskiest place, someone else's model, while preserving the structure your application depends on.
See how it works, try the live demo, or read the quickstart to run it on your own infrastructure.
08 FAQ
What is the difference between anonymisation and pseudonymisation?
Anonymisation is irreversible: no reasonable means can re-link the data to a person, so it falls outside GDPR under Recital 26. Pseudonymisation is reversible: identifiers are replaced but a separate key can restore them, so the data stays personal data under GDPR (Recital 26), with the technique itself defined in Article 4(5).
Is pseudonymised data still personal data?
Yes. Because it can be re-linked to a person using additional information, it remains in scope of GDPR (Recital 26). The regulation still encourages it: Article 4(5) defines it, and Article 25 names it as a data-protection-by-design measure.
Is tokenization the same as pseudonymisation?
Tokenization is one way to do pseudonymisation. It swaps a value for a token and keeps the mapping in a separate vault. As long as that vault can re-link the token to the person, the result is pseudonymised, not anonymised.
Which one should I use before sending data to an LLM?
Pseudonymisation. Tokenize identifiers before the prompt leaves your infrastructure, then reveal the real values only inside your trust boundary. Full anonymisation usually destroys the detail the model needs.
What is pseudo anonymization?
"Pseudo anonymization" is an informal name for pseudonymisation. It replaces identifiers with substitutes while keeping a separate key or vault that can re-link them, so the data is reversible and stays personal data under GDPR. True anonymisation is irreversible and falls outside GDPR under Recital 26.
Sources
- GDPR Article 4(5): Definition of pseudonymisation
- GDPR Article 25: Data protection by design and by default
- GDPR Recital 26: Not applicable to anonymous data
This article is general information, not legal advice. Confirm your obligations with qualified counsel.