Does the right to erasure apply to LLM training data?

If personal data was used to train or fine-tune a model, a data subject can still ask for erasure. Removing a specific person from a trained model is hard, which is why the safer pattern is to keep identifiable data out of training and inference in the first place.

What is the difference between anonymisation and pseudonymisation under GDPR?

Truly anonymised data cannot be linked back to a person and falls outside GDPR (Recital 26). Pseudonymised data, where identifiers are replaced but a key can still re-link them, is still personal data, but it lowers risk and is explicitly encouraged as a data-minimisation measure.

How big are GDPR fines?

Breaching the core processing principles, data subject rights, or international transfer rules can cost up to 20 million euros or 4% of global annual turnover, whichever is higher. Lesser violations top out at 10 million euros or 2%.

GDPR and LLMs: keep personal data in bounds

Q: Is sending personal data to an LLM a GDPR concern?

Yes. A prompt that contains a name, email, account number, or any other identifier is processing of personal data under GDPR. When you call a third-party model, that provider usually acts as a processor or sub-processor, which means you need a lawful basis, a data processing agreement, and controls over where the data goes.

TL;DR

A prompt containing an identifier is processing of personal data. GDPR applies, model or no model.
The principles that bite hardest for LLMs are data minimisation, purpose limitation, and storage limitation (Article 5).
Call a third-party model and it is usually your processor or sub-processor: you need a lawful basis, a data processing agreement, and control over transfers.
The right to erasure is hard to honour once personal data is in a trained model, so the safe move is to keep it out from the start.
Stripping identifiers before the model is a technical control that supports minimisation and pseudonymisation. It is one measure, not legal cover.

01 GDPR in one paragraph

The General Data Protection Regulation governs how organisations handle personal data about people in the EU and EEA. Personal data is any information relating to a person who can be identified, directly or indirectly: a name, an email, an IP address, a customer ID. If you decide why and how that data is processed, you are a controller. If you process it on someone else's instructions, you are a processor. GDPR can reach you wherever you are based, as long as you offer goods or services to people in the EU or EEA or monitor their behaviour there (Article 3).

02 The principles that bite when you use LLMs

Article 5 sets seven principles. Three of them collide directly with how teams tend to wire up LLMs:

Data minimisation: personal data must be "adequate, relevant and limited to what is necessary." Dumping a full customer record into a prompt because it was convenient fails this test.
Purpose limitation: data collected for one purpose cannot be quietly reused for another. Feeding support tickets into model training is a new purpose that needs its own basis.
Storage limitation: you keep data only as long as needed. Prompts and completions logged forever by a provider work against this.

The other four (lawfulness, fairness and transparency; accuracy; integrity and confidentiality; and accountability) still apply. Accountability means you have to be able to show you meet the rest, not just assert it.
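The data-minimisation point lends itself to a small illustration. The sketch below assumes a hypothetical support-reply feature and a hypothetical allow-list of fields (`ALLOWED_FIELDS`); the field names and the `build_prompt` helper are invented for the example and do not come from any particular product. The idea is simply that the prompt is built from an explicit allow-list rather than from the whole customer record.

```python
from typing import Any

# Hypothetical allow-list: the only fields this support-reply prompt needs.
ALLOWED_FIELDS = {"plan", "open_ticket_summary", "preferred_language"}


def minimise(record: dict[str, Any], allowed: set[str] = ALLOWED_FIELDS) -> dict[str, Any]:
    """Return a copy of the record containing only allow-listed fields."""
    return {key: value for key, value in record.items() if key in allowed}


def build_prompt(record: dict[str, Any]) -> str:
    """Build the prompt from the minimised record, never from the raw one."""
    safe = minimise(record)
    lines = [f"{key}: {safe[key]}" for key in sorted(safe)]
    return "Draft a reply for this customer.\n" + "\n".join(lines)


# Illustrative record; all values are made up.
customer = {
    "name": "Jane Example",
    "email": "jane@example.com",
    "plan": "pro",
    "open_ticket_summary": "Export fails on large files",
    "preferred_language": "en",
}

prompt = build_prompt(customer)
print(prompt)
```

An allow-list fails closed: a field added to the record later stays out of the prompt until someone decides it is needed. Free-text fields such as the ticket summary can still contain identifiers, so this is a first filter rather than a complete control.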

03 Who is the controller, who is the processor?

When your app sends a prompt to a hosted model, the model provider is normally acting as your processor, and any infrastructure they rely on becomes a sub-processor. That chain matters. You need a data processing agreement under Article 28, a lawful basis for the processing, and a clear picture of who can see the data and where it lives.

Responsibility does not move off you because someone else runs the model. The controller carries the duty to the data subject regardless of how many vendors sit downstream.

04 The hard problems LLMs create

Prompts leak. Anything in a prompt can land in provider logs, a fine-tuning corpus, or another tenant's context window. That is personal data leaving your control.
Erasure is hard. A data subject can ask to be deleted. Pulling one person out of a trained model is close to impossible, so the safer course is to keep personal data out of training in the first place.
Transfers add rules. Sending personal data to a model hosted outside the EEA triggers the international transfer regime (Articles 44 to 49), with its own safeguards.
Sub-processors multiply. Each hop in the model supply chain is another party you have to account for and disclose.
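The transfer and sub-processor points above can be tracked in something as simple as a register. The sketch below is an assumption-laden illustration: the `SubProcessor` type, the vendor names, and the idea of keying on hosting country are all hypothetical. It flags entries hosted outside the EEA so they can be reviewed against the transfer rules; hosting location is only one factor in a real transfer assessment.

```python
from dataclasses import dataclass

# EU member states plus Iceland, Liechtenstein, and Norway (ISO 3166-1 alpha-2).
EEA_COUNTRIES = frozenset({
    "AT", "BE", "BG", "HR", "CY", "CZ", "DK", "EE", "FI", "FR",
    "DE", "GR", "HU", "IE", "IT", "LV", "LT", "LU", "MT", "NL",
    "PL", "PT", "RO", "SK", "SI", "ES", "SE",
    "IS", "LI", "NO",
})


@dataclass(frozen=True)
class SubProcessor:
    """One hop in the model supply chain (hypothetical register entry)."""
    name: str
    role: str
    country: str  # hosting location, ISO 3166-1 alpha-2


def outside_eea(chain: list[SubProcessor]) -> list[SubProcessor]:
    """Return the entries hosted outside the EEA, in their original order."""
    return [sp for sp in chain if sp.country.upper() not in EEA_COUNTRIES]


# Illustrative chain; the vendors are invented.
chain = [
    SubProcessor("ExampleModel Inc.", "model inference", "US"),
    SubProcessor("ExampleCloud GmbH", "hosting", "DE"),
    SubProcessor("ExampleLogs AS", "log storage", "NO"),
]

needs_review = outside_eea(chain)
for sp in needs_review:
    print(f"Review transfer safeguards for {sp.name} ({sp.role}, {sp.country})")
```

Keeping the register as data makes it easy to disclose the chain and to re-run the check whenever a vendor changes region.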

05 Anonymisation vs pseudonymisation

This distinction is the lever. Under Recital 26, truly anonymised data cannot be tied back to a person and sits outside GDPR entirely. If the model only ever sees data that cannot identify anyone, much of the burden falls away.

Pseudonymised data is different. You replace identifiers with tokens, but you keep a key that can re-link them. It is still personal data, so GDPR still applies, yet it lowers risk, and the regulation names it (Articles 25 and 32) as a safeguard that supports data minimisation and security. For most LLM use cases, pseudonymisation before the model plus controlled re-identification afterwards is the realistic target.
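To make the token-and-key idea concrete, here is a minimal sketch that handles email addresses only. The `Pseudonymiser` class, the `<EMAIL_n>` token format, and the simple regular expression are assumptions made for illustration; this is not how any specific tool implements it, and real detection needs to cover far more than emails.

```python
import re

# Deliberately simple email pattern; real-world detection needs more care.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


class Pseudonymiser:
    """Swap emails for stable tokens and keep the re-linking key in memory."""

    def __init__(self) -> None:
        self._token_by_value: dict[str, str] = {}
        self._value_by_token: dict[str, str] = {}

    def _token_for(self, value: str) -> str:
        # The same value always maps to the same token within this instance.
        if value not in self._token_by_value:
            token = f"<EMAIL_{len(self._token_by_value) + 1}>"
            self._token_by_value[value] = token
            self._value_by_token[token] = value
        return self._token_by_value[value]

    def pseudonymise(self, text: str) -> str:
        """Replace every email in the text with its token."""
        return EMAIL_RE.sub(lambda match: self._token_for(match.group(0)), text)

    def restore(self, text: str) -> str:
        """Re-identify: put the original values back in place of the tokens."""
        for token, value in self._value_by_token.items():
            text = text.replace(token, value)
        return text


original = "Contact ana@example.com or bo@example.org; ana@example.com is primary."
p = Pseudonymiser()
masked = p.pseudonymise(original)   # this is what would be sent to the model
restored = p.restore(masked)        # this stays inside your own boundary
print(masked)
```

Because the mapping object is the re-linking key, whoever holds it can re-identify the text. That is why the output remains personal data in your hands, and why the key belongs inside your own trust boundary rather than with the model provider.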

06 Where a PII boundary fits

This is Anonde's view, not legal advice: the cleanest way to shrink GDPR exposure is to keep identifiable data out of the model unless it is genuinely needed.

Anonde is an open-source, self-hosted boundary that replaces personal data and secrets in text, JSON, PDFs, and logs with stable placeholder tokens before they reach an LLM, then restores the real values only inside your own trust boundary. Nothing identifiable is persisted in a provider's systems by default.

As an engineering measure, that lines up with what GDPR pushes for: data minimisation, pseudonymisation, and a clear point in the pipeline where redaction happens and can be audited. It is one control among several. It does not make you "compliant," replace a data processing agreement, or stand in for legal advice. It just means there is less personal data in the riskiest place: someone else's model. The same logic carries over to the EU AI Act, which leans on the same minimisation and governance ideas.

See how it works, try the live demo, or read the quickstart to run it on your own infrastructure.

07 FAQ

Is sending personal data to an LLM a GDPR concern?

Yes. A prompt with a name, email, or account number is processing of personal data. The model provider is usually your processor or sub-processor, so you need a lawful basis, a data processing agreement, and control over where the data goes.

Does the right to erasure apply to training data?

If personal data trained or fine-tuned a model, a person can still ask to be erased. That is hard to honour after the fact, which is why keeping identifiable data out of training is the safer pattern.

Anonymisation or pseudonymisation, what is the difference?

Anonymised data cannot be linked back to anyone and falls outside GDPR. Pseudonymised data swaps identifiers for tokens but keeps a re-linking key, so it stays personal data while lowering risk.

How big are the fines?

Up to 20 million euros or 4% of global annual turnover, whichever is higher, for breaching core principles, data subject rights, or transfer rules. Lesser violations cap at 10 million euros or 2%, again whichever is higher.
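As a worked example of the "whichever is higher" rule, the sketch below computes the ceiling for a given turnover. The `fine_ceiling` function and the turnover figures are illustrative assumptions; the result is the statutory maximum, not a prediction of what a regulator would actually impose.

```python
def fine_ceiling(global_annual_turnover_eur: int, higher_tier: bool = True) -> int:
    """Return the maximum fine in euros for the chosen tier.

    Higher tier: 20 million euros or 4% of turnover, whichever is higher.
    Lower tier:  10 million euros or 2% of turnover, whichever is higher.
    """
    fixed_cap, percent = (20_000_000, 4) if higher_tier else (10_000_000, 2)
    # Integer arithmetic avoids floating-point rounding on large amounts.
    return max(fixed_cap, global_annual_turnover_eur * percent // 100)


# Illustrative turnovers, in euros.
large_company = fine_ceiling(2_000_000_000)                      # 4% exceeds the fixed cap
small_company = fine_ceiling(100_000_000)                        # fixed cap exceeds 4%
lower_tier = fine_ceiling(2_000_000_000, higher_tier=False)      # 2% exceeds the fixed cap
print(large_company, small_company, lower_tier)
```

For a company with 2 billion euros in turnover, the higher-tier ceiling is 80 million euros. For one with 100 million euros in turnover, 4% would be only 4 million, so the 20 million figure applies instead.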

This article is general information, not legal advice. Confirm your obligations with qualified counsel.