·TL;DR
- Anything in a prompt that identifies a person or grants access is PII or a secret: names, emails, phone numbers, account IDs, API keys.
- Redaction only reduces exposure if it happens before the API call. After the bytes leave your network, the raw text may already sit in provider logs.
- The working pattern is redact-then-reveal: replace each span with a stable token like [EMAIL_1], keep a private mapping, restore on the way back.
- Detection is a hybrid: regex for structured identifiers, NER for contextual entities like names and addresses. Neither alone is enough.
- The mapping from token to real value is itself sensitive. Keep it inside your trust boundary, never in the prompt.
01 What counts as PII in a prompt?
PII is any data that identifies a person, directly or indirectly (see what counts as PII for the full breakdown). In a prompt, that covers more than you might expect. Direct identifiers are the obvious ones: full name, email, phone number, home address, government ID, payment card. Indirect identifiers narrow a person down in combination: a job title plus an employer plus a city, a rare medical condition, an internal customer number that maps to a record.
Prompts also carry a second class of data that is not strictly personal but is just as dangerous to leak: secrets. API keys, bearer tokens, connection strings, and internal hostnames routinely get pasted into prompts alongside the text someone wants summarised. Treat them with the same boundary as PII. They identify your infrastructure the way an email identifies a person.
The practical test: if this value showed up in a third party's log file, would you have to report it, rotate it, or apologise for it? If yes, it does not belong in a raw prompt.
02 Why redaction has to happen before the API call
Redacting after the model responds is theatre. By then the data has already crossed your network boundary. The only point where redaction reduces real exposure is before the request leaves your infrastructure.
Three concrete failure modes make this non-negotiable:
- Provider logs. Hosted APIs retain request content to monitor for abuse. OpenAI, for example, keeps abuse-monitoring logs for up to 30 days by default, and those logs may contain prompts and responses. Longer retention is allowed where law or safety requires it. Zero-retention is an opt-in for approved accounts, not the default.
- Fine-tuning and training. If you opt in to share data, or if you self-host and pipe prompts into a training set, identifiable text becomes part of model weights. Pulling one person back out of trained weights is close to impossible.
- Cross-tenant leakage. Bugs, cache poisoning, and prompt-injection attacks have all surfaced one user's data inside another user's session. Data the model never received cannot leak this way.
Redacting up front turns all three from a data-loss event into a non-event. The broader legal angle is in our note on GDPR and LLMs.
03 The redact-then-reveal pattern
This is the pattern that actually ships. Four steps:
- Detect the sensitive spans in the input.
- Replace each span with a stable placeholder token, and record the mapping from token to original value.
- Send only the tokenised text to the model.
- Reveal: when the response comes back, swap the tokens for the real values using the mapping, inside your trust boundary.
The model reasons over [PERSON_1] exactly as it would over a real name. It can copy the token into its answer, refer back to it, format it. As long as the token round-trips, the user sees a coherent reply with the real values restored, and the provider never saw them. Stability is the key property: the same value maps to the same token within a request, so the model can keep two mentions of the same person straight.
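The four steps can be sketched in a few lines of Python. This is a minimal illustration, not a production detector: the two regexes, the `PATTERNS` table, and the `redact` and `reveal` helpers are all hypothetical names invented for this example, and a real pipeline would need far broader detection.

```python
import re
from collections import defaultdict

# Illustrative patterns only (assumption); real detection needs more coverage.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\+?\d[\d ()-]{7,}\d"),
}

TOKEN_RE = re.compile(r"\[[A-Z]+_\d+\]")


def redact(text):
    """Return (tokenised_text, mapping), where mapping is token -> original."""
    mapping = {}                 # token -> original value; stays in the boundary
    seen = {}                    # (kind, value) -> token, so repeats reuse a token
    counters = defaultdict(int)  # per-kind index for new tokens

    def make_sub(kind):
        def sub(match):
            key = (kind, match.group(0))
            if key not in seen:
                counters[kind] += 1
                token = f"[{kind}_{counters[kind]}]"
                seen[key] = token
                mapping[token] = match.group(0)
            return seen[key]
        return sub

    for kind, pattern in PATTERNS.items():
        text = pattern.sub(make_sub(kind), text)
    return text, mapping


def reveal(text, mapping):
    """Swap known tokens back; tokens missing from the mapping are left as-is."""
    return TOKEN_RE.sub(lambda m: mapping.get(m.group(0), m.group(0)), text)
```

Only the first return value of `redact` would be sent to the model. The mapping stays in process memory, and `reveal` runs on the response after it comes back.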
04 Detection: regex vs NER, and where each wins
You cannot redact what you cannot find, so detection is the whole game. Two approaches, and you want both.
Pattern matching with regex wins for structured identifiers with a known shape: emails, phone numbers, credit cards, IBANs, IP addresses, API keys. It is fast, deterministic, cheap, and exact when the pattern is well defined. It also produces false positives on lookalikes (an order ID shaped like a phone number) and, critically, it cannot find anything without a fixed format.
Named-entity recognition (NER) wins for contextual PII that has no regular shape: person names, street addresses, organisations, locations. A model classifies tokens by meaning, so it catches "my sister Sarah who works at the Mayo Clinic," which regex sees as ordinary words. The cost is latency and the occasional miss or mislabel on names it has not seen.
The right design composes them. Run regex for the structured stuff, NER for the contextual stuff, and let the two cover each other's blind spots. Reported hybrid pipelines land around the low-90s in F1 on document PII, which beats either method alone.
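One way to compose the two detectors is to collect spans from both and resolve overlaps. The sketch below assumes a simple "longer span wins" rule, which is one possible policy rather than an established standard. `toy_ner` is a hypothetical stand-in that looks up a fixed name list; in practice it would be replaced by a call to a real NER model.

```python
import re
from typing import NamedTuple


class Span(NamedTuple):
    start: int
    end: int
    kind: str


# Illustrative regex detectors for structured identifiers (assumption).
REGEX_DETECTORS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "IPV4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def regex_spans(text):
    return [Span(m.start(), m.end(), kind)
            for kind, pattern in REGEX_DETECTORS.items()
            for m in pattern.finditer(text)]


def toy_ner(text):
    """Hypothetical NER stand-in: a fixed lookup, not a trained model."""
    known = {"Sarah": "PERSON", "Mayo Clinic": "ORG"}
    return [Span(m.start(), m.end(), kind)
            for name, kind in known.items()
            for m in re.finditer(re.escape(name), text)]


def detect(text, ner=toy_ner):
    """Union of regex and NER spans; on overlap, the longer span wins."""
    candidates = sorted(regex_spans(text) + ner(text),
                        key=lambda s: (-(s.end - s.start), s.start))
    chosen = []
    for span in candidates:
        # Keep a span only if it does not overlap one already chosen.
        if all(span.end <= c.start or span.start >= c.end for c in chosen):
            chosen.append(span)
    return sorted(chosen)  # ordered by start offset
```

Passing the NER function in as a parameter keeps the regex layer testable on its own and makes the contextual detector swappable.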
05 A worked example: before and after
Here is a support-summarisation prompt as it usually arrives, full of PII and one secret:
Summarise this ticket for the on-call engineer.
From: Maria Gonzalez <maria.gonzalez@northwind.example>
Phone: +1 415 555 0142
Account: ACME-558217
"My production key sk-live-9fK2cQ7pLxR4 stopped working after
the migration. Can someone on the Northwind team call me back?"
After redaction, every sensitive span is a stable token. The mapping is held privately and never travels with the request:
Summarise this ticket for the on-call engineer.
From: [PERSON_1] <[EMAIL_1]>
Phone: [PHONE_1]
Account: [ACCOUNT_1]
"My production key [SECRET_1] stopped working after
the migration. Can someone on the [ORG_1] team call me back?"
The private mapping, kept inside your boundary:
[PERSON_1] -> Maria Gonzalez
[EMAIL_1] -> maria.gonzalez@northwind.example
[PHONE_1] -> +1 415 555 0142
[ACCOUNT_1] -> ACME-558217
[SECRET_1] -> sk-live-9fK2cQ7pLxR4
[ORG_1] -> Northwind
The model returns something like "[PERSON_1] reports their API key [SECRET_1] failed after the migration and wants a callback from [ORG_1]." You run the reveal step and the on-call engineer reads the real names and the real key. The provider logged only tokens.
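Two checks around this example are cheap to automate: confirming that no original value is still present in the text about to be sent, and flagging tokens in the response that the mapping does not know. The sketch below uses the mapping from the ticket above. `leaked_values` and `reveal_checked` are hypothetical helper names, and the substring check is a rough safety net under the assumption that values appear verbatim, not a substitute for detection.

```python
import re

# The private mapping from the worked example.
MAPPING = {
    "[PERSON_1]": "Maria Gonzalez",
    "[EMAIL_1]": "maria.gonzalez@northwind.example",
    "[PHONE_1]": "+1 415 555 0142",
    "[ACCOUNT_1]": "ACME-558217",
    "[SECRET_1]": "sk-live-9fK2cQ7pLxR4",
    "[ORG_1]": "Northwind",
}

TOKEN_RE = re.compile(r"\[[A-Z]+_\d+\]")


def leaked_values(outbound, mapping):
    """Return original values that still appear verbatim in outbound text."""
    return [value for value in mapping.values() if value in outbound]


def reveal_checked(response, mapping):
    """Restore known tokens and report tokens the mapping does not contain."""
    unknown = []

    def swap(match):
        token = match.group(0)
        if token in mapping:
            return mapping[token]
        unknown.append(token)  # e.g. a token the model invented
        return token

    return TOKEN_RE.sub(swap, response), unknown


OUTBOUND = "From: [PERSON_1] <[EMAIL_1]>\nPhone: [PHONE_1]\nAccount: [ACCOUNT_1]"
RESPONSE = ("[PERSON_1] reports their API key [SECRET_1] failed after the "
            "migration and wants a callback from [ORG_1].")
```

A non-empty result from `leaked_values(OUTBOUND, MAPPING)` would be a reason to block the request. A non-empty `unknown` list from `reveal_checked(RESPONSE, MAPPING)` suggests the model produced a token that was never issued.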
06 Pitfalls that quietly leak data
- Partial matches. Redacting "Maria Gonzalez" but missing "Ms. Gonzalez" two lines down leaves a surname in the clear. Detection has to be consistent across every mention, including possessives and abbreviations.
- Tokens that leak context. A placeholder like [EMAIL_maria_gonzalez_1] defeats the point. Keep tokens opaque: a type and an index, nothing more.
- Format-carried PII. Replacing the digits of a card number but leaving its exact length and grouping can still narrow things down. Normalise the placeholder so it does not echo the original shape.
- Mapping storage. The token-to-value mapping is the crown jewels. If it ends up in the prompt, in a shared cache, or in the same log you were trying to protect, you have moved the leak, not closed it. Keep it in memory or in storage you control, scoped to the request, and discard it when you no longer need to reveal.
- Free-text overflow. Users paste PII into fields you did not expect: a "notes" blob, an attached PDF, a stack trace with a hostname. Run the boundary on the whole payload, not just the fields you think are risky.
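Several of these pitfalls can be addressed structurally. The sketch below walks an entire nested payload instead of selected fields, issues opaque type-plus-index tokens, and keeps the mapping in a request-scoped object that can be discarded. `RequestScope` is a hypothetical helper, only emails are detected to keep the example short, and dictionary keys are assumed not to carry PII.

```python
import re

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


class RequestScope:
    """Holds the token mapping for a single request (hypothetical helper)."""

    def __init__(self):
        self.mapping = {}  # token -> original value
        self._tokens = {}  # (kind, value) -> token

    def token_for(self, kind, value):
        key = (kind, value)
        if key not in self._tokens:
            # Opaque token: a type and an index, nothing derived from the value.
            index = sum(1 for k, _ in self._tokens if k == kind) + 1
            token = f"[{kind}_{index}]"
            self._tokens[key] = token
            self.mapping[token] = value
        return self._tokens[key]

    def redact_text(self, text):
        return EMAIL_RE.sub(lambda m: self.token_for("EMAIL", m.group(0)), text)

    def redact_payload(self, node):
        """Recurse through dicts and lists so no string field is skipped."""
        if isinstance(node, str):
            return self.redact_text(node)
        if isinstance(node, dict):
            return {k: self.redact_payload(v) for k, v in node.items()}
        if isinstance(node, list):
            return [self.redact_payload(v) for v in node]
        return node  # numbers, booleans, None pass through unchanged

    def discard(self):
        """Drop the mapping once no further reveal is needed."""
        self.mapping.clear()
        self._tokens.clear()
```

Because one scope serves the whole payload, the same email in a subject line and in a notes field receives the same token, which keeps mentions consistent across fields.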
07 How Anonde implements this boundary
Anonde is an open-source, self-hosted PII boundary that implements exactly this pattern. It replaces personal data and secrets in text, JSON, PDFs, and logs with stable placeholder tokens before the content reaches a model, then reveals the real values only inside your own trust boundary.
Detection is hybrid: pattern matching for structured identifiers and secrets, plus entity recognition for contextual PII like names and addresses. Tokens are opaque and stable within a request, so the model stays coherent. The reveal step runs on your infrastructure, which means the mapping from token to real value never leaves your control and never sits at the provider. It is self-hosted and Apache 2.0, built in Go, so the boundary runs where your data already lives.
To be clear about scope: Anonde is a technical control, not legal compliance. It reduces how much identifiable data reaches a model and gives you one auditable place where redaction happens. It does not, on its own, make you compliant with any regulation; for the regulatory framing see GDPR and LLMs and the EU AI Act. Redacting before the model is one form of data anonymization.
See how it works, try the live demo with your own text, or read the quickstart to run it on your own infrastructure.
08 FAQ
Why redact before the LLM call instead of after?
Because once a prompt reaches a provider, the raw text can land in abuse-monitoring logs, get reviewed, or end up in a fine-tuning set if you opted in. Redacting after the call does nothing for data that already left your network.
What is the redact-then-reveal pattern?
Detect sensitive spans, replace each with a stable token like [EMAIL_1], keep a private mapping, send only tokens to the model, then swap the tokens back for real values inside your trust boundary.
Regex or NER for detection?
Both. Regex is fast and exact for structured identifiers with known shapes. NER catches contextual PII like names and addresses that has no fixed format. A hybrid pipeline covers both classes.
Is the redaction reversible?
Only if you keep the token-to-value mapping, which is sensitive and must stay inside your boundary. Discard the mapping for one-way redaction; keep it for pseudonymisation with controlled re-identification.
·Sources
- OpenAI: Data controls in the OpenAI platform (abuse-monitoring retention, training defaults)
- Scientific Reports: A hybrid rule-based NLP and machine learning approach for PII detection and anonymization
- Protecto: PII detection in unstructured text, why regex fails and what works
- Tonic.ai: Named entity recognition for data compliance automation
This article is general engineering guidance, not legal advice. Anonde is a technical control, not a compliance certification.