After watching three different AI feature projects at three different companies get blocked, delayed, or killed by compliance reviews, I started keeping notes on what actually gets through and what doesn't.
The projects that ship have one thing in common: they address data handling before anyone asks about it. The projects that stall are the ones where engineering builds first and deals with compliance later.
Here's the checklist we now use before starting any AI feature that touches customer data. It's not comprehensive legal advice; it's a practical engineering checklist that prevents the most common compliance blockers.
Before You Write a Line of Code
1. Map the data flow. Draw a diagram showing where customer data goes. If it crosses a boundary (your infrastructure to a third-party API), flag it. Every boundary crossing needs justification.
2. Identify the PII. What personal data does this feature process? Names, emails, phone numbers, addresses, financial data, health data? Write the list down. You'll need it for your DPIA (Data Protection Impact Assessment), if one is required under Article 35 of the GDPR.
3. Ask: does the AI actually need the PII? In most cases, no. A support ticket summariser doesn't need the customer's name. A sentiment analyser doesn't need their email. If the PII isn't necessary for the AI to function, strip it.
4. Choose your sanitisation strategy. If stripping PII, decide how:
- Redact for most LLM use cases (simple, clear, no confusion)
- Pseudonymise if you need to re-identify after processing
- Mask if partial visibility is useful
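In practice the strategy choice comes down to a single field in the sanitisation request. As a sketch (the helper function is illustrative; the field names match the API call shown in step 5):

```python
# Sketch: the three strategies map to one field in the anonymise request.
# Only "strategy" varies between Redact, Pseudonymise, and Mask.

def build_anonymise_request(content: str, strategy: str) -> dict:
    """Build the JSON body for the anonymise call; only 'strategy' varies."""
    allowed = {"Redact", "Pseudonymise", "Mask"}
    if strategy not in allowed:
        raise ValueError(f"Unknown strategy: {strategy!r}")
    return {
        "content": content,
        "contentType": "text",
        "strategy": strategy,
        "frameworks": ["GDPR"],
    }
```

Making the strategy an explicit, validated parameter also gives you one obvious place to document why a feature re-identifies data, if it does.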
During Development
5. Add PII sanitisation to the pipeline. One API call before the LLM call:
import requests

response = requests.post(
    "https://api.comply-tech.co.uk/api/v1/anonymise",
    headers={"X-Api-Key": "your-api-key", "Content-Type": "application/json"},
    json={"content": raw_data, "contentType": "text", "strategy": "Redact", "frameworks": ["GDPR"]},
)
response.raise_for_status()
clean_data = response.json()["anonymisedContent"]
6. Log the sanitised prompts, not the raw ones. Your application logs should never contain the pre-sanitised customer data. Log what you sent to the LLM (clean), not what the customer sent you (raw).
7. Test with the actual sanitisation step. Don't skip PII stripping in dev/staging "because it's just test data." Your tests should exercise the full pipeline including sanitisation. This catches issues where redaction breaks the LLM's ability to process the input.
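A minimal sketch of what "exercise the full pipeline" can look like, using a stand-in sanitise() step and a stubbed LLM call (both names and the regex are illustrative assumptions, not part of the API above):

```python
import re

def sanitise(text: str) -> str:
    """Stand-in for the anonymise API call: redact email addresses."""
    return re.sub(r"[\w.+-]+@[\w-]+\.[\w.]+", "[REDACTED]", text)

def summarise(ticket: str, llm) -> str:
    """Pipeline under test: sanitise first, then call the (stubbed) LLM."""
    return llm(sanitise(ticket))

def test_pipeline_strips_pii_before_llm():
    seen = []
    def spy_llm(prompt):
        seen.append(prompt)  # record exactly what the LLM would receive
        return "summary"
    summarise("Refund for sarah@gmail.com please", llm=spy_llm)
    # The LLM must never see the raw email address.
    assert "sarah@gmail.com" not in seen[0]
    assert "[REDACTED]" in seen[0]
```

The point of the spy is that the assertion targets what crossed the boundary to the LLM, not what the function returned.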
Before Launch
8. Prepare the compliance brief. Document: what data is processed, what PII is stripped, what reaches the LLM provider, what the LLM provider's data retention policy is, and what audit trail exists. Give this to your compliance/legal team before they ask for it.
9. Update the privacy policy. Be specific: "Customer data is processed by [LLM provider] for [purpose]. Personal identifiers are removed before data reaches the AI model."
10. Set up monitoring. Track: how many fields are being sanitised, what PII types are being caught, whether any PII types are slipping through (monitor LLM responses for PII patterns that shouldn't be there).
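The leak check on LLM responses can be as simple as a handful of patterns run against every response. A sketch (the patterns and names are illustrative, not a complete PII detector):

```python
import re

# Illustrative patterns for PII that should never appear in LLM output.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "uk_phone": re.compile(r"\b(?:\+44\s?|0)\d{4}\s?\d{6}\b"),
}

def pii_leaks(llm_response: str) -> list[str]:
    """Return the PII types found in an LLM response; ideally always empty."""
    return [name for name, pat in PII_PATTERNS.items() if pat.search(llm_response)]
```

Run every response through a check like this and alert on any non-empty result; a leak here means either the sanitisation missed something or the model reconstructed it.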
The Meta-Point
This checklist isn't about making compliance teams happy. It's about building AI features that are defensible: ones where you can explain exactly what happens to customer data at every step, and prove it.
The companies shipping AI features fastest aren't the ones ignoring compliance. They're the ones who made compliance trivial by designing it into the architecture from day one.
Try the Sanitisation Step
curl -X POST https://api.comply-tech.co.uk/api/v1/anonymise \
  -H "X-Api-Key: demo-key-complytech" \
  -H "Content-Type: application/json" \
  -d '{
    "content": "{\"customer\": \"Sarah Mitchell\", \"email\": \"sarah@gmail.com\", \"issue\": \"Order #4829 not delivered to 14 Beechwood Ave\", \"sentiment\": \"frustrated\"}",
    "contentType": "json",
    "strategy": "Redact",
    "frameworks": ["GDPR"]
  }'
Design compliance in from day one
One API call adds PII sanitisation to your entire AI pipeline.