Blog
Practical guides for teams handling personal data in AI pipelines, vendor exports, and test environments.
How to Strip PII Before Sending Data to ChatGPT
We were three weeks into building an internal support tool when someone asked: "Wait, are we sending customer emails to OpenAI?" We were. Every single ticket. Here's how we fixed it with one API call.
Anonymise CSV Data for GDPR Compliance (Without Losing Your Mind)
Last month a colleague in finance asked: "I need to send this spreadsheet to the new analytics vendor. It's got customer names and emails in it. Do I need to do anything with that?" Yes. Yes you do.
Your Staging Database Is Full of Real Customer Data (And Everyone Knows It)
Nobody talks about this, but almost every engineering team has done it: copied the production database into staging so the test environment has "realistic" data. Here's how to keep the realism without the risk.
We Failed a Client Security Questionnaire Because of Our Staging Database
We hit question 47 of an enterprise security questionnaire: "Do non-production environments contain real customer data?" The honest answer was yes, 38,000 customers. Here's what we did before the SOC 2 audit.
The EU AI Act Starts Enforcement in August 2026. Here's What That Means for Your LLM Pipeline.
Most LLM applications won't be classified as high-risk AI. But GDPR already imposes data minimisation and DPA obligations on AI pipelines, and the Act's enforcement deadline is making compliance teams look harder at data flows they've been overlooking.
We Accidentally Logged 6,000 Email Addresses to Datadog
A debug logging statement stayed in production for three months. 6,000 customers' email addresses ended up in a third-party logging platform. Here's how we stopped it happening again.
Our Analytics Vendor Asked for Customer Data. We Almost Sent It Unredacted.
Our marketing ops lead pulled a 47,000-row customer export and was about to attach it to a vendor reply. Here's why deleting the name column isn't enough, and what we do instead.
How to Build an Internal AI Tool Without Your Compliance Team Blocking It
The prototype gets blocked. The team is frustrated. Compliance isn't wrong. The fix isn't legal; it's architectural. Strip PII before the LLM call, and the compliance conversation goes very differently.
Stop Putting Real Customer Emails in Your CI Pipeline
500 real customers committed as test fixtures. Their names, emails, and phone numbers now in Git history, CI logs, and artefact storage. This isn't a theoretical concern; it's a data breach.
Your RAG Pipeline Is Leaking Customer Data Into Vector Embeddings
When you embed a document chunk containing a customer's name and address, that personal data lives in your vector store. Here's the GDPR problem hiding in your RAG system and how to fix it before ingestion.
I Reviewed 50 Companies' AI Privacy Policies. Most of Them Aren't Telling the Whole Story.
Of the 50, 43 used a third-party LLM API. Only 11 mentioned it in their privacy policy, and just 3 described any PII sanitisation. The rest rely on vague assurances and enterprise terms that don't say what they imply.
The GDPR Fine Calculator: What a Spreadsheet Leak Actually Costs
The ICO fines companies you've never heard of. A realistic cost breakdown for a mid-market spreadsheet incident: direct fine, legal costs, breach notifications, customer churn. The total might surprise you.
A Checklist for AI Features That Won't Get Blocked by Legal
After watching three AI feature projects get killed by compliance reviews, I started keeping notes on what ships and what stalls. The difference is always the same: data handling addressed before someone asked.