Anonymise CSV Data for GDPR Compliance

Last month a colleague in our finance team sent me a Slack message that went something like:

"Hey, I need to send this spreadsheet to the new analytics vendor. It's got customer names and emails in it. Do I need to do anything with that?"

Yes. Yes you do.

This happens constantly in every company that handles customer data. Someone needs to share a report, a data export, a customer list, and the file is full of personally identifiable information that shouldn't leave the building.

The usual process goes like this:

Export the CSV
Realise it has PII in it
Spend 45 minutes manually deleting columns or find-and-replacing names
Miss a few entries
Send it anyway because you're late for a meeting
Hope nobody notices

There's a better way. And it doesn't involve buying a six-figure platform or setting up a Python environment on a finance team laptop.

Why This Keeps Happening

Most companies have a GDPR policy document somewhere. It says sensible things about data minimisation, purpose limitation, and third-party data sharing. It lives in a SharePoint folder that nobody has opened since the company's GDPR panic of 2018.

The problem isn't policy. The problem is that there's no easy way for non-technical people to actually follow the policy.

A typical customer export looks like this:

CustomerId,Name,Email,Phone,TotalSpend,LastOrder,Address
1001,James O'Brien,james.obrien@gmail.com,07700 900456,£2340.50,2024-11-15,"42 Victoria Road, Leeds, LS1 5AR"
1002,Priya Sharma,priya.sharma@outlook.com,07700 900789,£890.00,2024-12-01,"7 Elm Street, Birmingham, B1 1AA"
1003,Michael Chen,m.chen@yahoo.co.uk,07700 900321,£5100.75,2024-10-28,"19 Kings Court, London, SW1A 2AA"

The analytics vendor needs TotalSpend, LastOrder, and maybe CustomerId. They absolutely do not need names, emails, phone numbers, or home addresses. But the export includes everything because that's how the database query works, and nobody's going to rewrite the SQL.

What GDPR Actually Requires Here

Article 5(1)(c), data minimisation: personal data must be "adequate, relevant and limited to what is necessary" for the purposes it is processed for.

If you're sharing data for analytics purposes, the customer's name is not necessary. Their email is not necessary. Their address is definitely not necessary.

Article 25, data protection by design and by default: you should have appropriate technical and organisational measures in place to ensure data minimisation. "Sarah in finance manually deletes column B" is not a technical measure.

You don't need to make this complicated. You just need to strip the PII before the file leaves your hands.

The Manual Approach (And Why It Fails)

"Just delete the columns with PII."

Simple enough for a 50-row spreadsheet. Now try it with:

10,000 rows where names appear in a "Notes" column alongside order details
A CSV where the email address is embedded in a URL field
Multiple files per week because the vendor needs fresh data every Monday
A column called "Details" that sometimes contains phone numbers and sometimes doesn't

Manual anonymisation doesn't scale. It's also error-prone. All it takes is one missed row in a 5,000-row spreadsheet and you've shared personal data you shouldn't have.
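
To see why "just delete the columns" falls short, here's a minimal Python sketch. The Notes column and its contents are made up for illustration, and the email pattern is deliberately simple; it's a demonstration of the failure mode, not a PII detector.

```python
import csv
import io
import re

# Deliberately simple email pattern, for illustration only.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Hypothetical export: the email also appears inside a free-text Notes cell.
EXPORT = (
    "CustomerId,Name,Email,TotalSpend,Notes\n"
    "1001,James O'Brien,james.obrien@gmail.com,2340.50,"
    "Refund requested by james.obrien@gmail.com\n"
)


def drop_columns(csv_text: str, pii_columns: set[str]) -> str:
    """Return the CSV with the named columns removed (the 'manual' fix)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    kept = [name for name in reader.fieldnames if name not in pii_columns]
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=kept, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(row)  # dropped columns are ignored
    return out.getvalue()


def leftover_emails(csv_text: str) -> list[str]:
    """Find email-shaped strings anywhere in the remaining text."""
    return EMAIL_RE.findall(csv_text)


cleaned = drop_columns(EXPORT, {"Name", "Email"})
print(leftover_emails(cleaned))  # the address in Notes survives the column delete
```

The Name and Email columns are gone, but the address embedded in the Notes cell is still there. That's the row you miss at row 4,812.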

The Developer Approach (And Why It's Overkill)

"Write a Python script."

Great if you're a developer. Most of the people who actually send CSVs to third parties aren't. They're in finance, marketing, operations, or customer success. Asking them to install Python, pip install pandas, and run a script is not realistic.

Even if you do write the script, you're now maintaining a custom PII detection tool. Every new PII type (someone adds a National Insurance number column) means updating the script. Every edge case (names with apostrophes, international phone formats) means debugging regex at 4pm on a Friday.

What We Use Instead

ComplyTech handles CSVs natively. You send the file content, tell it the content type is CSV, and it returns the same structure with PII replaced.

curl -X POST https://api.comply-tech.co.uk/api/v1/anonymise \
  -H "X-Api-Key: your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "content": "CustomerId,Name,Email,Phone,TotalSpend\n1001,James O'\''Brien,james@gmail.com,07700 900456,£2340.50\n1002,Priya Sharma,priya@outlook.com,07700 900789,£890.00",
    "contentType": "csv",
    "strategy": "Redact",
    "frameworks": ["GDPR"]
  }'

What comes back:

CustomerId,Name,Email,Phone,TotalSpend
1001,[NAME REDACTED],[EMAIL REDACTED],[PHONE REDACTED],£2340.50
1002,[NAME REDACTED],[EMAIL REDACTED],[PHONE REDACTED],£890.00

The CustomerId and TotalSpend columns (the bits the vendor actually needs) are untouched. Everything personal is gone.
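
If you'd rather not hand-escape apostrophes and newlines inside a shell string, the same request can be assembled in code. This is a sketch using only the Python standard library. The endpoint, headers and JSON fields are copied from the curl call above; the function name build_request is just an illustrative choice. The exact response format isn't covered here, so the sketch stops at building the request.

```python
import json
import urllib.request

# Endpoint taken from the curl example above.
API_URL = "https://api.comply-tech.co.uk/api/v1/anonymise"


def build_request(
    csv_text: str, api_key: str, strategy: str = "Redact"
) -> urllib.request.Request:
    """Build (but do not send) the POST request for a CSV string."""
    payload = {
        "content": csv_text,
        "contentType": "csv",
        "strategy": strategy,
        "frameworks": ["GDPR"],
    }
    return urllib.request.Request(
        API_URL,
        # json.dumps escapes quotes, apostrophes and newlines for us.
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-Api-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
```

Passing the result to urllib.request.urlopen would send it. How you parse what comes back depends on the response body, so check that against a real call before relying on it.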

Three Ways to Handle It

Redact: replaces PII with labels like [NAME REDACTED]. Best when the recipient doesn't need any version of the personal data.

Mask: partially hides values, so james.obrien@gmail.com becomes j***@gmail.com. Useful when someone needs to recognise the record but shouldn't see the full value.

Pseudonymise: replaces PII with realistic fake data, so James O'Brien becomes David Thompson, with a consistent mapping so the same input always produces the same output. Ideal when the vendor needs data that looks real for testing or analysis but doesn't contain actual customer information.
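
To make "masking" and "consistent mapping" concrete, here's a rough local illustration in Python. This is not how ComplyTech implements either strategy. The name pool, the keyed-hash approach and both function names are assumptions, chosen only to show the behaviour described above.

```python
import hashlib
import hmac

# Tiny illustrative pool; a real mapping would need far more names.
FAKE_NAMES = ["David Thompson", "Alice Morgan", "Samuel Reid", "Helen Clarke"]


def mask_email(email: str) -> str:
    """Keep the first character and the domain, hide the rest."""
    local, _, domain = email.partition("@")
    return f"{local[:1]}***@{domain}"


def pseudonymise_name(name: str, secret: bytes) -> str:
    """Map a name to a fake one; same name + same secret gives the same result."""
    digest = hmac.new(secret, name.encode("utf-8"), hashlib.sha256).digest()
    index = int.from_bytes(digest[:4], "big") % len(FAKE_NAMES)
    return FAKE_NAMES[index]


print(mask_email("james.obrien@gmail.com"))  # j***@gmail.com
```

With a pool this small, two different customers can end up with the same fake name, which is one reason a home-grown version gets complicated quickly.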

How Billing Works for CSVs

ComplyTech bills by "fields", not by API calls. For CSVs, a field is one cell, so a 1,000-row CSV with 7 columns comes to 7,000 fields.

On the Starter plan (£29/mo), you get 25,000 fields per month. That's roughly 3,500 rows of a 7-column CSV every month, more than enough for most regular reporting workflows.

If you're sending one big export per month, that maths works. If you're processing massive datasets daily, you'd want the Pro plan or to talk about enterprise pricing.
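
If you want to check the numbers before sending a file, the field count is easy to estimate. A small Python sketch follows. Whether the header row counts towards billing isn't stated above, so that's left as a flag; the function names are illustrative.

```python
import csv
import io

STARTER_MONTHLY_FIELDS = 25_000  # Starter plan allowance quoted above

SAMPLE = """\
CustomerId,Name,Email,Phone,TotalSpend,LastOrder,Address
1001,James O'Brien,james.obrien@gmail.com,07700 900456,£2340.50,2024-11-15,"42 Victoria Road, Leeds, LS1 5AR"
1002,Priya Sharma,priya.sharma@outlook.com,07700 900789,£890.00,2024-12-01,"7 Elm Street, Birmingham, B1 1AA"
"""


def count_fields(csv_text: str, include_header: bool = False) -> int:
    """Count cells, letting the csv module handle commas inside quotes."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not include_header:
        rows = rows[1:]
    return sum(len(row) for row in rows)


def rows_per_month(columns: int, allowance: int = STARTER_MONTHLY_FIELDS) -> int:
    """How many full rows of a given width fit in the monthly allowance."""
    return allowance // columns


print(count_fields(SAMPLE))  # 14: two data rows of seven cells each
print(rows_per_month(7))     # 3571
```

Note the quoted addresses: splitting on commas by hand would count each address as three cells, which is why the sketch uses a proper CSV parser.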

Making It Practical for Non-Technical Teams

The API is designed for developers to integrate into workflows, but you can make it accessible to non-technical teams with minimal effort:

Option 1: A simple internal web form. A single HTML page with a file upload that calls the API and downloads the cleaned CSV. Half a day of development work.

Option 2: A Google Sheets / Excel add-in. Call the API from a spreadsheet macro. The finance team clicks a button and the sheet gets anonymised.

Option 3: Automate it in your export pipeline. If the CSV comes from a scheduled database export, add the API call to the pipeline. The anonymised version is what lands in the shared folder. Nobody has to remember to do anything.

The right answer depends on your team, but the point is: once the API exists, making it accessible is straightforward.
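
For Option 3, the pipeline step can be very small. Here's a hedged Python sketch. The anonymise argument stands in for whatever function calls the API and returns the cleaned CSV text; publish_anonymised, the file names and the row-count sanity check are assumptions for illustration, not part of ComplyTech.

```python
import csv
import io
from pathlib import Path
from typing import Callable


def _row_count(csv_text: str) -> int:
    """Number of CSV records, counted with a real parser."""
    return sum(1 for _ in csv.reader(io.StringIO(csv_text)))


def publish_anonymised(
    export_path: Path, shared_path: Path, anonymise: Callable[[str], str]
) -> None:
    """Anonymise an export and place only the cleaned copy in the shared folder."""
    raw = export_path.read_text(encoding="utf-8")
    cleaned = anonymise(raw)

    # Sanity check: the structure should be unchanged, so row counts must match.
    if _row_count(cleaned) != _row_count(raw):
        raise ValueError("anonymised CSV has a different row count; not publishing")

    # Write to a temporary name first, then rename, so a half-written
    # file never appears under the final name.
    tmp_path = shared_path.with_name(shared_path.name + ".tmp")
    tmp_path.write_text(cleaned, encoding="utf-8")
    tmp_path.replace(shared_path)
```

The raw export never touches the shared folder, and if the API call fails or returns something odd, nothing gets published.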

What It Won't Do (And That's Fine)

This isn't a magic compliance tool. It strips PII from structured data. It doesn't:

Write your GDPR policy for you
Replace your Data Protection Officer
Handle consent management
Manage data subject access requests

It solves one specific problem: making sure the CSV you share doesn't contain personal data it shouldn't. That's a narrow scope, but it's a problem that causes real breaches.

Plenty of GDPR incidents don't involve sophisticated attacks. They come from someone emailing a spreadsheet to the wrong vendor with customer data still in it. Solving that one problem removes a disproportionate amount of risk.

Try It With Your Own Data

The demo key lets you test with real CSVs:

curl -X POST https://api.comply-tech.co.uk/api/v1/anonymise \
  -H "X-Api-Key: demo-key-complytech" \
  -H "Content-Type: application/json" \
  -d '{
    "content": "Name,Email,Department\nSarah Jones,sarah@example.com,Marketing\nTom Wilson,tom.w@company.co.uk,Finance",
    "contentType": "csv",
    "strategy": "Redact",
    "frameworks": ["GDPR"]
  }'

The demo key gives you 100 fields per month. That's only about 14 rows of a 7-column CSV, but it's enough to test a small sample of a real export and see if the detection catches what you need.
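
Since 100 fields goes quickly, it helps to trim a sample before sending it. A minimal Python sketch, assuming (conservatively) that every cell including the header counts as a field; trim_to_budget is an illustrative name.

```python
import csv
import io


def trim_to_budget(csv_text: str, max_fields: int = 100) -> str:
    """Keep leading rows (header first) until the next row would exceed the budget."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    used = 0
    for row in csv.reader(io.StringIO(csv_text)):
        if used + len(row) > max_fields:
            break
        writer.writerow(row)
        used += len(row)
    return out.getvalue()
```

For a 3-column file this keeps the header plus 32 data rows (99 cells), which stays inside the demo allowance.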

If it does, you've just solved a problem that your compliance team has been quietly worrying about since 2018.

ComplyTech was built because we got tired of watching spreadsheets full of customer data get emailed to vendors with a cheerful "here's the data you asked for!" It's a small problem that causes big fines. We figured it deserved a simple solution.
