We Failed a Client Security Questionnaire Because of Our Staging Database

We were halfway through a security questionnaire from a prospective enterprise client when we hit question 47:

"Do non-production environments contain real customer data? If yes, describe controls."

The honest answer was yes. Our staging database was a three-week-old copy of production. It had 38,000 customers' names, emails, phone numbers, and billing addresses. The "controls" were a VPN and a shared password in 1Password that 14 people had access to.

We wrote something vague about "data sanitisation procedures" and moved on. We got the deal. But it bothered me.

Six months later we went through SOC 2 Type II preparation. The auditor asked the same question, except this time they wanted evidence. Not a policy document, but actual evidence that non-production environments don't contain real PII.

We didn't have it. Because we'd never actually fixed the problem.

Why This Keeps Happening

Every engineering team I've talked to about this has the same story. At some point, someone copied production to staging because:

  • The seed script broke and nobody had time to fix it
  • A bug only reproduced with real data patterns
  • Someone needed realistic volume for load testing
  • It was faster than generating synthetic data

And then it just stayed. Because removing it meant building an anonymisation pipeline, and that was never the most urgent thing on the backlog.

Until an auditor asks. Or a security questionnaire asks. Or a contractor with staging access screenshots a bug report with a real customer's home address visible in the background.

What We Actually Did

We needed to keep the data structure, the relationships, the volume, and the edge cases, but strip all the personal information. Synthetic data wasn't going to cut it because our tests relied on real-world messiness.

We ended up running every table export through ComplyTech's API with the Pseudonymise strategy before importing into staging:

import requests

def pseudonymise_export(csv_content):
    """Send one CSV table export through the anonymisation API and
    return the pseudonymised CSV."""
    response = requests.post(
        "https://api.comply-tech.co.uk/api/v1/anonymise",
        headers={"X-Api-Key": "your-api-key", "Content-Type": "application/json"},
        json={
            "content": csv_content,
            "contentType": "csv",
            "strategy": "Pseudonymise",
            "frameworks": ["GDPR"]
        },
        timeout=60,
    )
    # Fail loudly rather than silently importing a bad export into staging.
    response.raise_for_status()
    return response.json()["anonymisedContent"]

The key feature was deterministic output. "Sarah Mitchell" becomes "Karen Taylor" everywhere, across every table. Foreign keys still work. Our integration tests still pass. But the data is entirely fictional.
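To see why deterministic output preserves referential integrity, here is a minimal local sketch of the idea using a keyed hash: the same input always maps to the same replacement, so values stay consistent across tables. This is purely illustrative of the concept, not ComplyTech's actual implementation; the name pool and key handling are assumptions.

```python
import hmac
import hashlib

# Illustrative pool of replacement names; a real tool would draw from a
# far larger pool, so collisions between distinct inputs are possible here.
FAKE_NAMES = ["Karen Taylor", "James Lee", "Priya Shah", "Tom Becker",
              "Ana Costa", "Liam Ford", "Mei Chen", "Omar Haddad"]

# A secret key makes the mapping stable across runs but not guessable
# from the output alone. It should never live in staging itself.
SECRET_KEY = b"rotate-me"


def pseudonym(real_name: str) -> str:
    """Map a real name to the same fake name on every run."""
    digest = hmac.new(SECRET_KEY, real_name.encode(), hashlib.sha256).digest()
    index = int.from_bytes(digest[:4], "big") % len(FAKE_NAMES)
    return FAKE_NAMES[index]


# The same input yields the same output wherever it appears, so joins
# across separately exported tables still line up after replacement.
assert pseudonym("Sarah Mitchell") == pseudonym("Sarah Mitchell")
```

Because the mapping is a function of the input and the key, a foreign key like a customer name or email that appears in three tables gets the same fictional replacement in all three.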

We automated it as a weekly cron job. Every Monday morning, staging gets a fresh pseudonymised copy of production. Nobody has to remember to do anything.
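The weekly refresh can be wired up with a plain cron entry. Everything below is a hypothetical sketch: the user, script path, and log location are assumptions, and the script itself would run your own export, pseudonymise, and import steps.

```
# /etc/cron.d/staging-refresh  (illustrative; adapt paths to your setup)
# Every Monday at 06:00, rebuild staging from a pseudonymised copy of prod.
0 6 * * 1  deploy  /opt/scripts/refresh_staging.sh >> /var/log/staging-refresh.log 2>&1
```

Logging to a file matters more than it looks: if the refresh silently fails, staging quietly drifts back into being a stale real-data copy.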

The Uncomfortable Part

The thing that actually motivated us wasn't the SOC 2 audit. It was realising that a junior developer on their first day had full access to 38,000 customers' real personal data through staging. Before they'd completed security training. Before they'd signed anything beyond a standard employment contract.

That's the kind of thing that feels fine until you write it down.

Try It

If you want to see what the output looks like with your own data:

curl -X POST https://api.comply-tech.co.uk/api/v1/anonymise \
  -H "X-Api-Key: demo-key-complytech" \
  -H "Content-Type: application/json" \
  -d '{
    "content": "id,name,email\n1,Sarah Jones,sarah@gmail.com\n2,Tom Wilson,tom@company.co.uk",
    "contentType": "csv",
    "strategy": "Pseudonymise",
    "frameworks": ["GDPR"]
  }'

The demo key is free, no signup. If the output looks right, you can have a working pipeline before your next audit.

Clean your staging database today

Try the demo key or get your own API key in minutes.