Last week I was reviewing a pull request and noticed something in the CI logs. Our integration test suite was making API calls with real customer data. Not test fixtures, but actual production customer records.
How did this happen? Simple. The integration tests needed realistic data. Someone pulled 500 rows from the production database, committed them as a test fixture file, and pushed to the repo.
Those 500 real customers' names, emails, and phone numbers were now:
- In our Git history (forever)
- In CI logs for every test run (visible to everyone on the team)
- In artefact storage (retained for 90 days)
- In every branch, fork, and clone created after the commit
This isn't a theoretical concern. It's a data breach. 500 people's personal data was exposed to a system (CI infrastructure) that has no business holding it.
The Quick Fix That Creates the Problem
The usual progression:
- Team writes integration tests that need realistic data
- Someone creates test fixtures from production data
- The fixtures get committed to the repo
- CI runs the tests and logs outputs containing real PII
- Nobody thinks about this until an audit or incident
The "fix" people try first: replace the test fixture data manually. But that has to happen every time someone refreshes the fixtures, which is exactly the kind of manual step people forget.
The Actual Fix
Generate your test fixtures by running real data through ComplyTech's pseudonymisation, then commit the pseudonymised output:
# Export fresh data from production as CSV (pg_dump emits SQL, not CSV)
psql prod_db -c "\copy customers TO STDOUT WITH (FORMAT csv, HEADER)" > customers.csv
# Pseudonymise it; jq builds the JSON body so quotes and newlines in the CSV are escaped
jq -Rs '{content: ., contentType: "csv", strategy: "Pseudonymise", frameworks: ["GDPR"]}' customers.csv \
  | curl --fail -X POST https://api.comply-tech.co.uk/api/v1/anonymise \
      -H "X-Api-Key: your-api-key" \
      -H "Content-Type: application/json" \
      --data-binary @- \
  | jq -r '.anonymisedContent' > test/fixtures/customers.csv
# Remove the raw export so it can't be committed by accident
rm customers.csv
# Commit the clean version
git add test/fixtures/customers.csv
git commit -m "Refresh test fixtures (pseudonymised)"
The test fixtures look real. They have the edge cases and data patterns your tests need. But they don't contain actual customer data.
Because the pseudonymisation is deterministic, you can refresh fixtures regularly and your tests won't break; the same production customer always maps to the same fake identity.
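To see why determinism keeps tests stable, here is a toy stand-in. It is not ComplyTech's algorithm: it simply derives a fake email from a keyed hash of the real one, so the same input and key always produce the same output. The function name, the secret, and the example.com output format are all assumptions made for illustration, and the sketch assumes GNU `sha256sum` is installed.

```shell
#!/usr/bin/env bash
# Toy illustration of deterministic pseudonymisation (NOT ComplyTech's method).
# toy_pseudonymise_email is a hypothetical helper invented for this example.

toy_pseudonymise_email() {
  # $1: secret key, $2: real email. Prints a stable fake address.
  local secret="$1" email="$2" digest
  # Hash key + email; the same pair always yields the same digest.
  # (A production scheme would use a proper HMAC rather than concatenation.)
  digest=$(printf '%s:%s' "$secret" "$email" | sha256sum | cut -c1-12)
  printf 'user_%s@example.com\n' "$digest"
}

# The first two calls print the same address; the third prints a different one.
toy_pseudonymise_email "s3cret" "sarah@gmail.com"
toy_pseudonymise_email "s3cret" "sarah@gmail.com"
toy_pseudonymise_email "s3cret" "tom@company.co.uk"
```

A test that asserts on a pseudonymised value keeps passing after a refresh for the same reason: the mapping depends only on the input and the key, not on when the refresh ran.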
Add It to Your CI Pipeline
Better yet, automate the fixture refresh as a CI step that runs weekly:
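# .github/workflows/refresh-fixtures.yml
name: Refresh Test Fixtures
on:
  schedule:
    - cron: '0 6 * * 1' # Every Monday at 06:00 UTC
permissions:
  contents: write # allow the job to push the refreshed fixtures
jobs:
  refresh:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Export and pseudonymise
        run: |
          ./scripts/export-and-pseudonymise.sh
      - name: Commit updated fixtures
        run: |
          # Without an identity the commit fails, and "|| true" would hide that
          git config user.name "github-actions[bot]"
          git config user.email "github-actions[bot]@users.noreply.github.com"
          git add test/fixtures/
          git commit -m "Automated fixture refresh" || true
          git push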
Fresh, realistic, PII-free test data every week. No human intervention required.
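As a backstop, a CI step can fail the build when something that looks like real data slips into the fixtures. The sketch below is a crude heuristic, not a PII scanner: it flags email addresses whose domain is not on an allow-list. It assumes your pseudonymised fixtures use reserved domains such as example.com, which may not match what your pseudonymised output actually contains, so adjust the allow-list accordingly. `ALLOWED_DOMAINS` and `find_suspect_emails` are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical guard: list email addresses in fixture files whose domain is
# not on an allow-list of reserved test domains (RFC 2606).

ALLOWED_DOMAINS='example\.com|example\.org|example\.net'

find_suspect_emails() {
  # $1: directory to scan.
  # Prints "file:email" for each suspect address and returns 1 if any are found.
  local dir="$1" hits
  # First grep extracts every email-shaped string; second drops allowed domains.
  hits=$(grep -rEoH '[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}' "$dir" \
    | grep -Eiv "@(${ALLOWED_DOMAINS})\$" || true)
  if [ -n "$hits" ]; then
    printf '%s\n' "$hits"
    return 1
  fi
  return 0
}
```

Calling `find_suspect_emails test/fixtures` as a workflow step makes the job fail on any hit, since the function returns a non-zero status.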
The Git History Problem
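If real PII is already in your Git history, pseudonymising future commits doesn't remove it from past commits. You'll need to rewrite the history with git filter-repo (which Git's own documentation now recommends over git filter-branch) or BFG Repo-Cleaner, then force-push and have everyone re-clone. Existing clones and forks keep the old history until they are replaced. That's a one-time pain, but it's worth doing.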
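A quick way to confirm whether a file is still reachable in history, before and after a rewrite, is to ask Git for every commit that touched its path. A minimal sketch; the helper name and the fixture path in the example are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list every commit, on any local ref, that touched a path.
# An empty result means the path is absent from all local refs; clones, forks,
# and caches elsewhere are not checked.

commits_touching() {
  # $1: repository directory, $2: path relative to the repo root
  git -C "$1" log --all --format='%h %s' -- "$2"
}

# Example: commits_touching . test/fixtures/customers.csv
```

Deleting or overwriting the file in a new commit does not shorten this list. After a rewrite such as `git filter-repo --invert-paths --path test/fixtures/customers.csv`, run in a fresh clone, the helper should print nothing.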
Try It
curl -X POST https://api.comply-tech.co.uk/api/v1/anonymise \
  -H "X-Api-Key: demo-key-complytech" \
  -H "Content-Type: application/json" \
  -d '{
    "content": "id,name,email,plan\n1,Sarah Jones,sarah@gmail.com,pro\n2,Tom Wilson,tom@company.co.uk,starter",
    "contentType": "csv",
    "strategy": "Pseudonymise",
    "frameworks": ["GDPR"]
  }'
Keep real customer data out of your test pipeline
Try the demo key or get your own API key in minutes.