Stop Putting Real Customer Emails in Your CI Pipeline

Last week I was reviewing a pull request and noticed something in the CI logs. Our integration test suite was making API calls with real customer data. Not test fixtures, but actual production customer records.

How did this happen? Simple. The integration tests needed realistic data. Someone pulled 500 rows from the production database, committed them as a test fixture file, and pushed to the repo.

Those 500 real customers' names, emails, and phone numbers were now:

In our Git history (forever)
In CI logs for every test run (visible to everyone on the team)
In artefact storage (retained for 90 days)
In every branch and fork created after the commit

This isn't a theoretical concern. It's a data breach. 500 people's personal data was exposed to a system (CI infrastructure) that has no business holding it.

The Quick Fix That Creates the Problem

The usual progression:

Team writes integration tests that need realistic data
Someone creates test fixtures from production data
The fixtures get committed to the repo
CI runs the tests and logs outputs containing real PII
Nobody thinks about this until an audit or incident

The "fix" people try first: replace the test fixture data manually. But this has to happen every time someone refreshes the fixtures, which is exactly the kind of manual process that humans forget to do.

The Actual Fix

Generate your test fixtures by running real data through ComplyTech's pseudonymisation, then commit the pseudonymised output:

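# Pull fresh data from production as real CSV into a temp file outside the repo
# (pg_dump emits SQL COPY statements, not CSV, so use psql's \copy instead)
raw="$(mktemp)"
psql prod_db -c "\copy customers TO STDOUT WITH (FORMAT csv, HEADER)" > "$raw"

# Pseudonymise it. jq builds the JSON body so that quotes and newlines
# in the CSV are escaped correctly.
jq -Rsc '{content: ., contentType: "csv", strategy: "Pseudonymise", frameworks: ["GDPR"]}' "$raw" \
  | curl --fail -sS -X POST https://api.comply-tech.co.uk/api/v1/anonymise \
      -H "X-Api-Key: your-api-key" \
      -H "Content-Type: application/json" \
      --data-binary @- \
  | jq -r '.anonymisedContent' > test/fixtures/customers.csv

# Delete the raw export so it can never be committed by accident
rm -f "$raw"

# Commit the clean version
git add test/fixtures/customers.csv
git commit -m "Refresh test fixtures (pseudonymised)"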

The test fixtures look real. They have the edge cases and data patterns your tests need. But they don't contain actual customer data.

Because the pseudonymisation is deterministic, you can refresh fixtures regularly and your tests won't break; the same production customer always maps to the same fake identity.
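
To see why determinism keeps refreshed fixtures stable, here is a minimal sketch of the general idea: derive the fake identity from a keyed hash of the real value, so the same input always produces the same output. This is not ComplyTech's implementation. The `pseudonymise_email` function, the `PSEUDO_KEY` variable, and the choice of a salted SHA-256 (simpler than a proper HMAC) are all assumptions made for illustration, and `sha256sum` is assumed to be available, as it is on most Linux systems.

```shell
# Illustrative only: map an email address to a stable fake address.
# PSEUDO_KEY is a hypothetical secret; keep the real one out of the repo.
PSEUDO_KEY="${PSEUDO_KEY:-change-me}"

pseudonymise_email() {
  # Normalise case so Jane@... and jane@... map to the same identity.
  email_lc="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
  # Hash key + email; identical input always yields an identical digest.
  digest="$(printf '%s:%s' "$PSEUDO_KEY" "$email_lc" | sha256sum | cut -c1-12)"
  # example.com is reserved for documentation, so it is never a real mailbox.
  printf 'user-%s@example.com\n' "$digest"
}
```

Calling `pseudonymise_email jane@acme.test` twice prints the same address both times, which is why a refreshed fixture keeps the identities your tests already reference.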

Add It to Your CI Pipeline

Better yet, automate the fixture refresh as a CI step that runs weekly:

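# .github/workflows/refresh-fixtures.yml
name: Refresh Test Fixtures
on:
  schedule:
    - cron: '0 6 * * 1'  # Every Monday at 06:00 UTC
permissions:
  contents: write  # lets the job push the refreshed fixtures
jobs:
  refresh:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Export and pseudonymise
        run: |
          ./scripts/export-and-pseudonymise.sh
      - name: Commit updated fixtures
        run: |
          # Runners have no Git identity configured, so set one before committing
          git config user.name "github-actions[bot]"
          git config user.email "github-actions[bot]@users.noreply.github.com"
          git add test/fixtures/
          # Commit and push only when the fixtures actually changed
          git diff --cached --quiet || { git commit -m "Automated fixture refresh" && git push; }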

Fresh, realistic, PII-free test data every week. No human intervention required.
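
Since nobody reviews an unattended job, a guard before the commit is worth considering. The sketch below fails if any email address from the raw export still appears in the pseudonymised fixture. The function name `assert_no_raw_emails` and its two file arguments are hypothetical, the email regex is deliberately loose, and reading patterns with `grep -f -` assumes GNU grep.

```shell
# Return 1 if any email found in the raw export also appears
# verbatim (ignoring case) in the pseudonymised fixture.
assert_no_raw_emails() {
  raw_file="$1"
  fixture_file="$2"
  # Pull every email-looking token out of the raw export, de-duplicated.
  emails="$(grep -Eo '[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}' "$raw_file" | sort -u)"
  # No emails in the raw export means there is nothing that could leak.
  [ -n "$emails" ] || return 0
  # -F: treat patterns as fixed strings; -f -: read the patterns from stdin.
  if printf '%s\n' "$emails" | grep -qiFf - "$fixture_file"; then
    echo "Raw email address found in $fixture_file" >&2
    return 1
  fi
  return 0
}
```

Called at the end of the export script with the raw export and the generated fixture (assuming the script still has the raw file at that point), a non-zero status would stop the job before the commit step runs.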

The Git History Problem

If real PII is already in your Git history, pseudonymising future commits doesn't remove it from past commits. You'll need to rewrite the history with git filter-repo (which the Git project now recommends over git filter-branch) or BFG Repo-Cleaner, then force-push and have everyone re-clone. That's a one-time pain, but it's worth doing.
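
As a rough sketch of that cleanup, assuming the leaked file lived at test/fixtures/customers.csv: first list the commits that touched it, then rewrite history without it. The rewrite shown in the comments uses git filter-repo, which is a separate install and not part of Git itself. The helper name `commits_touching` is hypothetical. Work on a fresh clone, because the rewrite is destructive.

```shell
# List every commit, on any branch, that touched the given path.
commits_touching() {
  git log --all --format='%h %s' -- "$1"
}

# Usage (run inside the repository):
#   commits_touching test/fixtures/customers.csv
#
# Destructive step: drop the file from every commit.
#   git filter-repo --path test/fixtures/customers.csv --invert-paths
#
# filter-repo removes the "origin" remote as a safety measure, so
# re-add it before publishing the rewritten history:
#   git push --force --all
```

Once the rewrite is pushed, running `commits_touching` again on a fresh clone should print nothing. Copies that already exist in other clones, forks, and CI logs are not affected by the rewrite and have to be dealt with separately.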

Try It