We Accidentally Logged 6,000 Email Addresses to Datadog

Engineering · 6 min read

It started with a support ticket. A customer complained that their account settings weren't saving. One of our engineers added some debug logging to trace the issue:

logger.info(f"Updating user settings: {user_data}")

The bug got fixed. The logging statement stayed. For three months.

During those three months, every time a user updated their settings, their full profile (name, email, phone number) was written to our application logs. Which were shipped to Datadog. Which meant 6,000 customers' email addresses were now sitting in a third-party logging platform, searchable by anyone on the team with Datadog access.

We only found it because someone was debugging an unrelated issue and noticed email addresses scrolling past in the log viewer.

This Is More Common Than Anyone Admits

Talk to any platform team and they'll have a version of this story. The specific details change: sometimes it's credit card numbers in error messages, sometimes it's phone numbers in URL parameters, sometimes it's home addresses in webhook payloads. But the pattern is always the same:

  1. Developer adds logging for a legitimate reason
  2. The logged data happens to contain PII
  3. Nobody notices because nobody is checking log output for PII
  4. Logs get shipped to an external service (Datadog, Splunk, ELK, CloudWatch)
  5. PII is now stored in a third party's infrastructure, often for 30-90 days

Under GDPR, those log entries containing personal data are personal data. They're subject to data minimisation, purpose limitation, and storage limitation. Shipping them to a third-party logging service without a legal basis or DPA covering that specific processing is a compliance gap.

The Solutions Nobody Likes

"Just train developers to not log PII." Developers don't intentionally log PII. It happens because they log objects, dictionaries, request bodies, and error contexts that happen to contain PII. You can't code-review your way out of this because the PII isn't visible in the logging statement; it's in the runtime data.

"Use structured logging and exclude sensitive fields." This works if you know in advance which fields are sensitive. But PII shows up in freeform text fields, error messages, URLs, and metadata where you don't expect it.
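The gap is easy to demonstrate. Here's a minimal sketch of the field-exclusion approach (the field names are illustrative): it catches PII in the fields you declared sensitive, and does nothing for PII embedded in freeform text.

```python
SENSITIVE_FIELDS = {"email", "phone", "name"}  # the fields you knew about in advance

def scrub_known_fields(payload: dict) -> dict:
    """Drop declared-sensitive fields; leave everything else intact."""
    return {k: v for k, v in payload.items() if k not in SENSITIVE_FIELDS}

# Works when PII lives in a declared field...
print(scrub_known_fields({"email": "jo@example.com", "theme": "dark"}))
# → {'theme': 'dark'}

# ...but is blind to PII inside a freeform error message:
print(scrub_known_fields({"error": "lookup failed for jo@example.com"}))
# → {'error': 'lookup failed for jo@example.com'}
```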

"Scrub logs after the fact." Retroactive scrubbing is expensive, error-prone, and doesn't help if the logs have already been shipped to a third party.

What Actually Works

Strip PII from log entries before they leave your infrastructure. Not after: before.

We added a sanitisation step to our logging pipeline. Every log entry passes through ComplyTech's API before it hits Datadog:

import logging

import requests

class PiiSanitizingHandler(logging.Handler):
    def emit(self, record):
        try:
            clean_message = self._strip_pii(self.format(record))
        except requests.RequestException:
            # If the sanitisation API is unreachable, drop the entry rather
            # than ship potentially PII-laden text downstream.
            self.handleError(record)
            return
        # Forward clean_message to your log shipper

    def _strip_pii(self, text):
        response = requests.post(
            "https://api.comply-tech.co.uk/api/v1/anonymise",
            headers={"X-Api-Key": "your-api-key", "Content-Type": "application/json"},
            json={
                "content": text,
                "contentType": "text",
                "strategy": "Redact",
                "frameworks": ["GDPR"]
            },
            timeout=5  # never let a slow API call block the logging path indefinitely
        )
        response.raise_for_status()
        return response.json()["anonymisedContent"]

Now when a developer accidentally logs a user object, what reaches Datadog looks like:

Updating user settings: {name: [NAME REDACTED], email: [EMAIL REDACTED], theme: "dark", notifications: true}

The debugging context (theme, notifications) survives. The PII doesn't.
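Wiring this in is a one-time change: attach the handler at the root logger so every module-level logger in the process inherits it. The sketch below swaps the API call for a local email regex purely so the wiring example runs offline; the regex is a stand-in for the real detection, not a substitute for it.

```python
import logging
import re

# Stand-in for the API-backed sanitisation: a local email regex, used
# here only so the wiring example runs without network access.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

class RedactingHandler(logging.Handler):
    def __init__(self, target: logging.Handler):
        super().__init__()
        self.target = target  # the handler that actually ships to Datadog

    def emit(self, record):
        record.msg = EMAIL.sub("[EMAIL REDACTED]", record.getMessage())
        record.args = None  # args are already folded into msg above
        self.target.emit(record)

# Attach once at the root: every logger in the process is now covered.
root = logging.getLogger()
root.setLevel(logging.INFO)
root.addHandler(RedactingHandler(logging.StreamHandler()))

logging.info("Updating settings for sarah.mitchell@gmail.com")
# stderr: Updating settings for [EMAIL REDACTED]
```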

Performance Considerations

The obvious concern is latency. You don't want a PII detection API call blocking your application's hot path.

Two approaches that work:

Async processing: Buffer log entries and send them in batches. ComplyTech's batch API handles up to 100 items per request. Ship sanitised logs every 5-10 seconds rather than per-entry.
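The buffering side can be sketched with a queue and a background flush thread. The `/api/v1/anonymise/batch` path and `items` payload shape below are our assumptions, not the documented contract — check the batch API docs before copying this.

```python
import queue
import threading
import time

import requests

BATCH_LIMIT = 100      # ComplyTech's batch API takes up to 100 items per request
FLUSH_INTERVAL = 5.0   # ship sanitised logs every few seconds, not per entry

log_queue: "queue.Queue[str]" = queue.Queue()

def drain(q, limit=BATCH_LIMIT):
    """Pull at most `limit` pending entries without blocking."""
    batch = []
    while len(batch) < limit:
        try:
            batch.append(q.get_nowait())
        except queue.Empty:
            break
    return batch

def flush_loop():
    while True:
        batch = drain(log_queue)
        if batch:
            # Endpoint path and payload shape are assumptions --
            # adjust to the real batch API contract.
            requests.post(
                "https://api.comply-tech.co.uk/api/v1/anonymise/batch",
                headers={"X-Api-Key": "your-api-key"},
                json={"items": [
                    {"content": entry, "contentType": "text",
                     "strategy": "Redact", "frameworks": ["GDPR"]}
                    for entry in batch
                ]},
                timeout=10,
            )
            # ...forward the sanitised results to your log shipper...
        time.sleep(FLUSH_INTERVAL)

threading.Thread(target=flush_loop, daemon=True).start()
```

Application code just does `log_queue.put(message)` and returns immediately; the flush thread absorbs the API latency.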

Sidecar pattern: Run the sanitisation as a separate process that reads from your log buffer. Your application writes logs at full speed; the sidecar sanitises and forwards them.

At sub-100ms per API call, even synchronous processing is workable for low-volume logging; for anything chatty, use one of the patterns above so the latency never lands on the request path.

What We Didn't Solve

This approach catches PII in log content. It doesn't prevent PII from appearing in log metadata, such as request URLs that contain email parameters, or trace IDs that embed user identifiers. For those, you still need application-level URL sanitisation.
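For URLs, application-level sanitisation can be as simple as redacting query parameter values that look like email addresses before the URL is logged. A minimal sketch using the stdlib (the heuristic is ours; extend it for phone numbers, IDs, and whatever else your URLs carry):

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

EMAIL_RE = re.compile(r"^[\w.+-]+@[\w-]+\.[\w.-]+$")

def sanitise_url(url: str) -> str:
    """Redact query parameter values that look like email addresses."""
    parts = urlsplit(url)
    clean = [
        (k, "[EMAIL REDACTED]" if EMAIL_RE.match(v) else v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
    ]
    return urlunsplit(parts._replace(query=urlencode(clean)))

print(sanitise_url("https://app.example.com/settings?user=sarah.mitchell%40gmail.com&theme=dark"))
```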

It also doesn't help with logs that are already in Datadog. For historical log scrubbing, you'd need to work with your logging provider directly.

Test It

curl -X POST https://api.comply-tech.co.uk/api/v1/anonymise \
  -H "X-Api-Key: demo-key-complytech" \
  -H "Content-Type: application/json" \
  -d '{
    "content": "ERROR: Failed to update settings for user sarah.mitchell@gmail.com (id: 4829). Stack trace: NullReferenceException at UserService.UpdateProfile(User{Name=Sarah Mitchell, Phone=07700900123})",
    "contentType": "text",
    "strategy": "Redact",
    "frameworks": ["GDPR"]
  }'

Keep PII out of your logs

Try the demo key or get your own API key in minutes.