PII masking · subsetting · synthetic data

Anonymize production data for safe dev, staging and CI.

Real customer data does not belong on developer laptops, in CI volumes or in old backups. SeedBase detects personal data automatically and masks it in a consistent, format-preserving way, or generates fully synthetic test data so no copied production rows leave your system in the first place.

Anonymize data, free · How it works

Free tier · no card · EU-hosted · no third-party CDNs

The problem every team knows

"Just grab a quick prod dump," and suddenly real emails, names, IBANs and addresses are sitting on laptops, in shared CI volumes and in staging databases that nobody secured as carefully as production.

Every copy is a new exposure

Dev and staging copies carry the same sensitivity as production, usually with weaker protection and far more people who can read them. GDPR, CCPA and HIPAA all care about who can access real personal data.

Contractors and freelancers

The moment an agency or contractor works with prod data, the risk travels with every dump, onto machines you do not control and cannot wipe.

Forgotten backups and CI artifacts

An old staging snapshot or a cached CI volume with real PII is the breach you only hear about later. Data minimization and pseudonymization are the established answers.

How SeedBase solves it

Two approaches, depending on what you need: mask real data or generate synthetic data. They can be combined.

PII detection

Find personal columns automatically

Detection works on column names and value patterns (email, IBAN, phone, address and more), optionally AI-assisted. The result is a report with suggestions that you review and override before anything is changed.
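A minimal sketch of how name-and-pattern detection can work in principle. This is not SeedBase's actual rule set: the hint lists, regular expressions, `classify_column` function and 0.8 threshold are all assumptions chosen for illustration.

```python
import re

# Hypothetical rules: substrings in column names that hint at a PII type.
NAME_HINTS = {
    "email": ("email", "e_mail"),
    "phone": ("phone", "mobile"),
    "iban": ("iban",),
}

# Hypothetical value patterns, deliberately loose (heuristics, not validators).
VALUE_PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "iban": re.compile(r"^[A-Z]{2}\d{2}[A-Z0-9]{11,30}$"),
    "phone": re.compile(r"^\+?\d[\d\s\-()]{6,}$"),
}


def classify_column(name, sample_values, threshold=0.8):
    """Return (pii_type, reason) for one column, or (None, None)."""
    lowered = name.lower()
    # 1) Column-name heuristic.
    for pii_type, hints in NAME_HINTS.items():
        if any(hint in lowered for hint in hints):
            return pii_type, "column name"
    # 2) Value-pattern heuristic on a sample of non-null values.
    values = [str(v).strip() for v in sample_values if v is not None]
    if not values:
        return None, None
    for pii_type, pattern in VALUE_PATTERNS.items():
        hits = sum(1 for v in values if pattern.match(v))
        if hits / len(values) >= threshold:
            return pii_type, "value pattern"
    return None, None
```

Heuristics like these produce false positives and false negatives (a long numeric ID can look like a phone number, a name inside free text matches nothing), which is why the result is a suggestion to review rather than a decision.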

Masking

Format-preserving, consistent replacement

An email stays an email, an IBAN keeps its prefix and length. The same original value maps to the same replacement across the whole project, so joins, group-bys and tests keep working. Masking can optionally run in place in your database, with no copy stored at SeedBase.
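One common way to get consistent replacements is a keyed hash: the same input and the same key always yield the same output. The sketch below illustrates that idea only. It is not SeedBase's implementation, and `SECRET_KEY`, `mask_email` and `mask_iban` are hypothetical names.

```python
import hashlib
import hmac

# Assumption: one secret key per project, kept outside the masked dataset.
SECRET_KEY = b"project-specific-secret"


def _digest(value: str) -> bytes:
    """Keyed hash: deterministic for a given key, hard to reverse without it."""
    return hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).digest()


def mask_email(email: str) -> str:
    """Replace an email with a deterministic, still email-shaped pseudonym."""
    token = _digest(email.lower()).hex()[:12]
    return f"user_{token}@example.test"


def mask_iban(iban: str) -> str:
    """Keep the country prefix and length; replace the rest with derived digits.

    Note: the result keeps the IBAN shape but will not pass checksum validation.
    """
    compact = iban.replace(" ", "")
    # 32 digest bytes give 32 digits, enough for the longest IBAN (34 chars).
    digits = "".join(str(b % 10) for b in _digest(compact))
    return compact[:2] + digits[: len(compact) - 2]
```

Because `mask_email("anna.mueller@gmail.com")` returns the same string wherever it is called, a `users.email` column and an `orders.customer_email` column masked with the same key still join on equal values.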

Subsetting

Data minimization built in

Instead of the full database, take a referentially consistent slice: a realistic 1% for local dev and CI, with every foreign key intact. Less data, less risk, smaller dumps.
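To make "referentially consistent" concrete, here is a small sketch using Python's built-in `sqlite3`. The `users`/`orders` schema, the `id % every_nth` sampling rule and both function names are assumptions for the demo, not how SeedBase selects rows.

```python
import sqlite3


def build_demo_db():
    """In-memory demo database: 1000 users, 3 orders per user."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, "
        "user_id INTEGER REFERENCES users(id), total REAL)"
    )
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [(i, f"user{i}@example.test") for i in range(1, 1001)],
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(i, (i % 1000) + 1, 10.0) for i in range(1, 3001)],
    )
    return conn


def subset_users_with_orders(conn, every_nth=100):
    """Pick 1 in every_nth users, then only the orders that reference them."""
    users = conn.execute(
        "SELECT id, email FROM users WHERE id % ? = 0 ORDER BY id", (every_nth,)
    ).fetchall()
    user_ids = [row[0] for row in users]
    if not user_ids:
        return users, []
    # Child rows are chosen via the parent keys, so no order is orphaned.
    placeholders = ",".join("?" * len(user_ids))
    orders = conn.execute(
        f"SELECT id, user_id, total FROM orders "
        f"WHERE user_id IN ({placeholders}) ORDER BY id",
        user_ids,
    ).fetchall()
    return users, orders
```

The design point is the direction of selection: sample the parent table first, then follow foreign keys to the children, so every `orders.user_id` in the slice resolves to a user that is also in the slice.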

Synthetic

Or: no real data at all

SeedBase generates fully synthetic, FK-consistent test data from your schema with realistic distributions. Where there was never any real personal data, there is nothing to re-identify.
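As a rough illustration of FK-consistent synthetic data, the sketch below hard-codes a two-table schema rather than reading one, and uses only the standard library. The name lists, the skewed order distribution and `generate_synthetic` are assumptions, not SeedBase's generator.

```python
import random

FIRST_NAMES = ["Kara", "Jonas", "Mila", "Theo", "Lena"]
LAST_NAMES = ["Lindqvist", "Brandt", "Okafor", "Novak", "Silva"]


def generate_synthetic(n_users, n_orders, seed=42):
    """Generate users and orders where every order points at an existing user."""
    rng = random.Random(seed)  # seeded, so runs are repeatable
    users = []
    for i in range(1, n_users + 1):
        first, last = rng.choice(FIRST_NAMES), rng.choice(LAST_NAMES)
        users.append({
            "id": i,
            "name": f"{first} {last}",
            "email": f"{first.lower()}.{last.lower()}{i}@example.test",
        })
    # Assumed distribution: a few users place most of the orders.
    weights = [1 / rank for rank in range(1, n_users + 1)]
    buyers = rng.choices(users, weights=weights, k=n_orders)
    orders = [
        {
            "id": j,
            "user_id": buyer["id"],  # FK is drawn from generated users only
            "total": round(rng.lognormvariate(3.5, 0.6), 2),
        }
        for j, buyer in enumerate(buyers, start=1)
    ]
    return users, orders
```

Foreign keys are consistent by construction here, because child rows draw their `user_id` from the parent rows that were just generated.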

Honest claims, not marketing: heuristic PII detection is not exhaustive; free-text fields with names sprinkled inside can slip past it. Depending on how it is applied, masking is pseudonymization, not automatically anonymization in the legal sense. SeedBase therefore emits a privacy report per run as a working basis for your privacy or compliance lead. It supports GDPR-conscious workflows; it is not a compliance certificate. Details: anonymization in the docs.

How the masking works

Read a SQL dump or connect to a live database, classify the columns, mask them, then export the result or write it straight back in place. Here is a users table before and after, with consistent, format-preserving replacement.

before → after

# before (the same customer email lives in two tables)
users   | id | email                  | name        | phone
        |  1 | anna.mueller@gmail.com | Anna Müller | +49 151 23456789
orders  | id | customer_email         | total
        | 77 | anna.mueller@gmail.com | 49.90

# after (format-preserving, same value -> same replacement, so the join still holds)
users   | id | email                  | name           | phone
        |  1 | kara.lindqvist@mail.io | Kara Lindqvist | +49 151 80042317
orders  | id | customer_email         | total
        | 77 | kara.lindqvist@mail.io | 49.90

# CLI
$ seedbase mask --source postgres://app@db/prod \
                --report privacy-report.json
# → 14 PII columns detected, 9 masked, 5 skipped (review report)

The same original value always becomes the same masked value, across rows and across tables, so foreign keys and joins still line up. Optional differential privacy adds calibrated noise to numeric and aggregate columns where you need formal guarantees.
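"Calibrated noise" usually refers to something like the Laplace mechanism, where the noise scale is the query's sensitivity divided by the privacy budget epsilon. The sketch below shows that textbook mechanism for a simple count. It is an illustration only, and `laplace_noise` and `dp_count` are hypothetical names, not SeedBase functions.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def dp_count(values, predicate, epsilon, rng=None):
    """Noisy count of values matching predicate (epsilon-DP for this one query).

    Adding or removing one record changes a count by at most 1,
    so the sensitivity is 1 and the noise scale is 1 / epsilon.
    """
    if rng is None:
        rng = random.Random()
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1 / epsilon, rng)


# Usage: how many customers are 40 or older, with privacy budget epsilon = 1.0
ages = [25, 31, 47, 52, 38, 61]
noisy = dp_count(ages, lambda age: age >= 40, epsilon=1.0)
```

A smaller epsilon means more noise and stronger protection. The guarantee applies per query, so repeated queries against the same data consume additional privacy budget.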

From prod dump to clean staging in three steps

Self-service in the web app, via the CLI, or straight from VS Code and PyCharm.

Connect a database or import a schema

PostgreSQL, MySQL and more, or a schema from a SQL dump, Django models or Prisma. Credentials are stored encrypted.

Review the PII report

SeedBase proposes the personal columns it detected, each with a transform. You confirm, add or exclude columns before anything runs.

Mask or generate

Mask in place in the target database, pull a masked subset dump, or generate fully synthetic data and load it into dev and CI via the CLI or a plugin.

Which safeguard maps to what

For orientation, not legal advice. Talk to your privacy or compliance lead about your dataset.

Safeguard	What it covers
Format-preserving, consistent masking	A common pseudonymization measure. Original values are replaced deterministically without exposing the source value, while emails, IBANs and phone numbers keep their shape.
Subsetting	Data minimization in practice: only the slice you actually need leaves production, with referential integrity intact.
Fully synthetic data	No personal data from the start. There is no real person behind a synthetic row, so there is nothing to re-identify.
Privacy report	Accountability: documents per run what was masked and how value distributions shifted, useful for GDPR, CCPA and HIPAA conversations.
EU hosting, no third-party CDNs	The platform itself loads no third-party resources and adds no cross-region transfer of your data through its frontend.

No sales call. Just do it.

There is no "book a demo" wall and no sales conversation: create an account, connect a database or import a schema, review the PII report, anonymize. The free tier is enough to try it for real, and you only pay once it is worth it.

Up and running in under 5 minutes
No credit card to start
Repeatable as an automated job

Start free

German version: DSGVO-Anonymisierung · Prefer synthetic? Test data from SQL · Compare: vs Tonic.ai

Tested on a real project. The detection and FK-consistent masking were exercised against a real 20-app project with 226 tables, so the consistency holds where joins and references actually get complicated.

FAQ

Can we use production data in dev and staging environments?

Real personal data in dev and staging is still processing of personal data and raises the risk: more copies, more people with access, usually weaker protection than production. Privacy regimes like GDPR, CCPA and HIPAA expect risk-appropriate safeguards, and pseudonymization and data minimization are common answers. Masked or synthetic data reduces that exposure substantially, which is why most teams stop shipping raw prod dumps to laptops and CI volumes.

Is masked data the same as anonymous data?

Not automatically. Consistent, format-preserving masking replaces the original values deterministically and is pseudonymization or anonymization depending on how it is applied. Whether a re-identification risk remains depends on the dataset and should be assessed with your privacy or compliance lead. SeedBase emits a privacy report per run as a working basis. If you want to rule out any link to real people, use fully synthetic data.

Is the data still useful for testing after masking?

Yes, that is the whole point. An email keeps a valid email shape, an IBAN keeps its country prefix and length, and phone numbers keep their structure. The same original value maps to the same replacement across every table, so joins and group-bys keep working. On request, the frequency distribution of categorical values is preserved too.

Where is the data processed?

SeedBase is EU-hosted. The app loads no resources from third-party CDNs. In-place masking writes straight back into your database; no copy of your data is stored at SeedBase. During computation, values pass transiently through the EU-hosted servers (as a processor). To avoid even that, generate fully synthetic data from the schema, which never needs your real rows at all.

What does it cost?

Self-service: free to start, with Pro and Team plans on the pricing page. No sales contact needed.