Tutorial

How to generate foreign-key-consistent test data from your schema

Random fake data fills columns. It does not respect relationships. Here is how to get a database where every foreign key actually resolves, from a SQL, Django or Prisma schema, loadable into Postgres or MySQL without a single constraint error.

SeedBase · ~6 min read

The problem with most "fake data"

Reach for a generic data faker and you get realistic-looking values in every column. The trouble starts at the relationships. An order_items row gets a product_id of 4821, but there is no product 4821. Load that into a real database with foreign keys turned on and it stops at the first INSERT:

ERROR:  insert or update on table "order_items" violates
        foreign key constraint "order_items_product_id_fkey"
DETAIL:  Key (product_id)=(4821) is not present in table "products".

So you either disable constraints (and ship data that would never exist in production), or you hand-write a seed script that walks the dependency graph yourself. Both are exactly the work you were trying to avoid.

What you actually want is data that is referentially consistent by construction: children only point at parents that exist, inserts come out in the right order, and the value distributions look like real life. Here is how to get there in a couple of minutes.
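
To make "inserts come out in the right order" concrete, here is a minimal sketch of the dependency walk a hand-written seed script would need. It is not SeedBase's implementation; the `depends_on` map and the `insert_order` helper are hypothetical names, and the `users` table is assumed because `orders.user_id` points at it. It relies on Python's standard `graphlib` module (Python 3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: table -> set of tables it references.
# "users" is an assumption; it is the parent that orders.user_id needs.
depends_on = {
    "users": set(),
    "products": set(),
    "orders": {"users"},
    "order_items": {"orders", "products"},
}

def insert_order(depends_on):
    """Return table names so that every parent precedes its children."""
    return list(TopologicalSorter(depends_on).static_order())

order = insert_order(depends_on)
print(order)  # parents such as users and products appear before order_items
```

A self-referencing table would show up as a cycle here and raise `graphlib.CycleError`, so a real tool has to treat that case separately.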

Step 1, Get your schema in

SeedBase reads the schema you already have. Pick whichever matches your stack:

SQL (any database)

Paste your CREATE TABLE statements. Inline and table-level REFERENCES are both parsed:

CREATE TABLE products (
  id         SERIAL PRIMARY KEY,
  name       VARCHAR(255) NOT NULL,
  price      NUMERIC(10,2) NOT NULL
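);

-- Minimal users table so orders.user_id has a parent (columns are illustrative)
CREATE TABLE users (
  id         SERIAL PRIMARY KEY,
  email      VARCHAR(255) NOT NULL
);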

CREATE TABLE orders (
  id         SERIAL PRIMARY KEY,
  user_id    INTEGER NOT NULL REFERENCES users(id),
  status     VARCHAR(32) NOT NULL,
  total      NUMERIC(10,2) NOT NULL
);

CREATE TABLE order_items (
  id          SERIAL PRIMARY KEY,
  order_id    INTEGER NOT NULL REFERENCES orders(id),
  product_id  INTEGER NOT NULL REFERENCES products(id),
  quantity    INTEGER NOT NULL,
  unit_price  NUMERIC(10,2) NOT NULL
);
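
To illustrate what reading relationships out of DDL involves, the sketch below pulls inline REFERENCES clauses from a subset of the schema above with two regular expressions. It is a deliberately naive illustration, not SeedBase's parser: the names `TABLE_RE`, `FK_RE` and `foreign_keys` are hypothetical, and table-level FOREIGN KEY clauses, quoted identifiers and schema-qualified names are not handled.

```python
import re

DDL = """
CREATE TABLE orders (
  id         SERIAL PRIMARY KEY,
  user_id    INTEGER NOT NULL REFERENCES users(id),
  status     VARCHAR(32) NOT NULL
);

CREATE TABLE order_items (
  id          SERIAL PRIMARY KEY,
  order_id    INTEGER NOT NULL REFERENCES orders(id),
  product_id  INTEGER NOT NULL REFERENCES products(id)
);
"""

# Table name plus everything up to the closing ");" of the statement.
TABLE_RE = re.compile(r"CREATE TABLE\s+(\w+)\s*\((.*?)\);", re.S | re.I)
# One column definition per line: column name, then an inline REFERENCES clause.
FK_RE = re.compile(
    r"^[ \t]*(\w+)[ \t]+[^,\n]*?\bREFERENCES\s+(\w+)\s*\((\w+)\)", re.I | re.M
)

def foreign_keys(ddl):
    """Map each table to a list of (column, parent_table, parent_column)."""
    return {table: FK_RE.findall(body) for table, body in TABLE_RE.findall(ddl)}

fks = foreign_keys(DDL)
for table, refs in fks.items():
    print(table, refs)
```

The resulting map is exactly the input a dependency sort needs: each table and the parents it points at.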

Django

Paste your models.py, or push your whole project from the VS Code / PyCharm plugin. The parser follows Django's inheritance: abstract base classes merge their fields into every child, mixins are detected, the implicit id primary key is added, and ForeignKey("self") becomes a self-referencing tree.
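
A self-referencing foreign key is the one case where a table is its own parent, so insert ordering has to happen row by row. The sketch below shows one way to keep such a tree consistent: each row's parent_id is either empty or an id that was generated earlier. The function name, the 20% root probability and the category example are all assumptions for illustration, not SeedBase behavior.

```python
import random

def make_category_tree(n, seed=0):
    """Build n rows where parent_id is None or an id generated earlier."""
    rng = random.Random(seed)
    rows = []
    for new_id in range(1, n + 1):
        # The first row is always a root; later rows become roots about 20% of the time.
        if not rows or rng.random() < 0.2:
            parent_id = None
        else:
            parent_id = rng.choice(rows)["id"]
        rows.append({"id": new_id, "parent_id": parent_id})
    return rows

categories = make_category_tree(10)
for row in categories:
    print(row)
```

Because a parent always has a smaller id than its children, inserting the rows in id order never references a row that is not there yet.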

Prisma

Paste your schema.prisma. Relation fields and @relation attributes map to foreign keys; scalar fields keep their types.

No schema handy? Start from the built-in e-commerce template and swap in your own tables later. You will still see the full workflow end to end.

Step 2, Generate

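Hit generate. Behind the scenes SeedBase does the part that makes the data actually loadable: it orders the inserts so parents come before children, and it resolves every foreign key to a row that exists.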
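
The core idea can be sketched in a few lines: generate parents first, remember their ids, and let children draw foreign keys only from that pool. The example below is an assumption-laden illustration rather than SeedBase's generator. It uses an in-memory SQLite database (instead of Postgres or MySQL) purely so the snippet is self-contained, with foreign key enforcement switched on.

```python
import random
import sqlite3

rng = random.Random(42)

# Parents first: ids are recorded as they are created.
product_ids = list(range(1, 6))
order_ids = list(range(1, 4))

# Children draw their foreign keys only from ids that exist.
order_items = [
    (item_id, rng.choice(order_ids), rng.choice(product_ids), rng.randint(1, 3))
    for item_id in range(1, 11)
]

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY);
    CREATE TABLE orders (id INTEGER PRIMARY KEY);
    CREATE TABLE order_items (
        id INTEGER PRIMARY KEY,
        order_id INTEGER NOT NULL REFERENCES orders(id),
        product_id INTEGER NOT NULL REFERENCES products(id),
        quantity INTEGER NOT NULL
    );
""")

# Load in dependency order with constraints enabled.
conn.executemany("INSERT INTO products (id) VALUES (?)", [(i,) for i in product_ids])
conn.executemany("INSERT INTO orders (id) VALUES (?)", [(i,) for i in order_ids])
conn.executemany("INSERT INTO order_items VALUES (?, ?, ?, ?)", order_items)

# A row pointing at a product that was never generated is rejected.
try:
    conn.execute("INSERT INTO order_items VALUES (99, 1, 4821, 1)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print("dangling row rejected:", rejected)
```

The ten generated rows load cleanly, while the hand-made row with product_id 4821 fails in the same way as the error shown at the top of this tutorial.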

Step 3, Make it realistic, not just valid

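Valid data that all looks identical still hides bugs. SeedBase does two things so the data behaves like production: value distributions are skewed the way real data is, and derived values stay coherent with the rows they are derived from.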
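
As a rough illustration of "realistic, not just valid", the sketch below combines two ideas: a skewed choice of products, so a few products receive most of the orders, and a derived order total that is computed from the line items instead of being drawn at random. The price list, the 1/rank weighting and the `make_order` helper are assumptions made for this example; how SeedBase models distributions is not described here.

```python
import random
from decimal import Decimal

rng = random.Random(7)

# Hypothetical catalogue: product id -> price.
prices = {
    1: Decimal("9.99"),
    2: Decimal("24.50"),
    3: Decimal("3.25"),
    4: Decimal("120.00"),
}
product_ids = list(prices)

# Zipf-like weights: the first product is picked most often, the last least.
weights = [1 / rank for rank in range(1, len(product_ids) + 1)]

def make_order(n_items):
    """Return (items, total) where total is derived from the items."""
    items = []
    for product_id in rng.choices(product_ids, weights=weights, k=n_items):
        items.append({
            "product_id": product_id,
            "quantity": rng.randint(1, 5),
            "unit_price": prices[product_id],  # copied from the parent row
        })
    total = sum((i["quantity"] * i["unit_price"] for i in items), Decimal("0"))
    return items, total

items, total = make_order(4)
print(items)
print("order total:", total)
```

A uniformly random total would pass every constraint and still break any report that sums line items, which is the kind of bug coherent derived values are meant to surface.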

Step 4, Load it into your database

Export as SQL, CSV, JSON or Parquet, or push straight into a connection. Because the inserts are already in FK order, you can load with constraints enabled and without any SET session_replication_role tricks:

# Postgres
psql "$DATABASE_URL" -f dataset.postgresql.sql

# MySQL
mysql -u user -p mydb < dataset.mysql.sql

Or generate from the CLI or a CI job instead of exporting by hand:

pip install seedbase
seedbase generate --project <id> --format sql --out dataset.sql

Step 5, Make it reproducible

Generation is deterministic per seed: the same seed produces the same data, so a CI run is reproducible. Export the generation config as JSON and commit it next to your migrations. Anyone on the team regenerates the exact same dataset from the same schema.
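
The principle behind seed-based reproducibility can be shown with the standard library alone. In this sketch every random choice goes through a private random.Random instance created from the seed, so the same seed yields the same rows. The `generate_users` function and its columns are hypothetical and stand in for whatever the real generator produces.

```python
import random

def generate_users(seed, n=3):
    """Generate n hypothetical user rows from a private, seeded RNG."""
    rng = random.Random(seed)  # never the shared global random state
    return [
        {
            "id": i,
            "signup_day": rng.randint(1, 365),
            "plan": rng.choice(["free", "pro", "team"]),
        }
        for i in range(1, n + 1)
    ]

first = generate_users(seed=123)
second = generate_users(seed=123)
print(first == second)  # prints True
```

Keeping the generator away from global random state matters in CI: another library calling random in between cannot shift the sequence.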

Where this fits. This is for filling a dev database, staging, a demo, or a CI container with a coherent dataset. For object-level fixtures inside a single test run, a per-model factory is still the right tool. The two solve different problems.

That's the whole loop

Schema in, generate, load. The hard parts (insert ordering, resolving every foreign key, realistic skew, coherent derived values) are handled so you do not hand-write another seed script. SeedBase was tested against a real 20-app Django project with 226 models, which is where most of the edge cases came from.

Try it on your own schema

Paste your SQL, Django models or Prisma schema and generate a populated, FK-consistent database. Free tier, no credit card.

  • Every FK resolves
  • Realistic distributions
  • SQL / CSV / JSON
  • EU-hosted