How to generate foreign-key-consistent test data from your schema

The problem with most "fake data"

Reach for a generic data faker and you get realistic-looking values in every column. The trouble starts at the relationships. An order_items row gets a product_id of 4821, but there is no product 4821. Load that into a real database with foreign keys turned on and the load fails at the first INSERT that points at a missing parent:

ERROR:  insert or update on table "order_items" violates
        foreign key constraint "order_items_product_id_fkey"
DETAIL:  Key (product_id)=(4821) is not present in table "products".

So you either disable constraints (and ship data that would never exist in production), or you hand-write a seed script that walks the dependency graph yourself. Both are exactly the work you were trying to avoid.
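You can reproduce the failure in a few lines. This is a minimal sketch using Python's built-in sqlite3 module rather than Postgres (an assumption made so it runs anywhere; SQLite only enforces foreign keys once the PRAGMA is switched on, and the 'Widget' row is made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE order_items ("
    " id INTEGER PRIMARY KEY,"
    " product_id INTEGER NOT NULL REFERENCES products(id))"
)

def try_insert(product_id):
    """Return True if the row loads, False if the foreign key rejects it."""
    try:
        conn.execute("INSERT INTO order_items (product_id) VALUES (?)", (product_id,))
        return True
    except sqlite3.IntegrityError:
        return False

orphan_loaded = try_insert(4821)  # no product 4821 exists yet, so this is rejected
conn.execute("INSERT INTO products (id, name) VALUES (4821, 'Widget')")
child_loaded = try_insert(4821)   # the parent exists now, so the same row is accepted
```

The same child row goes from rejected to accepted once its parent is there, which is the whole argument for generating parents first.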

What you actually want is data that is referentially consistent by construction: children only point at parents that exist, inserts come out in the right order, and the value distributions look like real life. Here is how to get there in a couple of minutes.

Step 1, Get your schema in

SeedBase reads the schema you already have. Pick whichever matches your stack:

SQL (any database)

Paste your CREATE TABLE statements. Inline and table-level REFERENCES are both parsed:

CREATE TABLE products (
  id         SERIAL PRIMARY KEY,
  name       VARCHAR(255) NOT NULL,
  price      NUMERIC(10,2) NOT NULL
);

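CREATE TABLE users (
  id         SERIAL PRIMARY KEY,
  name       VARCHAR(255) NOT NULL,
  email      VARCHAR(255) NOT NULL
);

CREATE TABLE orders (
  id         SERIAL PRIMARY KEY,
  user_id    INTEGER NOT NULL REFERENCES users(id),
  status     VARCHAR(32) NOT NULL,
  total      NUMERIC(10,2) NOT NULL
);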

CREATE TABLE order_items (
  id          SERIAL PRIMARY KEY,
  order_id    INTEGER NOT NULL REFERENCES orders(id),
  product_id  INTEGER NOT NULL REFERENCES products(id),
  quantity    INTEGER NOT NULL,
  unit_price  NUMERIC(10,2) NOT NULL
);

Django

Paste your models.py, or push your whole project from the VS Code / PyCharm plugin. The parser follows Django's inheritance: abstract base classes merge their fields into every child, mixins are detected, the implicit id primary key is added, and ForeignKey("self") becomes a self-referencing tree.

Prisma

Paste your schema.prisma. Relation fields and @relation attributes map to foreign keys; scalar fields keep their types.

No schema handy? Start from the built-in e-commerce template and swap in your own tables later. You will still see the full workflow end to end.

Step 2, Generate

Hit generate. Behind the scenes SeedBase does the part that makes the data actually loadable:

Topological insert order. Parents before children, so products and users exist before orders, and orders before order_items.
Foreign keys resolve. Every product_id points at a product that was actually generated, picked from the rows that already exist, not a random integer.
Self-references become trees. A parent_id on the same table references an earlier row; nullable self-FKs get NULL roots. No forward references.
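To make those three ideas concrete, here is a rough sketch of how they can be done in plain Python. It is not SeedBase's actual implementation: the SCHEMA map, the function names and the root_share parameter are all hypothetical, and the ordering uses the standard library's graphlib.

```python
import random
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical schema map: table -> {fk_column: parent_table}.
SCHEMA = {
    "users": {},
    "products": {},
    "orders": {"user_id": "users"},
    "order_items": {"order_id": "orders", "product_id": "products"},
}

def insert_order(schema):
    """Return table names with every parent ahead of its children."""
    graph = {table: set(fks.values()) for table, fks in schema.items()}
    return list(TopologicalSorter(graph).static_order())

def generate(schema, rows_per_table=5, seed=0):
    """Build rows table by table; each FK is drawn from parent rows that already exist."""
    rng = random.Random(seed)
    rows = {}
    for table in insert_order(schema):
        rows[table] = [
            {"id": i, **{col: rng.choice(rows[parent])["id"]
                         for col, parent in schema[table].items()}}
            for i in range(1, rows_per_table + 1)
        ]
    return rows

def tree_parent_ids(n, rng, root_share=0.2):
    """parent_id values for rows 1..n of a self-referencing table.

    Roots get None (NULL); every other row points at an earlier id,
    so there are no forward references. Self-FKs are handled here rather
    than in the graph above, where they would count as a cycle.
    """
    parents = []
    for row_id in range(1, n + 1):
        if row_id == 1 or rng.random() < root_share:
            parents.append(None)
        else:
            parents.append(rng.randrange(1, row_id))  # some id in 1..row_id-1
    return parents
```

Calling insert_order(SCHEMA) puts users and products ahead of orders, and orders ahead of order_items, and every foreign key that generate(SCHEMA) emits is an id taken from a parent row it has already produced.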

Step 3, Make it realistic, not just valid

Valid data that all looks identical still hides bugs. A few things SeedBase does so the data behaves like production:

Skewed distributions. Not every user has exactly five orders. A long tail means some users have two and some have nineteen, which is where pagination and N+1 queries actually break.
Coherent values. A person's name matches their email (alex.miller@… for Alex Miller), and order.total equals the sum of its line items. Derived and denormalized values are reconciled after generation, not rolled independently.
Sensible ranges. quantity is a small number, not 9,000. Order status comes from pending / paid / shipped / delivered, not a generic enum.
Time-aware timestamps. created_at is always before updated_at, and timestamps generate relative to today, so "last 30 days" dashboards stay populated as the dataset ages.
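If you want a feel for what these rules mean in code, the sketch below shows one way to get each property. Everything in it is an illustrative assumption rather than SeedBase's real logic: the Pareto shape, the price and quantity ranges, the 30-day window and the example.com domain are all made up.

```python
import random
from datetime import datetime, timedelta, timezone
from decimal import Decimal

STATUSES = ["pending", "paid", "shipped", "delivered"]

def orders_per_user(rng, cap=25):
    """Long-tailed count: most users get a few orders, a handful get many."""
    return min(cap, int(rng.paretovariate(1.2)))  # paretovariate is always >= 1

def email_for(name, domain="example.com"):
    """Derive the email from the name so the two always agree."""
    return ".".join(name.lower().split()) + "@" + domain

def make_order(rng, now):
    """Generate the line items first, then derive the total from them."""
    items = [
        {"quantity": rng.randint(1, 5),
         "unit_price": Decimal(rng.randint(199, 9999)) / 100}
        for _ in range(rng.randint(1, 4))
    ]
    total = sum((i["quantity"] * i["unit_price"] for i in items), Decimal("0"))
    # Anchor timestamps to "now" so the data stays inside recent-activity windows.
    created_at = now - timedelta(days=rng.uniform(0, 30))
    # updated_at lands somewhere between created_at and now, never before created_at.
    updated_at = created_at + (now - created_at) * rng.random()
    return {"status": rng.choice(STATUSES), "items": items, "total": total,
            "created_at": created_at, "updated_at": updated_at}

rng = random.Random(42)
order = make_order(rng, datetime.now(timezone.utc))
```

The point of the ordering inside make_order is that total is computed from the items rather than rolled separately, so the two cannot disagree.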

Step 4, Load it into your database

Export as SQL, CSV, JSON or Parquet, or push straight into a connection. Because the inserts are already in FK order, you can load with constraints enabled, with no need for workarounds such as SET session_replication_role = replica:

# Postgres
psql "$DATABASE_URL" -f dataset.postgresql.sql

# MySQL
mysql -u user -p mydb < dataset.mysql.sql

Or generate from the CLI or a CI job and skip the manual export:

pip install seedbase
seedbase generate --project <id> --format sql --out dataset.sql

Step 5, Make it reproducible

Generation is deterministic per seed: the same seed produces the same data, so a CI run is reproducible. Export the generation config as JSON and commit it next to your migrations. Anyone on the team regenerates the exact same dataset from the same schema.
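The principle behind seeded generation fits in a few lines. This is a generic sketch, not SeedBase's generator: generate_rows and fingerprint are hypothetical helpers, and the only real requirement is that every random draw comes from a generator created from the seed.

```python
import hashlib
import json
import random

def generate_rows(seed, n=100):
    """Draw every value from a private generator seeded once, not from global state."""
    rng = random.Random(seed)
    return [{"id": i, "quantity": rng.randint(1, 5)} for i in range(1, n + 1)]

def fingerprint(rows):
    """Stable hash of a dataset, handy for asserting reproducibility in CI."""
    payload = json.dumps(rows, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()

same = fingerprint(generate_rows(7)) == fingerprint(generate_rows(7))
different = fingerprint(generate_rows(7)) != fingerprint(generate_rows(8))
```

In a sketch like this, anything anchored to the current date would also need a fixed reference time to come out byte-identical from one run to the next.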

Where this fits. This is for filling a dev database, staging, a demo, or a CI container with a coherent dataset. For object-level fixtures inside a single test run, a per-model factory is still the right tool. The two solve different problems.

That's the whole loop

Schema in, generate, load. The hard parts (insert ordering, resolving every foreign key, realistic skew, coherent derived values) are handled so you do not hand-write another seed script. SeedBase was tested against a real 20-app Django project with 226 models, which is where most of the edge cases came from.

Try it on your own schema

Paste your SQL, Django models or Prisma schema and generate a populated, FK-consistent database. Free tier, no credit card.
