The problem with most "fake data"
Reach for a generic data faker and you get realistic-looking values in every column. The trouble starts at the relationships. An order_items row gets a product_id of 4821, but there is no product 4821. Load that into a real database with foreign keys enforced and the load stops at the first INSERT that references the missing row:
```text
ERROR: insert or update on table "order_items" violates foreign key constraint "order_items_product_id_fkey"
DETAIL: Key (product_id)=(4821) is not present in table "products".
```
So you either disable constraints (and ship data that could never exist in production), or you hand-write a seed script that walks the dependency graph yourself. Neither is appealing: the first gives you data production would reject, and the second is exactly the work you were trying to avoid.
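The same failure is easy to reproduce locally. The sketch below uses Python's built-in sqlite3 module as a stand-in for Postgres (an assumption made for portability; SQLite's error text differs from the Postgres message above). It also shows the downside of the first workaround: with enforcement off, the orphaned row loads without complaint.

```python
import sqlite3


def try_insert(enforce_fks: bool) -> str:
    """Insert an order_items row whose product_id has no matching product."""
    conn = sqlite3.connect(":memory:")
    try:
        # SQLite leaves FK enforcement off by default; it is enabled per connection.
        conn.execute(f"PRAGMA foreign_keys = {'ON' if enforce_fks else 'OFF'}")
        conn.executescript("""
            CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
            CREATE TABLE order_items (
                id INTEGER PRIMARY KEY,
                product_id INTEGER NOT NULL REFERENCES products(id),
                quantity INTEGER NOT NULL
            );
        """)
        # 4821 was "rolled" independently, so no such product exists.
        conn.execute("INSERT INTO order_items (product_id, quantity) VALUES (4821, 1)")
        conn.commit()
        return "inserted"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"
    finally:
        conn.close()


print(try_insert(enforce_fks=True))   # rejected: FOREIGN KEY constraint failed
print(try_insert(enforce_fks=False))  # inserted
```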
What you actually want is data that is referentially consistent by construction: children only point at parents that exist, inserts come out in the right order, and the value distributions look like real life. Here is how to get there in a couple of minutes.
Step 1, Get your schema in
SeedBase reads the schema you already have. Pick whichever matches your stack:
SQL (any database)
Paste your CREATE TABLE statements. Inline and table-level REFERENCES are both parsed:
```sql
-- Referenced by orders.user_id, so it has to be part of the schema too.
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    email VARCHAR(255) NOT NULL UNIQUE
);

CREATE TABLE products (
    id SERIAL PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    price NUMERIC(10,2) NOT NULL
);

CREATE TABLE orders (
    id SERIAL PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id),
    status VARCHAR(32) NOT NULL,
    total NUMERIC(10,2) NOT NULL
);

CREATE TABLE order_items (
    id SERIAL PRIMARY KEY,
    order_id INTEGER NOT NULL REFERENCES orders(id),
    product_id INTEGER NOT NULL REFERENCES products(id),
    quantity INTEGER NOT NULL,
    unit_price NUMERIC(10,2) NOT NULL
);
```
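To make "inline and table-level REFERENCES" concrete, here is a deliberately naive, regex-based sketch of pulling foreign-key edges out of CREATE TABLE statements. It is not SeedBase's parser; it assumes unquoted identifiers, single-column keys and no SQL comments, and the function names are hypothetical.

```python
import re

# Naive patterns: unquoted identifiers, single-column keys, no SQL comments.
TABLE_RE = re.compile(r"CREATE\s+TABLE\s+(\w+)\s*\((.*?)\)\s*;", re.IGNORECASE | re.DOTALL)
INLINE_FK_RE = re.compile(
    r"(\w+)\s.*?\bREFERENCES\s+(\w+)\s*\(\s*(\w+)\s*\)", re.IGNORECASE | re.DOTALL
)
TABLE_FK_RE = re.compile(
    r"FOREIGN\s+KEY\s*\(\s*(\w+)\s*\)\s*REFERENCES\s+(\w+)\s*\(\s*(\w+)\s*\)", re.IGNORECASE
)


def split_top_level(body: str) -> list[str]:
    """Split a table body on commas that are not inside parentheses, e.g. NUMERIC(10,2)."""
    parts, depth, current = [], 0, []
    for ch in body:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return [p.strip() for p in parts if p.strip()]


def extract_foreign_keys(ddl: str) -> list[tuple[str, str, str, str]]:
    """Return (table, column, ref_table, ref_column) for inline and table-level FKs."""
    fks = []
    for table, body in TABLE_RE.findall(ddl):
        for part in split_top_level(body):
            # Try the table-level form first, since its text also contains REFERENCES.
            m = TABLE_FK_RE.search(part) or INLINE_FK_RE.match(part)
            if m:
                fks.append((table, *m.groups()))
    return fks


ddl = """
CREATE TABLE orders (
    id SERIAL PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id),
    total NUMERIC(10,2) NOT NULL
);
CREATE TABLE order_items (
    id SERIAL PRIMARY KEY,
    order_id INTEGER NOT NULL,
    product_id INTEGER NOT NULL REFERENCES products(id),
    FOREIGN KEY (order_id) REFERENCES orders(id)
);
"""

for fk in extract_foreign_keys(ddl):
    print(fk)
```

A real parser would also need quoted identifiers, composite keys and ALTER TABLE ... ADD CONSTRAINT, which is why a regex is only good for illustration.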
Django
Paste your models.py, or push your whole project from the VS Code / PyCharm plugin. The parser follows Django's inheritance: abstract base classes merge their fields into every child, mixins are detected, the implicit id primary key is added, and ForeignKey("self") becomes a self-referencing tree.
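The inheritance rules can be modeled with plain data to show what they mean for the resulting table. This toy sketch does not use Django and is not SeedBase's parser: `ModelSpec`, `collect_fields` and `resolve_columns` are hypothetical names, field declarations are plain strings, and the implicit key is shown as `AutoField` even though a project's `DEFAULT_AUTO_FIELD` setting may pick a different type.

```python
from dataclasses import dataclass, field


@dataclass
class ModelSpec:
    """Hypothetical, simplified stand-in for a parsed Django model."""
    name: str
    fields: dict[str, str]  # field name -> field declaration
    bases: list["ModelSpec"] = field(default_factory=list)
    abstract: bool = False


def collect_fields(model: ModelSpec) -> dict[str, str]:
    """Abstract bases contribute their fields (recursively); the child's own fields win."""
    merged: dict[str, str] = {}
    for base in model.bases:
        if base.abstract:
            merged.update(collect_fields(base))
    merged.update(model.fields)
    return merged


def resolve_columns(model: ModelSpec) -> dict[str, str]:
    columns = collect_fields(model)
    # Django adds an implicit auto-incrementing id when no field is the primary key.
    if not any("primary_key=True" in decl for decl in columns.values()):
        columns = {"id": "AutoField(primary_key=True)", **columns}
    # ForeignKey("self") points back at this model's own table.
    return {
        name: decl.replace('ForeignKey("self"', f'ForeignKey("{model.name}"')
        for name, decl in columns.items()
    }


TimeStamped = ModelSpec(
    "TimeStamped",
    {"created_at": "DateTimeField(auto_now_add=True)", "updated_at": "DateTimeField(auto_now=True)"},
    abstract=True,
)
Category = ModelSpec(
    "Category",
    {"name": "CharField(max_length=100)", "parent": 'ForeignKey("self", null=True)'},
    bases=[TimeStamped],
)

for column, decl in resolve_columns(Category).items():
    print(column, decl)
```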
Prisma
Paste your schema.prisma. Relation fields and @relation attributes map to foreign keys; scalar fields keep their types.
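In Prisma, the foreign key lives in the `fields:` / `references:` arguments of `@relation`, while the relation field itself (`author User`) is virtual. A naive sketch of that mapping follows; it is not SeedBase's implementation, assumes no comments and no braces inside attribute strings, and `prisma_foreign_keys` is a hypothetical name.

```python
import re

# Naive patterns for illustration: no comments, no braces inside attribute strings.
MODEL_RE = re.compile(r"model\s+(\w+)\s*\{(.*?)\}", re.DOTALL)
RELATION_RE = re.compile(
    r"^[ \t]*\w+[ \t]+(\w+)\??[ \t]+@relation\([^)]*?"
    r"fields:\s*\[([^\]]*)\]\s*,\s*references:\s*\[([^\]]*)\]",
    re.MULTILINE,
)


def prisma_foreign_keys(schema: str) -> list[tuple[str, str, str, str]]:
    """Return (model, fk_field, referenced_model, referenced_field) tuples."""
    fks = []
    for model, body in MODEL_RE.findall(schema):
        for target, fields, refs in RELATION_RE.findall(body):
            # fields/references are parallel lists, so composite keys pair up by position.
            for col, ref in zip(fields.split(","), refs.split(",")):
                fks.append((model, col.strip(), target, ref.strip()))
    return fks


SCHEMA = """
model User {
  id    Int    @id @default(autoincrement())
  email String @unique
  posts Post[]
}

model Post {
  id       Int    @id @default(autoincrement())
  title    String
  authorId Int
  author   User   @relation(fields: [authorId], references: [id])
}
"""

print(prisma_foreign_keys(SCHEMA))  # [('Post', 'authorId', 'User', 'id')]
```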
Step 2, Generate
Hit generate. Behind the scenes, SeedBase does the part that makes the data actually loadable:
- Topological insert order. Parents before children, so `users` exist before `orders`, and both `orders` and `products` exist before `order_items`.
- Foreign keys resolve. Every `product_id` points at a product that was actually generated, picked from the rows that already exist, not a random integer.
- Self-references become trees. A `parent_id` on the same table references an earlier row; nullable self-FKs get NULL roots. No forward references.
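The three rules above fit together in a small generator. This is a minimal sketch under stated assumptions, not SeedBase's engine: the schema is a hypothetical dict of `table -> {fk_column: parent_table}`, self-FKs are assumed nullable, and the 30% root rate is arbitrary. It uses the standard library's `graphlib.TopologicalSorter` for the insert order, which raises `CycleError` if two different tables reference each other.

```python
import random
from graphlib import TopologicalSorter

# Hypothetical schema: table -> {fk_column: parent_table}. Self-FKs assumed nullable.
SCHEMA = {
    "users": {},
    "products": {},
    "categories": {"parent_id": "categories"},
    "orders": {"user_id": "users"},
    "order_items": {"order_id": "orders", "product_id": "products"},
}


def insert_order(schema: dict[str, dict[str, str]]) -> list[str]:
    """Parents before children. Self-references are left out so they can't form a cycle."""
    graph = {
        table: {parent for parent in fks.values() if parent != table}
        for table, fks in schema.items()
    }
    return list(TopologicalSorter(graph).static_order())


def generate(schema: dict[str, dict[str, str]], rows_per_table: int, seed: int = 0) -> dict[str, list[dict]]:
    rng = random.Random(seed)
    generated: dict[str, list[dict]] = {}
    for table in insert_order(schema):
        rows: list[dict] = []
        for row_id in range(1, rows_per_table + 1):
            row = {"id": row_id}
            for column, parent in schema[table].items():
                if parent == table:
                    # Self-FK: only rows generated earlier are eligible (no forward
                    # references). The first row is always a NULL root; later rows
                    # become roots 30% of the time.
                    earlier = [r["id"] for r in rows]
                    row[column] = rng.choice(earlier) if earlier and rng.random() >= 0.3 else None
                else:
                    # Pick an id from parent rows that already exist, never a random integer.
                    row[column] = rng.choice(generated[parent])["id"]
            rows.append(row)
        generated[table] = rows
    return generated


print(insert_order(SCHEMA))
data = generate(SCHEMA, rows_per_table=10, seed=42)
print(data["categories"][:3])
```

Dropping self-edges from the graph is what lets a table like `categories` be ordered at all; the "earlier rows only" rule inside the loop then keeps the tree acyclic.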
Step 3, Make it realistic, not just valid
Valid data that all looks identical still hides bugs. A few things SeedBase does so the data behaves like production:
- Skewed distributions. Not every user has exactly five orders. A long tail means some users have two and some have nineteen, which is where pagination bugs and N+1 query slowdowns actually surface.
- Coherent values. A person's name matches their email (`alex.miller@…` for Alex Miller), and `order.total` equals the sum of its line items. Derived and denormalized values are reconciled after generation, not rolled independently.
- Sensible ranges. `quantity` is a small number, not 9,000. Order status comes from `pending / paid / shipped / delivered`, not a generic enum.
- Time-aware timestamps. `created_at` is always before `updated_at`, and timestamps are generated relative to today, so "last 30 days" dashboards stay populated as the dataset ages.
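Each of these ideas is only a few lines of code. The following is a minimal sketch, not SeedBase's actual generator: the Pareto shape of 1.5, the cap of 50 orders, the price range, the 1 to 30 day window and the `example.com` placeholder domain are all assumptions chosen for illustration.

```python
import random
from datetime import datetime, timedelta, timezone
from decimal import Decimal

rng = random.Random(7)  # private, seeded RNG so the sketch is repeatable
STATUSES = ["pending", "paid", "shipped", "delivered"]


def orders_per_user(n_users: int, cap: int = 50) -> list[int]:
    """Long tail: most users get one or two orders, a few get many.
    A Pareto draw with shape 1.5 is an assumed choice for illustration."""
    return [min(int(rng.paretovariate(1.5)), cap) for _ in range(n_users)]


def email_for(first: str, last: str, domain: str = "example.com") -> str:
    # Derived from the name instead of being generated independently.
    return f"{first}.{last}@{domain}".lower()


def make_order(order_id: int, now: datetime) -> dict:
    items = [
        {"quantity": rng.randint(1, 5),  # small, plausible quantities
         "unit_price": Decimal(rng.randint(199, 9999)) / 100}
        for _ in range(rng.randint(1, 4))
    ]
    # The denormalized total is reconciled from the line items, not rolled on its own.
    total = sum(item["quantity"] * item["unit_price"] for item in items)
    # Timestamps are relative to "now": created 1-30 days ago, updated after that.
    created_at = now - timedelta(days=rng.uniform(1, 30))
    updated_at = created_at + (now - created_at) * rng.uniform(0.01, 1.0)
    return {"id": order_id, "status": rng.choice(STATUSES), "total": total,
            "created_at": created_at, "updated_at": updated_at, "items": items}


now = datetime.now(timezone.utc)
print(orders_per_user(10))
print(email_for("Alex", "Miller"))  # alex.miller@example.com
order = make_order(1, now)
print(order["total"], order["created_at"] < order["updated_at"] <= now)
```

Using `Decimal` for money keeps the reconciled total exact, so `total` compares equal to the recomputed sum instead of drifting by floating-point error.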
Step 4, Load it into your database
Export as SQL, CSV, JSON or Parquet, or push straight into a connection. Because the inserts are already in FK order, you can load with constraints enabled, no SET session_replication_role tricks required:
```shell
# Postgres
psql "$DATABASE_URL" -f dataset.postgresql.sql

# MySQL
mysql -u user -p mydb < dataset.mysql.sql
```
Or generate from the CLI, locally or in CI, and skip the manual export:
```shell
pip install seedbase
seedbase generate --project <id> --format sql --out dataset.sql
```
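Why FK order is what lets constraints stay on can be checked with a tiny local experiment. This sketch uses Python's sqlite3 with a hypothetical two-table schema; the exported Postgres and MySQL dumps will not run in SQLite as-is, so treat it as an illustration of the principle rather than a loader for those files. `PRAGMA foreign_key_check` reports any rows that still point at a missing parent.

```python
import sqlite3

# Hypothetical two-table schema standing in for a real export.
SCHEMA = """
CREATE TABLE users  (id INTEGER PRIMARY KEY, email TEXT NOT NULL);
CREATE TABLE orders (id INTEGER PRIMARY KEY,
                     user_id INTEGER NOT NULL REFERENCES users(id));
"""


def load(inserts: str) -> list:
    """Run a dump with FK enforcement on, then report any rows that still dangle."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
        conn.executescript(SCHEMA)
        conn.executescript(inserts)  # raises sqlite3.IntegrityError on a missing parent
        return conn.execute("PRAGMA foreign_key_check").fetchall()
    finally:
        conn.close()


parents_first = ("INSERT INTO users VALUES (1, 'a@example.com');"
                 "INSERT INTO orders VALUES (1, 1);")
children_first = ("INSERT INTO orders VALUES (1, 1);"
                  "INSERT INTO users VALUES (1, 'a@example.com');")

print(load(parents_first))  # []
try:
    load(children_first)
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)  # rejected: FOREIGN KEY constraint failed
```

The same rows in the opposite order fail, which is why tools that cannot guarantee ordering fall back to disabling constraints during the load.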
Step 5, Make it reproducible
Generation is deterministic per seed: the same seed produces the same data, so CI runs are reproducible. Export the generation config as JSON and commit it next to your migrations, and anyone on the team can regenerate the exact same dataset from the same schema and config.
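The principle behind "same seed, same data" is a private, seeded random generator driven only by the committed config. A minimal sketch follows; the config shape and `generate_users` are hypothetical and do not reflect SeedBase's actual export format.

```python
import json
import random

# Hypothetical config shape, not SeedBase's actual export format.
CONFIG_JSON = '{"seed": 1234, "rows": 5}'

FIRST = ["Alex", "Sam", "Priya", "Jonas", "Mei"]
LAST = ["Miller", "Okafor", "Berg", "Tanaka", "Rossi"]


def generate_users(config: dict) -> list[dict]:
    """Same config (including seed) in, same rows out."""
    # A private Random instance, so other code touching the global RNG can't shift the sequence.
    rng = random.Random(config["seed"])
    return [
        {"id": i, "name": f"{rng.choice(FIRST)} {rng.choice(LAST)}"}
        for i in range(1, config["rows"] + 1)
    ]


config = json.loads(CONFIG_JSON)  # e.g. read from a file committed next to migrations
first_run = generate_users(config)
random.seed()  # reseeding the global RNG has no effect on the private one
assert generate_users(config) == first_run
print(first_run)
```

One caveat worth knowing: Python only guarantees that `random()` itself yields the same sequence for a given seed across versions; helpers such as `choice()` may change, so pinning the generator version alongside the config is the safer habit.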
That's the whole loop
Schema in, generate, load. The hard parts (insert ordering, resolving every foreign key, realistic skew, coherent derived values) are handled so you do not hand-write another seed script. It was tested against a real 20-app Django project with 226 models, which is where most of the edge cases came from.
Try it on your own schema
Paste your SQL, Django models or Prisma schema and generate a populated, FK-consistent database. Free tier, no credit card.
- Every FK resolves
- Realistic distributions
- SQL / CSV / JSON
- EU-hosted