How to seed a Django database with realistic test data

Why seeding Django is annoying

Every Django project hits the same wall once the schema grows past a handful of models. Your options for test data all have a catch:

JSON/YAML fixtures + loaddata. They rot. Add a non-null field without a default and every fixture for that model stops loading. They are also tiny by nature, because nobody hand-writes 5,000 rows.
factory_boy factories. Great for object-level setup inside one test, but you still wire up every relationship by hand, and they are not meant to fill a whole dev or staging database.
A custom manage.py seed command. You end up walking the dependency graph yourself, and it breaks the moment a model changes.

What you actually want is to point something at the models you already have and get a populated database where every foreign key resolves. Here is the whole loop.

Step 1: Point it at your models.py

SeedBase reads your Django models directly. Paste your models.py, or push the whole project in one click from the PyCharm or VS Code plugin. Take a typical shop:

from django.db import models

class Customer(models.Model):
    name  = models.CharField(max_length=120)
    email = models.EmailField(unique=True)
    city  = models.CharField(max_length=80)

class Product(models.Model):
    sku   = models.CharField(max_length=32, unique=True)
    name  = models.CharField(max_length=200)
    price = models.DecimalField(max_digits=10, decimal_places=2)

class Order(models.Model):
    STATUS = [("pending", "Pending"), ("paid", "Paid"), ("shipped", "Shipped")]
    customer   = models.ForeignKey(Customer, on_delete=models.CASCADE)
    status     = models.CharField(max_length=16, choices=STATUS)
    created_at = models.DateTimeField(auto_now_add=True)

class OrderItem(models.Model):
    order    = models.ForeignKey(Order, related_name="items", on_delete=models.CASCADE)
    product  = models.ForeignKey(Product, on_delete=models.PROTECT)
    quantity = models.PositiveIntegerField()

The parser understands Django, not just generic SQL:

The implicit id. No explicit primary key? The automatic id field is added for you, exactly as Django would add it (an AutoField or BigAutoField, depending on DEFAULT_AUTO_FIELD).
Abstract base classes and mixins. Fields on an abstract base merge into every concrete child, so a TimeStampedModel mixin does the right thing everywhere.
Every relation type. ForeignKey, OneToOneField, and ManyToManyField (including the implicit through table) all become real, resolvable references. ForeignKey("self") becomes a tree, not a forward reference.
choices. A field with choices only ever gets one of its real values (pending / paid / shipped), never a random string.
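To make the last two points concrete, here is a minimal sketch in plain Python of choices-aware sampling and of a self-referential tree. It is not SeedBase's actual code; the helper names pick_choice and build_tree_parents are made up for illustration, and the seed is arbitrary.

```python
import random

# Same shape as Order.STATUS in the models above: (stored value, human label).
STATUS = [("pending", "Pending"), ("paid", "Paid"), ("shipped", "Shipped")]


def pick_choice(choices, rng):
    """Return one stored value from a Django-style choices list."""
    stored_values = [value for value, _label in choices]
    return rng.choice(stored_values)


def build_tree_parents(row_count, rng):
    """Assign a parent id to each row of a ForeignKey("self") column.

    Row ids run from 1 to row_count. Row 1 is the root (parent None); every
    later row points at an id smaller than its own, so there are no cycles
    and each parent already exists when the child is inserted.
    """
    parents = {1: None}
    for row_id in range(2, row_count + 1):
        parents[row_id] = rng.randint(1, row_id - 1)
    return parents


rng = random.Random(7)  # fixed seed, chosen arbitrarily for the example
statuses = [pick_choice(STATUS, rng) for _ in range(5)]
parents = build_tree_parents(6, rng)
```

Pointing each row only at a smaller id is one simple way to guarantee a valid tree; a real generator may well shape the tree differently.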

Step 2: Generate

Hit generate. The part that makes Django data actually loadable happens here:

Topological insert order. Customer and Product before Order, Order before OrderItem. No IntegrityError on load.
Foreign keys resolve. Every order.customer_id points at a customer that was actually generated, and every OrderItem.product_id at a real product.
Through tables get filled. Many-to-many links reference existing rows on both sides, so the join table is consistent too.
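The ordering idea can be sketched with the standard library alone. The snippet below is an illustration of the technique, not SeedBase's implementation: it uses graphlib.TopologicalSorter (Python 3.9+) on a hand-written dependency map for the shop models, and the names DEPENDENCIES, FOREIGN_KEYS, insert_order and generate are assumptions made for the example.

```python
import random
from graphlib import TopologicalSorter

# Table -> set of tables it has a foreign key to (from the shop models above).
DEPENDENCIES = {
    "Customer": set(),
    "Product": set(),
    "Order": {"Customer"},
    "OrderItem": {"Order", "Product"},
}

# FK columns per table: column name -> referenced table.
FOREIGN_KEYS = {
    "Order": {"customer_id": "Customer"},
    "OrderItem": {"order_id": "Order", "product_id": "Product"},
}


def insert_order(dependencies):
    """Return table names so that every table comes after the tables it references."""
    return list(TopologicalSorter(dependencies).static_order())


def generate(row_counts, seed=0):
    """Generate rows table by table; each FK value is drawn from ids already generated."""
    rng = random.Random(seed)
    generated_ids = {}
    rows = {}
    for table in insert_order(DEPENDENCIES):
        ids = list(range(1, row_counts[table] + 1))
        table_rows = []
        for pk in ids:
            row = {"id": pk}
            for column, target in FOREIGN_KEYS.get(table, {}).items():
                # The target table was generated earlier, so this id exists.
                row[column] = rng.choice(generated_ids[target])
            table_rows.append(row)
        generated_ids[table] = ids
        rows[table] = table_rows
    return rows


order = insert_order(DEPENDENCIES)
data = generate({"Customer": 5, "Product": 4, "Order": 10, "OrderItem": 25})
```

A self-referencing foreign key is deliberately left out of the dependency map here, because a table that depends on itself would be reported as a cycle; such columns need separate handling.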

Step 3: Realistic, not just valid

Valid data where every row looks identical still hides bugs. A few things that make it behave like production:

Skewed distributions. Not every customer has exactly five orders. A long tail (one has two, another nineteen) is where pagination and N+1 queries actually break.
Coherent values. A customer's email matches their name (lena.mueller@… for Lena Müller), and a derived column such as order.total, if your Order model has one, equals the sum of its OrderItem rows. Derived values are reconciled after generation.
Localized and coherent. Pick a language and names, cities, postal codes and country line up (Zürich goes with CH, not US). Useful when screenshots of your test data end up in a demo.
Time-aware timestamps. A created timestamp (auto_now_add) never falls after the matching updated timestamp (auto_now), and timestamps are generated relative to today, so "last 30 days" admin filters stay populated as the dataset ages.
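A rough sketch of three of these ideas, using only the standard library. Everything here is an assumption for illustration rather than SeedBase's own logic: the function names, the example.com domain, the Pareto shape of 1.2 and the cap of 40 orders are all arbitrary choices.

```python
import random
import unicodedata
from decimal import Decimal

# Transliterate German umlauts first, so "Müller" becomes "Mueller", not "Muller".
UMLAUTS = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss",
                         "Ä": "Ae", "Ö": "Oe", "Ü": "Ue"})


def email_for(name, domain="example.com"):
    """Derive a lowercase ASCII email address from a display name."""
    text = name.translate(UMLAUTS)
    # Strip any remaining accents by decomposing and dropping non-ASCII marks.
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    return ".".join(text.lower().split()) + "@" + domain


def skewed_order_counts(customer_count, rng, cap=40):
    """Draw a long-tailed number of orders per customer (most low, a few high)."""
    return [min(int(rng.paretovariate(1.2)), cap) for _ in range(customer_count)]


def order_total(items):
    """Reconcile a derived total from its line items."""
    return sum((item["quantity"] * item["price"] for item in items), Decimal("0"))


rng = random.Random(11)  # arbitrary seed for the example
email = email_for("Lena Müller")
counts = skewed_order_counts(200, rng)
total = order_total([
    {"quantity": 2, "price": Decimal("9.99")},
    {"quantity": 1, "price": Decimal("0.02")},
])
```

With these inputs, email is lena.mueller@example.com and total is Decimal("20.00"). The order counts depend on the seed, but every value lies between 1 and the cap.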

Step 4: Load it into your database

Export as SQL and pipe it straight into your project database, with constraints on, because the inserts are already in FK order:

# straight into the Django-configured database
python manage.py dbshell < seed.sql

# or plain psql
psql "$DATABASE_URL" -f seed.sql

Or skip the file and pull from the CLI, handy in a Makefile or a fresh dev setup:

pip install seedbase
seedbase pull --project <id> --format sql --out seed.sql

You can also connect a database in the UI and push the rows directly, no SQL file at all.

Step 5: Make it reproducible in CI

Generation is deterministic per seed, so a CI run gets the exact same database every time. The Python SDK and pytest plugin pull seeded data straight into your test database, and you can commit the generation config as JSON next to your migrations so the whole team regenerates the same dataset from the same models.
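What "deterministic per seed" means can be shown with a small stand-in generator. This is a sketch under stated assumptions, not the SeedBase SDK or pytest plugin: generate_customers and fingerprint are hypothetical helpers, and the city list is invented. The point is only that a generator driven by its own seeded random.Random produces identical rows for identical seeds.

```python
import hashlib
import json
import random


def generate_customers(count, seed):
    """Generate rows from a private, seeded RNG so output depends only on the seed."""
    rng = random.Random(seed)
    cities = ["Zürich", "Berlin", "Wien"]
    return [
        {"id": pk, "city": rng.choice(cities), "orders": rng.randint(0, 20)}
        for pk in range(1, count + 1)
    ]


def fingerprint(rows):
    """Hash the dataset so two runs can be compared with one string."""
    payload = json.dumps(rows, sort_keys=True, ensure_ascii=False).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


first_run = fingerprint(generate_customers(50, seed=42))
second_run = fingerprint(generate_customers(50, seed=42))
other_seed = fingerprint(generate_customers(50, seed=43))
```

Here first_run and second_run are equal, while other_seed differs. Comparing a fingerprint like this in CI is one way to notice when a dataset changed unexpectedly.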

Where this fits. This fills a dev database, staging, a demo, or a CI container with a coherent dataset straight from your models. For object-level setup inside a single test, factory_boy is still the right tool. They solve different problems and pair well.

That's the whole loop

models.py in, generate, load. The hard parts (insert ordering, resolving every foreign key, abstract bases, through tables, realistic skew) are handled, so you do not hand-write another seed command. SeedBase was tested against a real 20-app Django project with 226 models, which is where most of the edge cases came from. The same approach works for any schema, SQL or Prisma included.

Seed your Django database, free

Paste your models.py or push it from your IDE, generate a populated, FK-consistent database, and load it with manage.py dbshell. Free tier, no credit card.

Every FK resolves
Realistic distributions
SQL / CSV / JSON
EU-hosted