Why seeding Django is annoying
Every Django project hits the same wall once the schema grows past a handful of models. Your options for test data all have a catch:
- JSON/YAML fixtures + loaddata. They rot. Add a non-null field and every fixture breaks. They are also tiny by nature, nobody hand-writes 5,000 rows.
- factory_boy factories. Great for object-level setup inside one test, but you still wire up every relationship by hand, and they are not meant to fill a whole dev or staging database.
- A custom manage.py seed command. You end up walking the dependency graph yourself, and it breaks the moment a model changes.
What you actually want is to point something at the models you already have and get a populated database where every foreign key resolves. Here is the whole loop.
Step 1, Point it at your models.py
SeedBase reads your Django models directly. Paste your models.py, or push the whole project in one click from the PyCharm or VS Code plugin. Take a typical shop:
from django.db import models


class Customer(models.Model):
    name = models.CharField(max_length=120)
    email = models.EmailField(unique=True)
    city = models.CharField(max_length=80)


class Product(models.Model):
    sku = models.CharField(max_length=32, unique=True)
    name = models.CharField(max_length=200)
    price = models.DecimalField(max_digits=10, decimal_places=2)


class Order(models.Model):
    STATUS = [("pending", "Pending"), ("paid", "Paid"), ("shipped", "Shipped")]
    customer = models.ForeignKey(Customer, on_delete=models.CASCADE)
    status = models.CharField(max_length=16, choices=STATUS)
    created_at = models.DateTimeField(auto_now_add=True)


class OrderItem(models.Model):
    order = models.ForeignKey(Order, related_name="items", on_delete=models.CASCADE)
    product = models.ForeignKey(Product, on_delete=models.PROTECT)
    quantity = models.PositiveIntegerField()
The parser understands Django, not just generic SQL:
- The implicit id. No explicit primary key? The auto AutoField id is added for you, exactly like Django would.
- Abstract base classes and mixins. Fields on an abstract base merge into every concrete child, so a TimeStampedModel mixin does the right thing everywhere.
- Every relation type. ForeignKey, OneToOneField, and ManyToManyField (including the implicit through table) all become real, resolvable references. ForeignKey("self") becomes a tree, not a forward reference.
- choices. A field with choices only ever gets one of its real values (pending / paid / shipped), never a random string.
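To make the first two points concrete, here is a minimal sketch in plain Python of how abstract-base fields and the implicit id could be resolved into a concrete column list. It is not SeedBase's parser: ModelSpec, inherited_fields and concrete_columns are hypothetical names invented for this illustration, and the sketch ignores primary keys declared on a base class.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ModelSpec:
    """Hypothetical, simplified description of one parsed model."""
    name: str
    fields: Dict[str, str]                      # field name -> field type
    bases: List[str] = field(default_factory=list)
    abstract: bool = False
    primary_key: Optional[str] = None           # explicit primary key, if any


def inherited_fields(spec: ModelSpec, registry: Dict[str, ModelSpec]) -> Dict[str, str]:
    """Collect fields from abstract bases first, then the model's own fields."""
    merged: Dict[str, str] = {}
    for base_name in spec.bases:
        base = registry[base_name]
        if base.abstract:
            merged.update(inherited_fields(base, registry))
    merged.update(spec.fields)
    return merged


def concrete_columns(spec: ModelSpec, registry: Dict[str, ModelSpec]) -> Dict[str, str]:
    """Merge inherited fields and prepend an implicit id when no key is declared."""
    columns = inherited_fields(spec, registry)
    if spec.primary_key is None:
        columns = {"id": "AutoField", **columns}
    return columns


registry = {
    "TimeStampedModel": ModelSpec(
        name="TimeStampedModel",
        fields={"created_at": "DateTimeField", "updated_at": "DateTimeField"},
        abstract=True,
    ),
    "Customer": ModelSpec(
        name="Customer",
        fields={"name": "CharField"},
        bases=["TimeStampedModel"],
    ),
}

customer_columns = concrete_columns(registry["Customer"], registry)
print(list(customer_columns))
```

The abstract base never becomes a table of its own here; only its fields travel down to the concrete child, which matches how Django treats abstract models.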
Step 2, Generate
Hit generate. The part that makes Django data actually loadable happens here:
- Topological insert order. Customer and Product before Order, Order before OrderItem. No IntegrityError on load.
- Foreign keys resolve. Every order.customer_id points at a customer that was actually generated, and every OrderItem.product_id at a real product.
- Through tables get filled. Many-to-many links reference existing rows on both sides, so the join table is consistent too.
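The insert ordering for the shop models above can be sketched with Python's standard-library graphlib. This is an illustration of the general technique, not SeedBase's implementation; the depends_on mapping is written out by hand here, whereas a real tool would derive it from the foreign keys.

```python
from graphlib import TopologicalSorter

# Each model maps to the set of models its foreign keys point at.
# Those referenced models must be inserted first.
depends_on = {
    "Customer": set(),
    "Product": set(),
    "Order": {"Customer"},
    "OrderItem": {"Order", "Product"},
}

# static_order() yields every node after all of its dependencies.
insert_order = list(TopologicalSorter(depends_on).static_order())
position = {name: index for index, name in enumerate(insert_order)}

print(insert_order)
```

Self-references and circular foreign keys make TopologicalSorter raise CycleError, so they need separate handling (for example a nullable column filled in a second pass); this sketch does not cover them.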
Step 3, Realistic, not just valid
Valid data where every row looks identical still hides bugs. A few things that make it behave like production:
- Skewed distributions. Not every customer has exactly five orders. A long tail (one has two, another nineteen) is where pagination and N+1 queries actually break.
- Coherent values. A customer's email matches their name (lena.mueller@… for Lena Müller), and order.total equals the sum of its OrderItem rows. Derived values are reconciled after generation.
- Localized and coherent. Pick a language and names, cities, postal codes and country line up (Zürich goes with CH, not US). Useful when your TabularTestCase screenshots end up in a demo.
- Time-aware timestamps. auto_now_add stays before auto_now, and timestamps generate relative to today, so "last 30 days" admin filters stay populated as the dataset ages.
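A rough sketch of what these properties look like in code, using only the standard library. None of this is SeedBase's generator: the function names, the Pareto shape parameter, the cap, and the example.com placeholder domain are all assumptions made for illustration.

```python
import random
import unicodedata
from datetime import datetime, timedelta, timezone
from decimal import Decimal

rng = random.Random(7)  # fixed seed so the sketch is repeatable

# German-style transliteration, applied before stripping other accents.
_UMLAUTS = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"})


def order_count(rng, cap=40):
    """Long-tailed count: most customers get few orders, a few get many."""
    return min(int(rng.paretovariate(1.2)), cap)


def email_for(name, domain="example.com"):
    """Derive an address from the name so the two stay coherent."""
    text = unicodedata.normalize("NFC", name).lower().translate(_UMLAUTS)
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    return ".".join(text.split()) + "@" + domain


def timestamps(rng, now, window_days=30):
    """Pick created/updated relative to now, with created <= updated <= now."""
    created = now - timedelta(days=window_days) * rng.random()
    updated = created + (now - created) * rng.random()
    return created, updated


def order_total(items):
    """Reconcile a derived total from (quantity, unit price) pairs."""
    return sum((Decimal(qty) * price for qty, price in items), Decimal("0"))


now = datetime.now(timezone.utc)
counts = [order_count(rng) for _ in range(1000)]
created, updated = timestamps(rng, now)
email = email_for("Lena Müller")
total = order_total([(2, Decimal("19.99")), (1, Decimal("5.00"))])
```

Generating timestamps as offsets from the current time, rather than as fixed dates, is what keeps relative filters such as "last 30 days" populated whenever the data is regenerated.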
Step 4, Load it into your database
Export as SQL and pipe it straight into your project database, with constraints on, because the inserts are already in FK order:
# straight into the Django-configured database
python manage.py dbshell < seed.sql
# or plain psql
psql "$DATABASE_URL" -f seed.sql
Or skip the file and pull from the CLI, handy in a Makefile or a fresh dev setup:
pip install seedbase
seedbase pull --project <id> --format sql --out seed.sql
You can also connect a database in the UI and push the rows directly, no SQL file at all.
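One way to convince yourself that a seed file really is in FK order is to load it with constraint enforcement switched on. The sketch below does that against an in-memory SQLite database; the table names and the inline seed_sql string are made up for the example and stand in for an exported file.

```python
import sqlite3

# Stand-in for an exported seed file: parents are inserted before children.
seed_sql = """
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id)
);
INSERT INTO customers (id, name) VALUES (1, 'Lena Müller');
INSERT INTO orders (id, customer_id) VALUES (1, 1);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript(seed_sql)

# An empty result means every foreign key points at an existing row.
violations = conn.execute("PRAGMA foreign_key_check").fetchall()

# A row that references a missing parent is rejected while constraints are on.
try:
    conn.execute("INSERT INTO orders (id, customer_id) VALUES (2, 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(violations, rejected)
```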
Step 5, Make it reproducible in CI
Generation is deterministic per seed, so a CI run gets the exact same database every time. The Python SDK and pytest plugin pull seeded data straight into your test database, and you can commit the generation config as JSON next to your migrations so the whole team regenerates the same dataset from the same models.
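The idea behind "deterministic per seed" can be shown with a seeded random generator: the same seed and the same inputs always give the same rows. This is a generic sketch, not the SeedBase SDK; generate_customers and fingerprint are hypothetical helpers.

```python
import hashlib
import json
import random


def generate_customers(seed, n):
    """Build n customer rows from a private, seeded random generator."""
    rng = random.Random(seed)
    first_names = ["Lena", "Jonas", "Mia", "Noah"]
    last_names = ["Müller", "Meier", "Keller", "Weber"]
    return [
        {"id": i + 1, "name": f"{rng.choice(first_names)} {rng.choice(last_names)}"}
        for i in range(n)
    ]


def fingerprint(rows):
    """Stable hash of a dataset, useful as a cheap CI check."""
    payload = json.dumps(rows, sort_keys=True, ensure_ascii=False).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


run_a = generate_customers(seed=1234, n=5)
run_b = generate_customers(seed=1234, n=5)
print(fingerprint(run_a) == fingerprint(run_b))
```

Using a private random.Random instance instead of the module-level functions keeps the output independent of any other code that touches the global random state.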
For object-level setup inside a single test, factory_boy is still the right tool. The two solve different problems and pair well.
That's the whole loop
models.py in, generate, load. The hard parts (insert ordering, resolving every foreign key, abstract bases, through tables, realistic skew) are handled, so you do not hand-write another seed command. It was tested against a real 20-app Django project with 226 models, which is where most of the edge cases came from. The same approach works for any schema, SQL or Prisma included.
Seed your Django database, free
Paste your models.py or push it from your IDE, generate a populated, FK-consistent database, and load it with manage.py dbshell. Free tier, no credit card.
- Every FK resolves
- Realistic distributions
- SQL / CSV / JSON
- EU-hosted