Motivation
Datagen Factory generates data that you can use directly to train your production agent.
Anatomy of an agent
The model is the weights. The harness is everything around the model — the tools it calls, the system prompts, the way your code wires those tools into the model's loop. In production, those two pieces ship as one unit: an agent.
The harness is usually the part that took longest to get right. If you train a model in a setup where the tools, prompts, or scaffolding don't match what ships, you're not training your agent — you're training a stand-in. Gains there don't reliably transfer back to production.
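To make the model/harness split concrete, here is a minimal Python sketch. It is not Datagen Factory's API: the names Harness, Agent, toy_model, and the "CALL"/"RESULT" message convention are all hypothetical, chosen only to show how a harness (system prompt, tools, loop) wraps a model into one shippable unit.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical types for illustration only; not part of any real library.
Model = Callable[[List[str]], str]  # takes the transcript, returns the next message
Tool = Callable[[str], str]         # takes an argument string, returns a result string


@dataclass
class Harness:
    """Everything around the model: system prompt, tools, and the loop wiring."""
    system_prompt: str
    tools: Dict[str, Tool] = field(default_factory=dict)
    max_steps: int = 5

    def run(self, model: Model, task: str) -> str:
        transcript = [self.system_prompt, task]
        for _ in range(self.max_steps):
            reply = model(transcript)
            transcript.append(reply)
            if reply.startswith("CALL "):
                # Assumed convention: "CALL <tool_name> <argument string>"
                _, name, arg = reply.split(" ", 2)
                transcript.append(self.tools[name](arg))
            else:
                return reply  # a non-tool reply is treated as the final answer
        return transcript[-1]


@dataclass
class Agent:
    """The unit that ships: a model plus the harness it runs in."""
    model: Model
    harness: Harness

    def __call__(self, task: str) -> str:
        return self.harness.run(self.model, task)


def toy_model(transcript: List[str]) -> str:
    # Stand-in for real weights: call the tool once, then answer from its result.
    last = transcript[-1]
    if last.startswith("RESULT "):
        return "The answer is " + last[len("RESULT "):]
    return "CALL add 2 3"


def add(arg: str) -> str:
    a, b = arg.split()
    return f"RESULT {int(a) + int(b)}"


agent = Agent(toy_model, Harness("You are a calculator.", {"add": add}))
answer = agent("What is 2 + 3?")
print(answer)
```

In this sketch the model only works because the harness exposes a tool named "add" and returns results in the format the model expects. Change the tool name or the result format and the same model stops working, which is the mismatch described above.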
Intuition
Think of your agent as an F1 driver. The model is the driver. The harness — tools, prompts, scaffolding — is the car: chassis, telemetry, the cockpit tuned to the driver's preferences.
Most training setups today take the driver out of their car, train them in a generic race sim, then send them back on race day in their actual car. They're faster from the practice, but the brake bite is different, the wheel layout is different, and the whole car is different from what they trained in.
Datagen Factory builds race tracks for the driver and their car. We design tracks calibrated to their setup — corners, braking zones, and telemetry that match what the driver will face on race day. On race day, you'll have a better driver who is intimately familiar with their car.
Your agent at each stage
Each stage of Datagen Factory has a direct analog in the F1 metaphor.
The harness threads through unchanged. The model is what improves.
Proof
We used the hand-built version of this methodology to ship state-of-the-art results on HealthBench and HealthBench Pro, and Perplexity has used it to produce training data for their production agents. Datagen Factory is that same methodology, made self-serve.
Next
Start: make your first dataset
A walkthrough from install to delivered folder. ~30 minutes for the simplest run; longer if your agent needs a sandbox.
Fundamentals: concepts and patterns
What tasks, rubrics, and sandboxes are; why each one matters; how to integrate your agent; what makes a rubric training-grade.
CLI reference
Every command, every flag. The CLI is the primary surface and the one we recommend pointing a coding agent at.
UI reference
The web app. Same features as the CLI, but better for reviewing previews and writing feedback.