Datagen Factory

Motivation

Datagen Factory generates data that you can use directly to train your production agent.

Anatomy of an agent

Model (weights) + Your Harness (tools · prompts · scaffolding) = Your Agent

The model is the weights. The harness is everything around the model — the tools it calls, the system prompts, the way your code wires those tools into the model's loop. In production, those two pieces ship as one unit: an agent.

The harness is usually the part that took longest to get right. If you train a model in a setup where the tools, prompts, or scaffolding don't match what ships, you're not training your agent — you're training a stand-in. Gains there don't reliably transfer back to production.
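The split described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `Harness`, `Agent`, `base_model`, and `tuned_model` are hypothetical names invented for this sketch, not part of Datagen Factory or any real library, and the "models" are toy functions standing in for real weights. It shows the one property that matters here: swapping the model leaves the harness object untouched.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, List

# Hypothetical types for illustration only; not Datagen Factory's API.
Model = Callable[[List[str]], str]  # maps the transcript so far to the next action
Tool = Callable[[str], str]         # maps a tool argument to a tool result


@dataclass(frozen=True)
class Harness:
    """Everything around the model: prompts, tools, and loop settings."""
    system_prompt: str
    tools: Dict[str, Tool]
    max_steps: int = 4


@dataclass(frozen=True)
class Agent:
    """The unit that ships: a model plus the harness wired around it."""
    model: Model
    harness: Harness

    def run(self, task: str) -> str:
        transcript = [self.harness.system_prompt, task]
        for _ in range(self.harness.max_steps):
            action = self.model(transcript)
            name, _, arg = action.partition(":")
            tool = self.harness.tools.get(name)
            if tool is None:
                return action  # not a known tool call: treat it as the final answer
            transcript.append(tool(arg))
        return transcript[-1]  # step budget exhausted: return the last tool result


def base_model(transcript: List[str]) -> str:
    # Toy stand-in for the original weights: call the tool once, then answer.
    if len(transcript) == 2:
        return "upper:" + transcript[1]
    return "answer=" + transcript[-1]


def tuned_model(transcript: List[str]) -> str:
    # Toy stand-in for improved weights: same protocol, cleaner tool input.
    if len(transcript) == 2:
        return "upper:" + transcript[1].strip()
    return "answer=" + transcript[-1]


harness = Harness(system_prompt="Use the tools.", tools={"upper": str.upper})
production = Agent(model=base_model, harness=harness)
# Only the model changes; the harness is carried over as the same object.
trained = replace(production, model=tuned_model)
```

Because `trained` is built from `production` by replacing only the `model` field, anything learned against this harness is exercised against exactly the same tools and prompts when it ships.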

Intuition

Think of your agent as an F1 driver. The model is the driver. The harness — tools, prompts, scaffolding — is the car: chassis, telemetry, the cockpit tuned to the driver's preferences.

Most training setups today take the driver out of their car, train them in a generic race sim, then send them back on race day in their actual car. The practice makes them faster, but the brake bite is different, the wheel layout is different, and the whole car differs from the one they trained in.

Datagen Factory builds race tracks for the driver and their car. We design tracks calibrated to their setup — corners, braking zones, and telemetry that match what the driver will face on race day. On race day, you'll have a better driver who is intimately familiar with their car.

Your agent at each stage

Each stage of Datagen Factory has a direct analog in the F1 metaphor.

Your Agent
Datagen: tasks + rubrics built for your agent
Training: RL rollouts using your agent
Production: same harness · new model weights

Your Driver
Designing the track: calibrated to the driver's setup
Practice laps: in the driver's own car
Race day: same car · sharper lines

The harness threads through unchanged. The model is what improves.

Proof

We used the hand-built version of this methodology to ship state-of-the-art on HealthBench and HealthBench Pro, and Perplexity has used it to produce training data for their production agents. Datagen Factory is that same methodology, made self-serve.
