Motivation

Datagen Factory generates data that you can use directly to train your production agent.

Anatomy of an agent

Model (weights) + Your Harness (tools · prompts · scaffolding) = Your Agent

The model is the weights. The harness is everything around the model — the tools it calls, the system prompts, the way your code wires those tools into the model's loop. In production, those two pieces ship as one unit: an agent.

The harness is usually the part that took longest to get right. If you train a model in a setup where the tools, prompts, or scaffolding don't match what ships, you're not training your agent — you're training a stand-in. Gains there don't reliably transfer back to production.
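To make the split concrete, here is a minimal sketch of an agent as a model plus a harness. Every name in it (`Harness`, `Agent`, `base_model`, `tuned_model`, the `CALL <tool> <arg>` convention) is a hypothetical stand-in chosen for illustration, not the Datagen Factory API, and the "models" are toy functions rather than real weights.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict

# Hypothetical types for illustration only; not the Datagen Factory API.
Model = Callable[[str], str]  # stands in for "the weights"
Tool = Callable[[str], str]


@dataclass(frozen=True)
class Harness:
    """Everything around the model: the system prompt and the tools."""
    system_prompt: str
    tools: Dict[str, Tool]


@dataclass(frozen=True)
class Agent:
    """Model + harness, shipped as one unit."""
    model: Model
    harness: Harness

    def run(self, task: str, max_steps: int = 5) -> str:
        # Scaffolding: loop the model and dispatch tool calls until it answers.
        context = f"{self.harness.system_prompt}\n{task}"
        for _ in range(max_steps):
            output = self.model(context)
            # Toy convention: "CALL <tool> <arg>" requests a tool;
            # any other output is treated as the final answer.
            if not output.startswith("CALL "):
                return output
            _, name, arg = output.split(" ", 2)
            result = self.harness.tools[name](arg)
            context += f"\n{output}\nRESULT {result}"
        return "step limit reached"


def base_model(context: str) -> str:
    # Toy model: calls the tool once, then answers with the tool result.
    if "RESULT" in context:
        return "answer: " + context.rsplit("RESULT ", 1)[1]
    return "CALL upper hello"


def tuned_model(context: str) -> str:
    # Toy "new weights": same tool use, slightly different final answer.
    if "RESULT" in context:
        return "answer: " + context.rsplit("RESULT ", 1)[1] + "!"
    return "CALL upper hello"


harness = Harness(system_prompt="You are a helpful agent.",
                  tools={"upper": str.upper})
agent = Agent(model=base_model, harness=harness)

# Swap only the model; the harness object is carried over untouched.
trained = replace(agent, model=tuned_model)
```

In this sketch, `replace(agent, model=tuned_model)` changes only the model, so `trained.harness` is the very same object as `agent.harness`. Training against a different `Harness` would mean exercising a different `Agent` than the one that ships.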

Intuition

Think of your agent as an F1 driver. The model is the driver. The harness — tools, prompts, scaffolding — is the car: chassis, telemetry, the cockpit tuned to the driver's preferences.

Most training setups today take the driver out of their car, train them in a generic race sim, then send them back to race day in their actual car. The practice makes them faster, but the brake bite is different, the wheel layout is different, and the whole car differs from the one they trained in.

Datagen Factory builds race tracks for the driver and their car. We design tracks calibrated to their setup — corners, braking zones, and telemetry that match what the driver will face on race day. On race day, you'll have a better driver who is intimately familiar with their car.

Your agent at each stage

Each stage of Datagen Factory has its direct analog in the F1 metaphor.

Your Agent | Your Driver
Datagen: tasks + rubrics built for your agent | Designing the track: calibrated to the driver's setup
↓ | ↓
Training: RL rollouts using your agent | Practice laps: in the driver's own car
↓ | ↓
Production: same harness · new model weights | Race day: same car · sharper lines

The harness threads through unchanged. The model is what improves.

Proof

We used the hand-built version of this methodology to ship state-of-the-art results on HealthBench and HealthBench Pro, and Perplexity has used it to produce training data for their production agents. Datagen Factory is that same methodology, made self-serve.

Next

Start: make your first dataset

Fundamentals: concepts and patterns

CLI reference

UI reference
