Datagen Factory

Start

Autonomously generate a taskset your agent can drop into an eval or training loop. Each task carries a dense rubric and a reproducible environment.

Four steps

  1. Install the CLI and authenticate, or use the UI.
  2. Write a Brief describing your agent and the work it does.
  3. Review the three-sample preview. Approve, or provide feedback and iterate.
  4. Download the delivered taskset in Harbor, Parquet, or JSONL format.
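If you choose JSONL in step 4, a loader along these lines can feed the tasks into your own eval or training loop. This is a minimal sketch that assumes one JSON object per line. The field names `id` and `instruction` and the helpers `parse_jsonl` and `load_taskset` are hypothetical. They are not part of the delivered schema or any Datagen Factory SDK.

```python
import json
from typing import Iterable, Iterator


def parse_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one task record per non-blank line (assumes one JSON object per line)."""
    for line_no, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline at end of file
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {line_no}: invalid JSON") from exc


def load_taskset(path: str) -> list[dict]:
    """Read a downloaded JSONL taskset file fully into memory."""
    with open(path, encoding="utf-8") as f:
        return list(parse_jsonl(f))


if __name__ == "__main__":
    # Hypothetical records; the real field names depend on the delivered schema.
    sample = [
        '{"id": "task-001", "instruction": "Summarize the open tickets."}',
        "",
        '{"id": "task-002", "instruction": "Refund the duplicate order."}',
    ]
    for task in parse_jsonl(sample):
        print(task["id"], "-", task["instruction"])
```

Keeping the parsing in `parse_jsonl`, separate from file I/O, makes it easy to test against in-memory lines and to stream very large tasksets without loading them all at once.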

Concepts

  • Taskset — the folder of tasks delivered at the end. Also called a dataset.
  • Task — a single graded unit inside a taskset: an instruction your agent reads, an environment it runs in, and a grader.
  • Rubric — the set of binary, weighted criteria each task is graded against.
  • Brief — the plain-language description you write to kick off a taskset.
  • Preview — three real tasks rendered in the final format, returned for your review before the full run.
  • Feedback — freeform prose you send when the preview isn't right. A new preview comes back under the same dataset ID.
  • Resource — an external data source your agent reads from in production, such as a database or file corpus. Resources are usually mounted into Sandboxes as copies.
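To make the Task and Rubric definitions concrete, here is a hypothetical in-memory model with a scoring function. It assumes a task's score is the weighted fraction of binary criteria that pass. The class names, the field names, and that aggregation rule are all illustrative assumptions, not the grader Datagen Factory ships.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Criterion:
    """One binary rubric criterion: it is either met or not, and carries a weight."""
    description: str
    weight: float


@dataclass
class Task:
    """Hypothetical task shape: instruction, environment, and the rubric a grader checks."""
    instruction: str
    environment: str  # assumption: an identifier for the sandbox or image the agent runs in
    rubric: list[Criterion] = field(default_factory=list)


def rubric_score(rubric: list[Criterion], passed: list[bool]) -> float:
    """Weighted fraction of criteria met, in [0, 1] (an assumed aggregation rule)."""
    if len(rubric) != len(passed):
        raise ValueError("need exactly one pass/fail verdict per criterion")
    total = sum(c.weight for c in rubric)
    if total <= 0:
        raise ValueError("rubric weights must sum to a positive number")
    earned = sum(c.weight for c, ok in zip(rubric, passed) if ok)
    return earned / total


if __name__ == "__main__":
    task = Task(
        instruction="Refund the duplicate order.",
        environment="orders-db-snapshot",  # hypothetical identifier
        rubric=[
            Criterion("Identifies the duplicate charge", 2.0),
            Criterion("Issues exactly one refund", 3.0),
            Criterion("Notifies the customer", 1.0),
        ],
    )
    print(round(rubric_score(task.rubric, [True, True, False]), 3))  # 0.833
```

With weights 2, 3, and 1 and only the last criterion failing, the score is 5/6. Normalizing by total weight keeps scores comparable across tasks whose rubrics have different numbers of criteria.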

Next

Install the CLI, then follow Your first taskset to generate a taskset end to end.