Start
Autonomously generate a taskset you can drop into your agent's eval or training loop. Each task carries a dense rubric and a reproducible environment.
Four steps
- Install the CLI and authenticate, or use the UI.
- Write a Brief describing your agent and the work it does.
- Review the three-sample preview. Approve, or provide feedback and iterate.
- Download the delivered taskset in Harbor, Parquet, or JSONL format (a loading sketch follows this list).
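If you take the JSONL delivery, each line is one task. A minimal sketch of loading it into an eval loop, assuming hypothetical field names such as `instruction` and `rubric` (the actual schema may differ):

```python
import json
from pathlib import Path

# Hypothetical example: read a JSONL-format taskset and iterate over tasks.
# The field name "instruction" is an assumption, not the guaranteed schema.
def load_taskset(path: str) -> list[dict]:
    tasks = []
    with Path(path).open() as f:
        for line in f:
            line = line.strip()
            if line:
                tasks.append(json.loads(line))
    return tasks

tasks = load_taskset("taskset.jsonl")
for task in tasks:
    print(task.get("instruction", "")[:80])  # peek at each task's instruction
```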
Concepts
- Taskset — the folder of tasks delivered at the end. Also called a dataset.
- Task — a single graded unit inside a taskset: an instruction your agent reads, an environment it runs in, and a grader.
- Rubric — the set of binary, weighted criteria each task is graded against; a scoring sketch appears after this list.
- Brief — the plain-language description you write to kick off a taskset.
- Preview — three real tasks rendered in the final format, returned for your review before the full run.
- Feedback — freeform prose you send when the preview isn't right. A new preview comes back at the same dataset id.
- Resource — an external data source your agent reads from in production, such as a database or file corpus. A collection of Resources is usually mounted into Sandboxes as copies.
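One way to picture a rubric: each criterion is a yes/no check with a weight, and the task score is the weighted fraction of criteria the agent satisfied. A minimal sketch under that assumption (the criteria, weights, and scoring rule here are illustrative, not the delivered grader):

```python
# Illustrative rubric: binary, weighted criteria. Criterion names, weights,
# and the weighted-average scoring rule are assumptions for demonstration.
rubric = [
    {"criterion": "Agent queried the correct table", "weight": 2.0, "passed": True},
    {"criterion": "Output matches the expected schema", "weight": 1.0, "passed": True},
    {"criterion": "No destructive writes to the database", "weight": 3.0, "passed": False},
]

earned = sum(c["weight"] for c in rubric if c["passed"])
total = sum(c["weight"] for c in rubric)
score = earned / total if total else 0.0
print(f"task score: {score:.2f}")  # 0.50 with the weights above
```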
Next
Install the CLI, then walk through generating a taskset end to end in your first taskset.