Start
Autonomously generate a taskset you can drop into your agent's eval or training loop. Each task carries a dense rubric and a reproducible environment.
Four steps
- Install the CLI and authenticate, or use the UI.
- Write a Brief describing your agent and the work it does.
- Review the three-sample preview. Approve, or provide feedback and iterate.
- Download the delivered taskset in Harbor, Parquet, or JSONL format (a loading sketch follows this list).
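If you take the JSONL delivery, each line is one task. A minimal sketch of loading it into an eval loop, assuming hypothetical field names such as `instruction` and `rubric` (the actual schema may differ):

```python
import json
from pathlib import Path

# Hypothetical example: read a JSONL-format taskset and iterate over tasks.
# The field name "instruction" is an assumption, not the guaranteed schema.
def load_taskset(path: str) -> list[dict]:
    tasks = []
    with Path(path).open() as f:
        for line in f:
            line = line.strip()
            if line:
                tasks.append(json.loads(line))
    return tasks

tasks = load_taskset("taskset.jsonl")
for task in tasks:
    print(task.get("instruction", "")[:80])  # peek at each task's instruction
```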
Concepts
- Taskset — the folder of tasks delivered at the end. Also called a dataset.
- Task — a single graded unit inside a taskset: an instruction your agent reads, an environment it runs in, and a grader.
- Rubric — the set of binary, weighted criteria each task is graded against; a scoring sketch appears after this list.
- Brief — the plain-language description you write to kick off a taskset.
- Preview — three real tasks rendered in the final format, returned for your review before the full run.
- Feedback — freeform prose you send when the preview isn't right. A new preview comes back at the same dataset id.
- Resource — an external data source your agent reads from in production, such as a database or file corpus. A collection of Resources is usually mounted into Sandboxes as copies.
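One way to picture a rubric: each criterion is a yes/no check with a weight, and the task score is the weighted fraction of criteria the agent satisfied. A minimal sketch under that assumption (the criteria, weights, and scoring rule here are illustrative, not the delivered grader):

```python
# Illustrative rubric: binary, weighted criteria. Criterion names, weights,
# and the weighted-average scoring rule are assumptions for demonstration.
rubric = [
    {"criterion": "Agent queried the correct table", "weight": 2.0, "passed": True},
    {"criterion": "Output matches the expected schema", "weight": 1.0, "passed": True},
    {"criterion": "No destructive writes to the database", "weight": 3.0, "passed": False},
]

earned = sum(c["weight"] for c in rubric if c["passed"])
total = sum(c["weight"] for c in rubric)
score = earned / total if total else 0.0
print(f"task score: {score:.2f}")  # 0.50 with the weights above
```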
Next
Install the CLI, then walk through generating a taskset end to end in your first taskset.