Datagen Factory

Your first taskset

End-to-end datagen — write a Brief, review the preview, and download a folder of Harbor tasks.

Prefer the UI? See creating a taskset in the UI.

Write a Brief

A Brief is plain English. Say what your agent does, what the training data should teach it, and anything you already know about the shape of the work.

I'm training a research agent that answers questions about SEC 10-K filings. Each task should pose a multi-step question where the answer requires reading at least two sections of a filing and synthesizing across them. Grading should check that the agent cites the section it pulled each fact from, not just the final answer.

Terse briefs are okay! The preview is meant to surface and address any gaps. We'll handle building the rubric.

Create the taskset

datagen datasets create --brief "I'm training a research agent that answers questions about SEC 10-K filings..."

The CLI returns a dataset_id. You'll use this id for every follow-up command.
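If you're scripting, you can capture the id directly. A minimal sketch, assuming the create command prints the new dataset_id to stdout (check your CLI version's actual output) and that your Brief lives in a hypothetical brief.txt file:

```shell
# Assumption: `datagen datasets create` prints the new dataset_id to stdout.
DATASET_ID=$(datagen datasets create --brief "$(cat brief.txt)")
echo "Created dataset: $DATASET_ID"
```

Keeping the Brief in a file also makes it easy to version alongside the taskset it produced.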

Follow the progress

datagen datasets watch <dataset_id>

This streams progress waypoints as they happen. You'll see:

  • Understanding your ask — we're reading the Brief and mapping it to concrete tasks.
  • Designing your dataset — we're laying out what the tasks will look like, drafting rubrics, sizing the scope.
  • Preview ready — the three-sample preview is built and waiting for you.
  • Your review — the watch command returns at this waypoint; your turn.

watch returns when it's your turn — when the preview is ready, when the full taskset is complete, or when your Brief needs more attention. Rerun any time to pick up where you left off.

Inspect the preview

datagen datasets preview <dataset_id>

Returns three samples. Each one is a complete Harbor task directory — the same format the final taskset will deliver.

Read reviewing your preview for a walkthrough of what to look at in each file.

Approve, or give feedback

If the preview is good:

datagen datasets approve <dataset_id>

You'll be asked to confirm. Pass --yes to skip the prompt in a script.
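For an unattended pipeline, the whole happy path can be chained. A sketch using only the commands shown on this page, assuming create prints the dataset_id to stdout and watch exits successfully at each waypoint (brief.txt is a hypothetical file holding your Brief):

```shell
#!/usr/bin/env sh
set -e  # stop on the first failing command

# Assumption: create prints the new dataset_id to stdout.
DATASET_ID=$(datagen datasets create --brief "$(cat brief.txt)")

datagen datasets watch "$DATASET_ID"          # returns when the preview is ready
datagen datasets preview "$DATASET_ID"        # print the three samples for the logs
datagen datasets approve "$DATASET_ID" --yes  # skip the confirmation prompt
datagen datasets watch "$DATASET_ID"          # returns when generation is complete
datagen datasets download "$DATASET_ID" --output ./my-taskset
```

Note that auto-approving skips the human review step; in practice you'd gate the approve on someone actually reading the preview.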

If something is off, say what should change:

datagen datasets feedback <dataset_id> "the rubric criteria about regulatory accuracy should cite the 10-K section, not just assert facts"

Good feedback is specific. Point at the criterion, file, or behavior you want changed, and describe why. One round of feedback triggers a new preview at the same id; iterate until it's right.
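One iteration of that loop, sketched with the commands from this page (the feedback text is illustrative):

```shell
datagen datasets feedback <dataset_id> "cite the 10-K section in the regulatory-accuracy criterion"
datagen datasets watch <dataset_id>    # returns when the regenerated preview is ready
datagen datasets preview <dataset_id>  # re-inspect, then approve or give more feedback
```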

Download

After approval, generation begins. When datagen datasets watch returns complete, or datagen datasets get reports the taskset is ready, download it:

datagen datasets download <dataset_id> --output ./my-taskset

The delivered artifact is a folder with one sub-folder per task; each sub-folder is a Harbor task directory.
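For example, a delivered folder might look like this (task names are illustrative; the files inside each sub-folder follow the Harbor task format):

```
my-taskset/
├── task-0001-filing-synthesis/
├── task-0002-filing-synthesis/
├── task-0003-filing-synthesis/
└── ...
```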

Drop the folder into your Harbor-compatible training loop. Every task is a self-contained unit you can run with harbor task run task-0001-filing-synthesis.

If you prefer a flat summary instead, --file-format jsonl formats each task as one line of JSON.
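Assuming --file-format is a flag on the download command (as the summary above suggests), the jsonl variant would look like:

```shell
datagen datasets download <dataset_id> --output ./my-taskset.jsonl --file-format jsonl

# Each line is one task as a JSON object, so line count == task count:
wc -l < ./my-taskset.jsonl
```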

Congratulations

You've just produced a taskset of Harbor tasks, without writing dense rubrics yourself.

Next: