Datagen Factory

Claude Code workflow

Point a coding agent at our CLI and let it rip.

The CLI is built for progressive disclosure. Coding agents use `--help` to read the list of subcommands, pick the one that matches your ask, and recurse. Every page of help output is self-contained enough that a coding agent can walk from the entry point to any specific command in two or three steps.
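
Condensed, the walk looks like this (a sketch; the transcript below shows the real output):

```
$ datagen --help                   # entry point: auth, config, datasets, resources, health
$ datagen datasets --help          # create, list, get, preview, approve, feedback, watch, download
$ datagen datasets create --help   # flags for the one command you actually need
```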

A transcript

Here's what the loop looks like in practice. The user asks Claude Code for help; Claude Code feels its way through the CLI; the user stays in the loop for critical decisions.

User: I want to make a training dataset for my customer-support agent. It reads our knowledge base and answers billing questions. Can you figure out how to do this with the datagen CLI?

Claude Code:

$ datagen --help

Datagen factory CLI

Commands: auth, config, datasets, resources, health

$ datagen datasets --help

Create and manage datasets

Commands: create, list, get, preview, approve, feedback, watch, download

$ datagen datasets create --help

Options:
  --brief TEXT (required) — What dataset do you need?
  --feedback TEXT — Optional refinement note.

Here's what I'd propose for the brief. Tell me if you want to adjust before I run it:

"Customer-support tasks for a billing agent. Each task gives the agent a billing question grounded in a short knowledge-base excerpt, and grades the answer on factual accuracy, citation of the right KB section, tone (professional but not robotic), and whether the agent escalates cases it can't resolve."

User: Good. Run it.

Claude Code:

$ datagen datasets create --brief "Customer-support tasks for a billing agent..."

dataset_id: ds_7f3a9b2c
status: Understanding your ask
preview_available: false

I'll watch it.

$ datagen datasets watch ds_7f3a9b2c

Understanding your ask
Designing your dataset: sketching tasks and rubric criteria
Checking quality
Preview ready: 3 sample tasks are ready for your review

Preview is ready. Let me pull it.

$ datagen datasets preview ds_7f3a9b2c

(three sample tasks render as panels: instruction, rubric criteria, and the reference solution)

Here's what I'm seeing across the three samples:

  • Instructions are specific: each cites a concrete account scenario and a knowledge-base excerpt.
  • Rubrics are dense (12, 14, and 11 criteria) and binary. Each criterion cites the KB section the answer should reference.
  • The third sample's rubric has a redundancy: criteria 7 and 11 both check the same refund-window fact.

Do you want to approve as-is, or should I send feedback on the redundancy?

User: Send feedback — also, the tone criteria are too vague. They should name specific things a bad answer would do.

Claude Code:

$ datagen datasets feedback ds_7f3a9b2c "Criteria 7 and 11 in sample 3 test the same refund-window fact — consolidate. Tone criteria across all samples should name specific failure modes (overusing 'apologise for the inconvenience', closing without next steps, using internal jargon like 'CX' or 'ACW') rather than asserting 'professional tone'."

status: Designing your dataset
Sending you another preview shortly.

(Claude Code resumes datagen datasets watch ds_7f3a9b2c; the revised preview arrives; the user approves.)

$ datagen datasets approve ds_7f3a9b2c --yes
$ datagen datasets watch ds_7f3a9b2c

Generating full dataset
Complete

$ datagen datasets download ds_7f3a9b2c --output ./datasets/billing-agent-v1

Downloaded 2841344 bytes to ./datasets/billing-agent-v1

The dataset landed in ./datasets/billing-agent-v1 as a folder of Harbor tasks. Ready for your training loop.
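
If you'd rather script the happy path than drive it interactively, a minimal sketch looks like this. It assumes the `dataset_id: <id>` line in the `create` output is stable enough to parse, and that `watch` blocks until the run reaches its next checkpoint; both are readings of the transcript above, not documented guarantees.

```
#!/usr/bin/env bash
# Minimal scripted loop, keeping the human checkpoint before approval.
set -euo pipefail

brief="Customer-support tasks for a billing agent..."  # your one-paragraph spec

# Start the run and pull the id out of the key: value output.
id=$(datagen datasets create --brief "$brief" | awk '/^dataset_id:/ {print $2}')
echo "created $id"

# Block until the preview checkpoint, then fetch the three samples.
datagen datasets watch "$id"
datagen datasets preview "$id"

# Human checkpoint: read the samples before anything irreversible.
# To refine instead of approving:
#   datagen datasets feedback "$id" "<note>" && datagen datasets watch "$id"
read -rp "Approve $id? (y/N) " ok
[[ "$ok" == "y" ]] || exit 1

datagen datasets approve "$id" --yes
datagen datasets watch "$id"
datagen datasets download "$id" --output ./datasets/billing-agent-v1
```

The review and approval stay interactive on purpose; the preview checkpoint only earns its keep if a person actually reads the samples.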


Set it up

Install the CLI where your coding agent can see it — a dev-dependency in your repo, or a global install in the shell the agent uses.
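
For example, assuming the CLI ships as an npm package (the package name below is a placeholder; use whatever channel it actually ships through):

```
# Placeholder package name; substitute the real one.
npm install --save-dev datagen    # per-repo, visible to agents run from the repo
npm install --global datagen      # or global, visible in any shell
```

Then paste the prompt below into your agent.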

Create a skill that anchors me to the `datagen` CLI for any training-data work.

First, figure out where this kind of file lives in this project. Common locations:
- Claude Code: `.claude/skills/datagen-cli/SKILL.md`
- Cursor: a new file under `.cursor/rules/`
- Other agents: `AGENTS.md`, `.windsurfrules`, or the equivalent project rules file

If none of those locations exist and the convention isn't obvious, ask me where to put it before writing.

For a Claude Code SKILL.md, use this frontmatter exactly:

```
---
name: datagen-cli
description: Use when the user asks about creating training datasets, evaluation tasks, rubrics, or synthetic data for an agent.
---
```

The description states triggering conditions only — don't summarize the workflow there, or agents will act on the summary and skip the body. For Cursor, Windsurf, or another agent, write the same content as a project rule without frontmatter.

The body should cover:

**The rule.** Use the `datagen` CLI for any training-data work. Read `--help` output before guessing flags. The CLI is built for progressive disclosure: `datagen --help` → `datagen datasets --help` → `datagen datasets create --help`.

**The loop.**
1. Draft the brief with me before submitting.
2. `datagen datasets create --brief "<one-paragraph spec>"` to start the run.
3. `datagen datasets watch <id>` until the preview is ready.
4. `datagen datasets preview <id>` to review the three sample tasks.
5. Either `datagen datasets feedback <id> "<note>"` to refine, or `datagen datasets approve <id> --yes` to confirm.
6. After approval, `datagen datasets watch <id>` until complete.
7. `datagen datasets download <id> --output ./path` to land the dataset.

**User-in-the-loop checkpoints.** Get my input on the brief, the preview review, and the approval decision. Don't approve unilaterally.

**Permission.** The CLI only reads and creates; anything destructive requires an explicit flag. Run `datagen ...` commands without asking permission for each invocation.

**Source-of-truth docs** (link directly, don't paraphrase):
- `/cli` — command reference
- `/start/first-dataset` — first-run walkthrough
- `/fundamentals/claude-code-workflow` — the full loop with a transcript