Resources and sandboxes

The URL-swap model. How to register a database or a file corpus so every rollout gets a clean, reproducible copy.

Production agents read from systems that won't sit still — a database that changes every minute, a document library that grows every week, an API that returns a different answer each call. Training needs the opposite: the same task, graded the same way, over and over. A Resource is how you bridge the two.

You register a Resource once. Every task in every dataset that uses that Resource gets its own Sandbox — a per-rollout, reproducible copy of the state. The URL your agent uses to reach the Sandbox is the only thing that differs from production. The tools, the client libraries, the queries, the prompts — unchanged.

The drift problem

The agent's tools don't change. Your SQL client still speaks SQL; your file reader still reads files. Only the URL behind the tool call swaps — from production to a per-rollout sandbox — and back again when training is done.

The URL-swap model

Production                                    Training rollout
─────────────────                             ─────────────────
your agent                                    your agent
  │                                             │
  │   PG_URL=prod.orders-db.internal             │   PG_URL=sandbox.orders-db.br_a1b2c3
  ▼                                             ▼
your SQL client ─── SELECT ... ───┐           your SQL client ─── SELECT ... ───┐
                                  │                                             │
                                  ▼                                             ▼
                          production Postgres                          sandboxed Postgres
                          (shared, changing)                           (isolated, reproducible)

The two sides are architecturally identical. Your agent's code doesn't know which side it's on; the environment variable it reads is set by whoever launched it. Production orchestration hands it one URL; we hand it another. Everything downstream is yours.
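From the agent's side, this can be sketched as follows. The example assumes the agent reads its connection string from PG_URL, as in the diagram; the helper name database_url and the explicit error are illustrative choices, not part of any platform API.

```python
import os
from typing import Mapping


def database_url(env: Mapping[str, str] = os.environ) -> str:
    """Return whatever database URL the launcher placed in PG_URL.

    Hypothetical helper: the agent never branches on production versus
    training, it only reads the value that was set for it.
    """
    url = env.get("PG_URL")
    if not url:
        raise RuntimeError("PG_URL is not set; the launcher must provide it")
    return url


# The same code path runs on both sides; only the injected value differs.
# Both values are copied from the diagram above.
prod_url = database_url({"PG_URL": "prod.orders-db.internal"})
sandbox_url = database_url({"PG_URL": "sandbox.orders-db.br_a1b2c3"})
```

Failing loudly when the variable is missing is a design choice for this sketch: a silent fallback to a hard-coded production URL would defeat the swap.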

This is the whole model. The sections below cover how to register each kind of Resource so that we know which sandbox to provision.

Registering a Postgres Resource

A Postgres Resource needs a schema — the DDL that defines the tables — and optionally some seed data. Every rollout gets a fresh copy of the schema, populated with the seed, isolated from every other rollout.

datagen resources register \
  --kind postgres \
  --name orders_db \
  --schema ./orders-schema.sql \
  --seed ./orders-seed.sql

The --name is the local name your agent will use to refer to the Resource in rollouts. When we hand your agent the sandboxed URL, we key it by this name — so pick something your prompt already uses. Renaming it later means re-registering.

Options for seeding

Three ways to provide seed data, from least to most setup:

TLDC-curated template. Pass --from-template ecommerce_orders (or similar) to start with a curated, industry-typical schema and seed. Useful for prototyping or for agents that work across domains.
Uploaded seed bundle. Pass --seed path/to/seed.sql (or a directory of SQL files). We upload the contents once and use them as the baseline for every rollout.
Your own S3 bucket. Pass --seed-s3 s3://your-bucket/seeds/orders/. We read the contents via cross-account IAM at rollout time. No bytes are copied at registration, which is useful when seed data is large or changes on your side.

No seed is also valid; the sandbox gets an empty schema. Some agents are supposed to populate state from scratch.

What a rollout sees

Every task that declares orders_db as a required Resource gets a URL injected into the task's environment:

PG_URL=postgresql://sandbox.llmdata.com:5432/br_a1b2c3

The hostname is public. The database name varies per rollout. Your agent connects with its normal client, runs its normal SQL, and — because this rollout is isolated from every other rollout — reads the same rows back every time.
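To see which parts of the injected URL are shared and which vary, here is a small sketch that splits the example value above using Python's standard library. The variable names are illustrative.

```python
from urllib.parse import urlsplit

# Example value copied from the text above.
pg_url = "postgresql://sandbox.llmdata.com:5432/br_a1b2c3"

parts = urlsplit(pg_url)
host = parts.hostname               # shared, public hostname
port = parts.port                   # parsed as an integer
database = parts.path.lstrip("/")   # the per-rollout part

print(host, port, database)  # sandbox.llmdata.com 5432 br_a1b2c3
```

In practice your Postgres client accepts the URL as a whole, so the agent rarely needs to split it; the breakdown is only meant to show where the per-rollout identifier lives.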

Per-rollout isolation is automatic. You don't opt in to it, you can't opt out of it, and you don't need to reason about it. Register the Resource once; every task that uses it gets its own clean copy.

The life of one Sandbox

Registration happens once. Provisioning and teardown happen for every rollout — hundreds or thousands of times per dataset, each rollout sealed off from the next.
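The register-once, provision-per-rollout lifecycle can be illustrated with a toy model. This sketch uses in-memory SQLite purely as a stand-in for Postgres; it is not how sandboxes are actually provisioned, and the schema, seed, and function names are hypothetical.

```python
import sqlite3

# Hypothetical schema and seed, standing in for orders-schema.sql and
# orders-seed.sql. "Registered" once, reused for every rollout.
SCHEMA = "CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT NOT NULL);"
SEED = "INSERT INTO orders (id, status) VALUES (1, 'pending'), (2, 'shipped');"


def provision_sandbox() -> sqlite3.Connection:
    """Create a fresh, private database from the schema and seed."""
    conn = sqlite3.connect(":memory:")  # each connection is its own database
    conn.executescript(SCHEMA + SEED)
    return conn


def run_rollout(mutate: bool) -> list:
    """Provision, let the 'agent' act, read the final state, tear down."""
    conn = provision_sandbox()
    try:
        if mutate:
            conn.execute("UPDATE orders SET status = 'cancelled' WHERE id = 1")
        return conn.execute("SELECT id, status FROM orders ORDER BY id").fetchall()
    finally:
        conn.close()  # teardown: the in-memory database is discarded


first = run_rollout(mutate=True)    # this rollout sees its own write
second = run_rollout(mutate=False)  # the next rollout starts from the seed again
```

The write made in the first rollout never reaches the second, which is the property the real sandboxes are described as providing.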

Registering a file corpus Resource

A file corpus Resource is a folder. The files land inside the task's container at a known path; the agent reads them with its normal file tools.

datagen resources register \
  --kind file_corpus \
  --name docs \
  --source ./my-docs/

Or from S3:

datagen resources register \
  --kind file_corpus \
  --name docs \
  --source s3://your-bucket/research-library/

The --name is again the local name your agent will use. Inside each task's container, the files are mounted at /workspace/<name>/ — so docs ends up at /workspace/docs/. This is Harbor's environment/Dockerfile layout; your agent's file tools read from that path the same way they'd read from any directory.

The original corpus is never modified: writes the agent makes are scoped to its rollout and discarded afterwards. If you want the agent to produce files and have them graded, the verifier picks them up from the rollout's workspace.
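A minimal sketch of an agent-side file tool reading a corpus, assuming the /workspace/<name>/ mount path described above. The function names are hypothetical, and the workspace parameter exists only so the code can be pointed at a different directory when run outside a task container.

```python
from pathlib import Path


def list_corpus_files(name: str, workspace: Path = Path("/workspace")) -> list[str]:
    """Return the corpus's file paths relative to <workspace>/<name>/, sorted."""
    root = workspace / name
    return sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
    )


def read_document(
    name: str, relative_path: str, workspace: Path = Path("/workspace")
) -> str:
    """Read one corpus file as UTF-8 text."""
    return (workspace / name / relative_path).read_text(encoding="utf-8")
```

For the docs Resource registered above, list_corpus_files("docs") would enumerate everything under /workspace/docs/. Nothing here is specific to training; the same functions work against any directory.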

Listing and updating Resources

List registered Resources

datagen resources list

Inspect a Resource

datagen resources get orders_db

To update the schema or seed of an existing Resource, re-run register with the same --name; the new version takes effect for subsequent datasets. Datasets already in flight continue using the version they started with.

See CLI reference — resources for the full flag set.

What's reproducible and what isn't

State the agent reads	Status
Postgres databases	Reproducible per rollout.
File corpora (local upload or S3 reference)	Reproducible per rollout.
Outbound HTTP calls to third-party APIs	Not sandboxed today. Roadmap item.
SaaS connectors (Salesforce, Gmail, Slack, etc.)	Not sandboxed today. Roadmap item.

For agents that make outbound calls we don't sandbox yet, two options exist: mock the call inside your agent during training, or submit the agent as-is and let the rubric grade around the non-reproducible parts. Neither is great; we're working on it.
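The first option, mocking the call inside your agent, might look like the sketch below. Everything in it is an assumption for illustration: the AGENT_MOCK_HTTP variable is a made-up name that you would set yourself (the platform is not described as setting it), and the URL and canned response are placeholders.

```python
import json
import os
import urllib.request
from typing import Callable, Mapping

# Hypothetical canned responses keyed by URL. In practice you would record
# these from real calls you consider representative.
CANNED = {
    "https://api.example.com/rates?base=USD": {"base": "USD", "EUR": 0.9},
}


def live_fetch(url: str) -> dict:
    """Real outbound call, used in production."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def mock_fetch(url: str) -> dict:
    """Deterministic stand-in, used during training."""
    try:
        return CANNED[url]
    except KeyError:
        raise LookupError(f"no canned response for {url}") from None


def choose_fetcher(env: Mapping[str, str] = os.environ) -> Callable[[str], dict]:
    # AGENT_MOCK_HTTP is a made-up variable name for this sketch.
    return mock_fetch if env.get("AGENT_MOCK_HTTP") == "1" else live_fetch
```

Raising on an unknown URL, rather than falling through to the live call, keeps a training rollout from quietly becoming non-reproducible.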

When your agent is the whole story

Agents that don't read external state — pure reasoning, writing, or classification — don't need Resources at all. Skip this page.

Agents that read documents but don't touch a database register one file corpus Resource and stop there. Agents that do both register both. The registration surface is the same either way.

Where to go next

CLI reference — resources for every flag on resources register.
