Training

Use dn train sft to submit a hosted Tinker supervised fine-tuning job against an uploaded dataset and capability:

Terminal window
dn train sft \
--server http://127.0.0.1:8000 \
--api-key "$DREADNODE_API_KEY" \
--organization dreadnode \
--workspace localdev \
--model meta-llama/Llama-3.1-8B-Instruct \
--capability [email protected] \
--dataset [email protected] \
--steps 100 \
--wait \
--json

You can also train directly from one or more published Worlds trajectory datasets:

Terminal window
dn train sft \
--server http://127.0.0.1:8000 \
--api-key "$DREADNODE_API_KEY" \
--organization dreadnode \
--workspace localdev \
--model meta-llama/Llama-3.1-8B-Instruct \
--capability [email protected] \
--trajectory-dataset dreadnode/[email protected] \
--trajectory-dataset dreadnode/[email protected] \
--steps 50

Common flags:

Flag                                  Description
--trajectory-dataset NAME@VERSION     Worlds trajectory dataset input; repeatable
--eval-dataset NAME@VERSION           Optional evaluation dataset
--batch-size <n>                      Per-step batch size
--gradient-accumulation-steps <n>     Gradient accumulation factor
--learning-rate <float>               Optimizer learning rate
--checkpoint-interval <n>             Save a checkpoint every N steps
--wait                                Poll until the hosted job reaches a terminal state
--json                                Print the full job payload instead of a compact summary
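As a quick sanity check on the flags above, the number of examples consumed per optimizer update is the per-step batch size multiplied by the gradient accumulation factor. This is standard gradient-accumulation semantics, not hosted-sandbox documentation, so treat the exact accounting as an assumption:

```python
def effective_batch_size(batch_size: int, grad_accum_steps: int) -> int:
    """Examples consumed per optimizer update under gradient accumulation."""
    return batch_size * grad_accum_steps

# e.g. --batch-size 8 --gradient-accumulation-steps 4
print(effective_batch_size(8, 4))  # → 32
```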

Use dn train rl to submit a hosted Tinker reinforcement learning job:

Terminal window
dn train rl \
--server http://127.0.0.1:8000 \
--api-key "$DREADNODE_API_KEY" \
--organization dreadnode \
--workspace localdev \
--model meta-llama/Llama-3.1-8B-Instruct \
--capability [email protected] \
--prompt-dataset [email protected] \
--algorithm importance_sampling \
--execution-mode fully_async \
--max-steps-off-policy 3 \
--reward-recipe contains_v1 \
--reward-params '{"needle":"flag"}'
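The contains_v1 recipe name and its needle parameter suggest a simple substring reward. A hypothetical sketch of that shape (the hosted implementation is not shown here, and it may score partial matches differently):

```python
def contains_reward(completion: str, needle: str = "flag") -> float:
    """Hypothetical substring reward in the spirit of contains_v1:
    1.0 when the needle appears anywhere in the completion, else 0.0."""
    return 1.0 if needle in completion else 0.0

print(contains_reward("the flag is hidden here"))  # → 1.0
print(contains_reward("no luck this time"))        # → 0.0
```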

For Worlds-driven offline RL, replace the prompt dataset input with one or more published trajectory datasets:

Terminal window
dn train rl \
--server http://127.0.0.1:8000 \
--api-key "$DREADNODE_API_KEY" \
--organization dreadnode \
--workspace localdev \
--model meta-llama/Llama-3.1-8B-Instruct \
--capability [email protected] \
--trajectory-dataset dreadnode/[email protected] \
--trajectory-dataset dreadnode/[email protected] \
--algorithm importance_sampling

When dn train rl runs from trajectory datasets without an explicit reward recipe, the sandbox uses the trajectory_imitation_v1 baseline. That recipe only rewards completions that match the recorded next assistant action, and it scales that reward by the published trajectory outcome metadata from Worlds.
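A minimal sketch of that baseline's shape, assuming an exact match against the recorded next assistant action and a scalar outcome score in the published trajectory metadata (both are assumptions; the hosted recipe may tokenize, normalize, or weight matches differently):

```python
def imitation_reward(completion: str, recorded_action: str, outcome_score: float) -> float:
    """Hypothetical shape of trajectory_imitation_v1: reward only completions
    that reproduce the recorded next assistant action, then scale that reward
    by the trajectory's published outcome score."""
    matched = 1.0 if completion.strip() == recorded_action.strip() else 0.0
    return matched * outcome_score

print(imitation_reward("run whoami", "run whoami", 0.8))  # → 0.8
print(imitation_reward("run id", "run whoami", 0.8))      # → 0.0
```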

For Worlds-first RL, point the job at a manifest plus the runtime that should generate native agent trajectories. The control plane samples and publishes a Worlds dataset first, then the RL sandbox trains from that published dataset:

Terminal window
dn train rl \
--server http://127.0.0.1:8000 \
--api-key "$DREADNODE_API_KEY" \
--organization dreadnode \
--workspace localdev \
--model meta-llama/Llama-3.1-8B-Instruct \
--capability dreadnode/[email protected] \
--world-manifest-id c8af2b7b-9b54-4b21-95a9-b8d403cd8c11 \
--world-runtime-id 8b8fd3af-9a5e-47c8-9f67-7b87ca9387eb \
--world-agent-name operator \
--world-goal "Escalate to Domain Admin in corp.local" \
--execution-mode fully_async \
--max-steps-off-policy 3 \
--num-rollouts 8

When --world-runtime-id is supplied, hosted RL treats Worlds-published native-agent datasets as the primary input path. The selected runtime and capability generate trajectories in Worlds first, and the training sandbox then reuses the existing offline/async RL runtime over the published dataset.

If you also supply --world-reward, the job falls back to the older live-backend rollout bridge so the SDK can apply that reward policy directly during rollout generation.

Common RL flags:

Flag                              Description
--trajectory-dataset REF          Worlds trajectory dataset input; repeatable
--world-manifest-id ID            Live Worlds manifest target for online RL
--world-runtime-id ID             Runtime used to sample native-agent Worlds trajectories
--world-agent-name NAME           Optional agent selected within the runtime capability
--world-goal TEXT                 Optional goal override for the live Worlds rollout agent
--world-reward NAME               Named live Worlds reward policy
--world-reward-params JSON        JSON params passed to the selected Worlds reward policy
--prompt-split <name>             Prompt split selector inside the prompt dataset
--execution-mode <mode>           RL runtime mode: sync, one_step_off_async, or fully_async
--steps <n>                       Number of optimization steps
--num-rollouts <n>                Number of rollouts per update
--max-turns <n>                   Maximum turns per episode
--max-episode-steps <n>           Environment step limit
--weight-sync-interval <n>        Refresh sampler weights every N updates
--max-steps-off-policy <n>        Max rollout staleness for async RL; one_step_off_async requires 1
--stop <token>                    Add a stop token; repeatable

Hosted Tinker RL supports two async modes:

  • one_step_off_async keeps one rollout group in flight and bounds staleness to one step
  • fully_async widens the same pipeline to multiple queued rollout groups, with staleness bounded by --max-steps-off-policy

Both modes still operate on whole rollout groups; neither continues partially completed in-flight episodes.
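The bounded-staleness rule above can be sketched as a simple admission check, assuming staleness is measured in optimizer steps between the weights that sampled a rollout group and the learner's current step (an illustration of the policy, not the sandbox's actual scheduler):

```python
def rollout_group_usable(current_step: int, sampled_at_step: int,
                         max_steps_off_policy: int) -> bool:
    """A rollout group remains trainable while its sampling weights lag the
    learner by at most max_steps_off_policy optimizer steps."""
    return current_step - sampled_at_step <= max_steps_off_policy

# one_step_off_async corresponds to max_steps_off_policy == 1
print(rollout_group_usable(10, 9, 1))  # → True
print(rollout_group_usable(10, 7, 3))  # → True
print(rollout_group_usable(10, 6, 3))  # → False
```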

--task and --prompt-dataset are optional for Worlds-driven offline RL. Supplying --world-manifest-id together with --world-runtime-id is the native-agent alternative when you want the control plane to generate and publish fresh Worlds trajectories before training, and --world-reward keeps the older live-rollout bridge available when you explicitly want reward shaping during rollout generation.

The training subcommands also expose job management:

Terminal window
dn train get <job-id>
dn train wait <job-id> --json
dn train logs <job-id>
dn train artifacts <job-id>
dn train cancel <job-id> --json

dn train wait exits non-zero if the job finishes in the failed or cancelled state.

Hosted training commands require platform credentials plus an active organization and workspace. Pass them explicitly with flags, or configure them in your SDK profile first:

Terminal window
dn configure
dn train get <job-id>