Tasks

A task is the unit that defines a security challenge on Dreadnode. It packages the instruction, environment, and verification rules for a challenge, but it is not itself a runtime session.

What a task contains

A task definition includes:

An instruction for the agent
An environment, usually defined by docker-compose.yaml
Service and port metadata used to expose the environment safely
Verification rules that determine whether the challenge is solved
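
As a rough illustration, the four parts listed above can be pictured as a small data model. This is only a sketch: the class and field names (TaskDefinition, ServicePort, compose_file, ports, verification) are hypothetical and do not reflect the real task.yaml schema.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ServicePort:
    """Hypothetical service/port metadata used to expose the environment."""
    service: str  # name of a service in the compose file
    port: int     # port that the service listens on


@dataclass(frozen=True)
class TaskDefinition:
    """Hypothetical in-memory view of a task; not the real task.yaml schema."""
    name: str
    instruction: str                           # what the agent is asked to do
    compose_file: str = "docker-compose.yaml"  # the usual environment definition
    ports: tuple[ServicePort, ...] = ()
    verification: dict = field(default_factory=dict)  # rules that define "solved"


task = TaskDefinition(
    name="example-task",
    instruction="Find the flag served at {{ web_url }}.",
    ports=(ServicePort(service="web", port=8080),),
    verification={"type": "flag"},  # illustrative value only
)
```

Because the object holds no runtime state, the same definition can be handed to an interactive session or to a benchmark run unchanged.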

Tasks are reusable. The same task definition can be run interactively or as part of a benchmark evaluation.

Task definitions are uploaded as OCI artifacts. The platform validates the bundled task.yaml and compose files, stores the archive, and records the task as pending. The provider-specific template or image is built lazily on first execution, then reused for later runs.

Runtime-facing task metadata

Task API responses can include runtime-facing sandbox metadata:

sandbox_provider: which sandbox backend the task is prepared to run on
sandbox_build_id: the catalog build record associated with the task environment, once a build exists

The sandbox build record is the authoritative place to inspect build lifecycle state such as queued, building, ready, or failed. Tasks remain the source artifact; builds represent the runnable environment derived from that artifact. Newly imported OCI tasks may not have a build record yet.
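
A client might combine the two fields above with a build lookup along these lines. This is a minimal sketch: only sandbox_provider, sandbox_build_id, and the four state names come from the text above, while the dictionary shapes, the "state" key, and the helper names are assumptions made for illustration.

```python
from enum import Enum
from typing import Optional


class BuildState(Enum):
    """Build lifecycle states named in the text above."""
    QUEUED = "queued"
    BUILDING = "building"
    READY = "ready"
    FAILED = "failed"


def task_build_state(task: dict, builds: dict) -> Optional[BuildState]:
    """Return the state of the task's build, or None if no build exists yet.

    `task` stands in for a task API response and `builds` for a mapping of
    build ID to build record; both shapes are hypothetical.
    """
    build_id = task.get("sandbox_build_id")
    if build_id is None:
        # Newly imported tasks may not have a build record yet.
        return None
    return BuildState(builds[build_id]["state"])


def is_runnable(task: dict, builds: dict) -> bool:
    """True only when a build exists and has reached the ready state."""
    return task_build_state(task, builds) is BuildState.READY
```

The state is read from the build record rather than from the task, which mirrors the split between the source artifact and the runnable environment derived from it.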

For interactive solving, the API also exposes a task-instruction rendering endpoint. When a caller supplies the provider's sandbox ID, the platform resolves service placeholders such as {{ web_url }} against that sandbox’s reachable URLs and returns the rendered instruction.
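
The placeholder resolution described above can be approximated with a simple substitution. This is an illustrative sketch, not the platform's renderer: the function name, the regular expression, and the choice to leave unknown placeholders untouched are all assumptions.

```python
import re

# Matches placeholders such as {{ web_url }} with optional inner whitespace.
_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_instruction(template: str, urls: dict[str, str]) -> str:
    """Replace each known placeholder with the sandbox's reachable URL.

    Placeholders with no matching entry are left as-is (an assumption; the
    real endpoint may behave differently).
    """
    def substitute(match: re.Match) -> str:
        return urls.get(match.group(1), match.group(0))

    return _PLACEHOLDER.sub(substitute, template)


rendered = render_instruction(
    "Log in at {{ web_url }} and retrieve the flag.",
    {"web_url": "http://10.0.0.5:8080"},  # hypothetical sandbox URL
)
```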

Instruction rendering

The tasks API supports fetching rendered instructions when you have a running sandbox. Use GET /org/{org}/tasks/{name}/instruction and pass the sandbox_id query parameter (the provider sandbox identifier) to resolve template variables like {{ service_url }} with live connection details. Without sandbox_id, the endpoint returns the raw instruction template.
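
A request URL for this endpoint could be assembled as follows. The path and the sandbox_id parameter come from the description above; the base URL, organization, task name, and sandbox ID in the example are placeholders, and authentication is omitted because it is not covered here.

```python
from typing import Optional
from urllib.parse import quote, urlencode


def instruction_url(
    base_url: str, org: str, name: str, sandbox_id: Optional[str] = None
) -> str:
    """Build the URL for GET /org/{org}/tasks/{name}/instruction.

    With sandbox_id, the endpoint is expected to return the rendered
    instruction; without it, the raw instruction template.
    """
    path = f"/org/{quote(org, safe='')}/tasks/{quote(name, safe='')}/instruction"
    url = base_url.rstrip("/") + path
    if sandbox_id is not None:
        url += "?" + urlencode({"sandbox_id": sandbox_id})
    return url


# Hypothetical values, for illustration only.
raw_url = instruction_url("https://platform.example", "acme", "web-1")
rendered_url = instruction_url("https://platform.example", "acme", "web-1", "sbx-123")
```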

What tasks do not own

Tasks do not own:

sandbox lifecycle
attempts or sessions
verification execution
benchmark orchestration
ZIP upload endpoints

Those concerns now live in the execution domains that reference the task definition.

How tasks are executed

Interactive solving

For interactive work, the platform provisions a runtime. Runtimes are exposed through the workspace-scoped runtimes API, while the underlying runtime records remain visible in the sandboxes inventory.

Automated benchmarking

For judged and repeatable runs, the platform creates an evaluation. Each evaluation item combines the task environment with a runtime sandbox, runs the agent inside that runtime, and then executes the task’s verification rules. If the task has never been built for the active sandbox provider, the first run triggers that build before the environment sandbox is provisioned.
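
The order of operations described above (build on first use, run the agent, then verify) can be sketched as below. Every name here is hypothetical: BuildCache, run_evaluation_item, and the callables passed in are stand-ins for platform internals that this page does not specify.

```python
from typing import Any, Callable


class BuildCache:
    """Builds a task environment once per (task, provider) and reuses it."""

    def __init__(self, build_fn: Callable[[str, str], Any]) -> None:
        self._build_fn = build_fn
        self._builds: dict[tuple[str, str], Any] = {}

    def get_or_build(self, task_name: str, provider: str) -> Any:
        key = (task_name, provider)
        if key not in self._builds:
            # First run for this provider: trigger the build.
            self._builds[key] = self._build_fn(task_name, provider)
        return self._builds[key]


def run_evaluation_item(
    task_name: str,
    provider: str,
    cache: BuildCache,
    run_agent: Callable[[Any], Any],
    verify: Callable[[Any], bool],
) -> bool:
    """Run one evaluation item and return whether verification passed."""
    build = cache.get_or_build(task_name, provider)  # lazy build, then reuse
    outcome = run_agent(build)                       # agent works in its runtime
    return verify(outcome)                           # task's verification rules
```

In this sketch a second run of the same task on the same provider skips the build step entirely, which is the reuse behavior described for later runs.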

Verification

Verification remains part of the task definition. The task defines what success means, but the execution domain performs the actual verification step.