Tasks

Tasks are the unit of security challenge definition on Dreadnode. A task packages the instruction, environment, and verification rules for a challenge, but it is not itself a runtime session.

A task definition includes:

  • An instruction for the agent
  • An environment, usually defined by docker-compose.yaml
  • Service and port metadata used to expose the environment safely
  • Verification rules that determine whether the challenge is solved

Tasks are reusable. The same task definition can be run interactively or as part of a benchmark evaluation.

Task definitions are uploaded as OCI artifacts. The platform validates the bundled task.yaml and compose files, stores the archive, and records the task as pending. The provider-specific template or image is built lazily on first execution, then reused for later runs.

Task API responses can include runtime-facing sandbox metadata:

  • sandbox_provider: which sandbox backend the task is prepared to run on
  • sandbox_build_id: the catalog build record associated with the task environment, once a build exists

The sandbox build record is the authoritative place to inspect build lifecycle state such as queued, building, ready, or failed. Tasks remain the source artifact; builds represent the runnable environment derived from that artifact. Newly imported OCI tasks may not have a build record yet.
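As a rough illustration of how a client might read these fields, the sketch below models the build states named above and handles the case where no build record exists yet. Only `sandbox_provider`, `sandbox_build_id`, and the four state names come from this page; the `TaskResponse` class, the `describe_build` helper, and the in-memory `builds` mapping are hypothetical stand-ins, not the platform's actual SDK or response schema.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class BuildState(Enum):
    """Build lifecycle states named in the documentation."""
    QUEUED = "queued"
    BUILDING = "building"
    READY = "ready"
    FAILED = "failed"


@dataclass
class TaskResponse:
    """Hypothetical subset of a task API response."""
    name: str
    sandbox_provider: Optional[str] = None
    sandbox_build_id: Optional[str] = None  # absent until a build exists


def describe_build(task: TaskResponse, builds: Dict[str, BuildState]) -> str:
    """Summarize the build state for a task.

    `builds` stands in for a lookup against the sandbox build records,
    which remain the authoritative source of lifecycle state.
    """
    if task.sandbox_build_id is None:
        # Newly imported tasks may not have a build record yet.
        return "no build yet"
    state = builds.get(task.sandbox_build_id)
    if state is None:
        return "build record not found"
    return state.value
```

The point of the sketch is that lifecycle state is looked up on the build record rather than stored on the task itself, and that a missing `sandbox_build_id` is a normal condition rather than an error.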

For interactive solving, the API also exposes a task-instruction rendering endpoint: GET /org/{org}/tasks/{name}/instruction. When a caller passes the sandbox_id query parameter (the provider sandbox identifier), the platform resolves service placeholders such as {{ web_url }} or {{ service_url }} against that sandbox’s reachable URLs and returns the rendered instruction with live connection details. Without sandbox_id, the endpoint returns the raw instruction template.
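The placeholder resolution described above can be sketched locally. The example below builds the request path for the instruction endpoint and shows one plausible way `{{ name }}` placeholders could be substituted. The endpoint path and the `sandbox_id` parameter come from this page; the helper names, the example org, task, and sandbox values, and the choice to leave unknown placeholders untouched are assumptions for illustration, not the platform's actual rendering logic.

```python
import re
from typing import Dict, Optional
from urllib.parse import quote, urlencode

# Matches placeholders such as {{ web_url }} or {{service_url}}.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def instruction_path(org: str, name: str, sandbox_id: Optional[str] = None) -> str:
    """Build the request path for the instruction endpoint.

    Without sandbox_id the endpoint returns the raw template, so the
    query string is only added when an identifier is supplied.
    """
    path = f"/org/{quote(org, safe='')}/tasks/{quote(name, safe='')}/instruction"
    if sandbox_id is not None:
        path += "?" + urlencode({"sandbox_id": sandbox_id})
    return path


def render_instruction(template: str, service_urls: Dict[str, str]) -> str:
    """Replace known placeholders with URLs.

    Assumption: unknown placeholders are left as-is rather than raising.
    """
    def substitute(match: "re.Match[str]") -> str:
        key = match.group(1)
        return service_urls.get(key, match.group(0))

    return PLACEHOLDER.sub(substitute, template)
```

For example, `render_instruction("Open {{ web_url }}", {"web_url": "http://10.0.0.5:8080"})` yields `"Open http://10.0.0.5:8080"`, while calling it with an empty mapping returns the template unchanged, which mirrors the raw-template behavior when no sandbox is supplied.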

Tasks do not own:

  • sandbox lifecycle
  • attempts or sessions
  • verification execution
  • benchmark orchestration
  • ZIP upload endpoints

Those concerns now live in the execution domains that reference the task definition.

For interactive work, the platform provisions a runtime. Runtimes are exposed through the workspace-scoped runtimes API, while the underlying runtime records remain visible in the sandboxes inventory.

For judged and repeatable runs, the platform creates an evaluation. Each evaluation item combines the task environment with a runtime sandbox, runs the agent inside that runtime, and then executes the task’s verification rules. If the task has never been built for the active sandbox provider, the first run triggers that build before the environment sandbox is provisioned.
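The build-on-first-run behavior and the order of steps in an evaluation item can be pictured with a small in-memory model. This is a conceptual sketch only: `BuildCache`, `run_evaluation_item`, and the `build_fn`, `run_agent`, and `verify` callables are hypothetical names introduced here, and real provisioning, agent execution, and verification are handled by the platform.

```python
from typing import Callable, Dict, Tuple


class BuildCache:
    """Builds a task environment once per (task, provider) pair, then reuses it."""

    def __init__(self, build_fn: Callable[[str, str], str]) -> None:
        self._build_fn = build_fn
        self._builds: Dict[Tuple[str, str], str] = {}
        self.build_count = 0

    def get_or_build(self, task_name: str, provider: str) -> str:
        key = (task_name, provider)
        if key not in self._builds:
            # First run for this provider: trigger the build lazily.
            self._builds[key] = self._build_fn(task_name, provider)
            self.build_count += 1
        return self._builds[key]


def run_evaluation_item(
    task_name: str,
    provider: str,
    cache: BuildCache,
    run_agent: Callable[[str], str],
    verify: Callable[[str], bool],
) -> bool:
    """Ensure a build exists, run the agent, then apply the task's verification rule."""
    build_id = cache.get_or_build(task_name, provider)
    agent_output = run_agent(build_id)
    # The task defines the rule; the execution side is what invokes it.
    return verify(agent_output)
```

Running two items for the same task and provider through one `BuildCache` calls `build_fn` only once, which is the reuse described above; a different provider would trigger its own build.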

Verification remains part of the task definition. The task defines what success means, but the execution domain performs the actual verification step.