Architecture

This section outlines the primary services, storage layers, and communication paths in a self-hosted Dreadnode stack.

                         Internet
                 ┌───────────────────────┐
                 │      Caddy Proxy      │
                 │   /api/* → API        │
                 │   /*     → Frontend   │
                 └───────────┬───────────┘
                    ┌────────┴────────┐
                    ▼                 ▼
┌──────────────────────────┐  ┌──────────────────────────┐
│   Frontend (SvelteKit)   │  │   API Server (FastAPI)   │
│   packages/frontend/     │  │   packages/api/          │
│                          │  │                          │
│                          │  │ ┌──────────────────────┐ │
│                          │  │ │  Training Worker     │ │
│                          │  │ │  Evaluation Worker   │ │
│                          │  │ └──────────────────────┘ │
└──────────────────────────┘  └─────┬─┬──────────┬──┬────┘
                                    │ │          │  │
                        ┌───────────┘ │          │  └──────────┐
                        ▼             ▼          ▼             ▼
                  ┌───────────┐   ┌────────┐  ┌──────┐   ┌───────────┐
                  │PostgreSQL │   │ Click- │  │ S3/  │   │  LiteLLM  │
                  │           │   │ House  │  │MinIO │   │           │
                  │Users, Orgs│   │        │  │      │   │    LLM    │
                  │RBAC, Meta │   │Traces  │  │Pkgs  │   │   Proxy   │
                  │Jobs       │   │Metrics │  │Artif.│   │           │
                  └───────────┘   └────────┘  └──────┘   └───────────┘

                  ┌──────────────────────────────────────────────────┐
                  │                  Sandbox Proxy                   │
                  │   *.sandbox.<domain> → Docker / E2B sandboxes    │
                  └──────────────────────────────────────────────────┘

| Component | Purpose | Technology |
| --- | --- | --- |
| Caddy Proxy | Routes requests to API and Frontend | Caddy 2 |
| API | Backend service, business logic, worker host | FastAPI, SQLAlchemy, Pydantic |
| Training Worker | In-process training job executor (config-gated) | Runs inside API process |
| Evaluation Worker | In-process evaluation job executor (config-gated) | Runs inside API process |
| Frontend | Web UI for users | SvelteKit, TypeScript |
| LiteLLM | LLM inference proxy for dn/ model aliases | LiteLLM |
| Sandbox Proxy | Public wildcard proxy for sandboxes | Caddy + AWS ALB/ECS |
| Agent Sandbox | On-demand compute for running AI agents | Docker or E2B |
| PostgreSQL | State data (users, orgs, RBAC, jobs) | Postgres 16 |
| ClickHouse | Event data, OTEL traces, telemetry | ClickHouse 24.x |
| S3/MinIO | Object storage for packages and artifacts | AWS S3 or MinIO |

Data is divided across three storage layers:

  1. State Data (PostgreSQL): users, organizations, projects, RBAC metadata, training/evaluation job records.
  2. Event Data (ClickHouse): OTEL traces, run telemetry, metrics, high-volume logs.
  3. Object Storage (S3/MinIO): packages, artifacts, file uploads, training checkpoints.

The API server hosts two optional background workers that poll for and execute jobs. These are not separate services; they run as async loops inside the API process.

  • Training Worker: Enabled via TRAINING_IN_PROCESS_WORKER_ENABLED. Polls for training jobs and dispatches them to a backend (Tinker, Ray). Configurable concurrency and poll interval.
  • Evaluation Worker: Enabled via EVALUATION_IN_PROCESS_WORKER_ENABLED. Polls for evaluation jobs, provisions sandboxes, and runs evaluation items. Configurable concurrency and poll interval.

See Configuration for all worker environment variables.
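
The in-process worker pattern described above can be sketched as an asyncio loop that is started only when its environment flag is set. This is a minimal illustration, not the platform's actual worker code: `worker_enabled`, `poll_loop`, `claim_job`, and `run_job` are hypothetical names, the set of accepted truthy values is an assumption, and job claiming is reduced to a callable that returns a job ID or `None`.

```python
import asyncio
import os
from collections import deque
from typing import Awaitable, Callable, Mapping, Optional


def worker_enabled(flag: str, env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True when the flag is set to a truthy value (parsing rules are assumed)."""
    env = os.environ if env is None else env
    return str(env.get(flag, "")).strip().lower() in {"1", "true", "yes", "on"}


async def poll_loop(
    claim_job: Callable[[], Optional[str]],
    run_job: Callable[[str], Awaitable[None]],
    *,
    concurrency: int,
    poll_interval: float,
    stop: asyncio.Event,
) -> None:
    """Poll for jobs until `stop` is set, running at most `concurrency` jobs at once."""
    slots = asyncio.Semaphore(concurrency)
    running = set()  # keep references so in-flight tasks are not garbage collected

    async def guarded(job_id: str) -> None:
        try:
            await run_job(job_id)
        finally:
            slots.release()  # free the slot whether the job succeeded or failed

    while not stop.is_set():
        await slots.acquire()  # wait here while all slots are busy
        job_id = claim_job()
        if job_id is None:
            slots.release()
            try:
                # Nothing to do: sleep for one poll interval, or wake early on stop.
                await asyncio.wait_for(stop.wait(), timeout=poll_interval)
            except asyncio.TimeoutError:
                pass
            continue
        task = asyncio.create_task(guarded(job_id))
        running.add(task)
        task.add_done_callback(running.discard)

    if running:
        await asyncio.gather(*running, return_exceptions=True)


async def demo() -> list:
    """Drain a three-item in-memory queue with two concurrent slots."""
    queue = deque(["job-1", "job-2", "job-3"])
    done = []
    stop = asyncio.Event()

    def claim_job() -> Optional[str]:
        return queue.popleft() if queue else None

    async def run_job(job_id: str) -> None:
        await asyncio.sleep(0.01)  # stand-in for real training or evaluation work
        done.append(job_id)
        if len(done) == 3:
            stop.set()

    await poll_loop(claim_job, run_job, concurrency=2, poll_interval=0.01, stop=stop)
    return sorted(done)


if __name__ == "__main__":
    if worker_enabled("TRAINING_IN_PROCESS_WORKER_ENABLED"):
        print(asyncio.run(demo()))
```

Because the loop shares the API's event loop, a semaphore is a simple way to cap concurrency so that long-running jobs cannot starve request handling; a real worker would also need durable job claiming in the database.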

The API follows a Domain-Driven Design layout. Each domain is isolated under app/[domain]/:

app/
├── api/v1/       # Router aggregation only
├── core/         # Foundational infrastructure (no external deps)
├── infra/        # External integrations (DB, S3, ClickHouse)
└── [domain]/     # Business domains (auth, users, projects, etc.)
    ├── models.py       # SQLAlchemy models
    ├── schemas.py      # Pydantic schemas
    ├── service.py      # Business logic
    ├── repository.py   # Data access
    └── router_v1.py    # HTTP routes
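
To illustrate how the per-domain files divide responsibilities, here is a dependency-free sketch of a hypothetical `projects` domain collapsed into one module. Plain dataclasses stand in for the SQLAlchemy models and Pydantic schemas, an in-memory dict stands in for the database, and a plain function stands in for the FastAPI route. All class and function names here are assumptions for illustration and are not taken from the actual codebase.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


# models.py: persistence-side record (a SQLAlchemy model in the real layout)
@dataclass
class ProjectRecord:
    id: int
    name: str


# schemas.py: API-facing shape (a Pydantic schema in the real layout)
@dataclass(frozen=True)
class ProjectOut:
    id: int
    name: str


# repository.py: data access only, no business rules
class ProjectRepository:
    def __init__(self) -> None:
        self._rows: Dict[int, ProjectRecord] = {}
        self._next_id = 1

    def add(self, name: str) -> ProjectRecord:
        row = ProjectRecord(id=self._next_id, name=name)
        self._rows[row.id] = row
        self._next_id += 1
        return row

    def get_by_name(self, name: str) -> Optional[ProjectRecord]:
        return next((r for r in self._rows.values() if r.name == name), None)


# service.py: business logic, depends on the repository but not on HTTP
class ProjectService:
    def __init__(self, repo: ProjectRepository) -> None:
        self._repo = repo

    def create(self, name: str) -> ProjectOut:
        name = name.strip()
        if not name:
            raise ValueError("project name must not be empty")
        if self._repo.get_by_name(name) is not None:
            raise ValueError(f"project {name!r} already exists")
        row = self._repo.add(name)
        return ProjectOut(id=row.id, name=row.name)


# router_v1.py: HTTP layer, translates a request body into a service call
def create_project_route(service: ProjectService, body: dict) -> Tuple[int, dict]:
    try:
        project = service.create(body.get("name", ""))
    except ValueError as exc:
        return 400, {"detail": str(exc)}
    return 201, {"id": project.id, "name": project.name}
```

Keeping validation in the service and storage in the repository means the route function stays a thin adapter, which is the separation the layout above implies.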

| Package | Purpose | Technology |
| --- | --- | --- |
| packages/api | Backend API server | FastAPI, SQLAlchemy, Pydantic |
| packages/sdk | Python client SDK | httpx, Pydantic |
| packages/frontend | Web application | SvelteKit, TypeScript, Tailwind |
| platform/pulumi | AWS infrastructure | Pulumi (Python) |

The components communicate along the following paths:

  • The Caddy proxy is the entry point: /api/* routes to the API server, everything else to the Frontend.
  • The Frontend communicates with the API via HTTP/REST through the Caddy proxy.
  • The API reads and writes state data in Postgres, event data in ClickHouse, and objects in S3/MinIO.
  • The API calls LiteLLM for dn/-prefixed model inference. LiteLLM proxies to upstream LLM providers.
  • The Sandbox Proxy routes *.sandbox.<domain> wildcard subdomains to sandbox backends (Docker or E2B).
  • Agent Sandboxes are provisioned on-demand by the API for runtimes, evaluations, and training jobs.
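
The routing rules listed above can be summarized as a small decision function. This is only a sketch of the rules as stated, not the proxy configuration itself: `route_request` and `uses_litellm` are hypothetical helpers, `example.com` is a placeholder for the deployment's domain, and the model names in the usage note are made up.

```python
def route_request(host: str, path: str, domain: str = "example.com") -> str:
    """Return which upstream would handle a request under the stated rules."""
    host = host.lower().split(":", 1)[0]  # drop any port suffix
    sandbox_suffix = f".sandbox.{domain}"

    # *.sandbox.<domain> goes to the Sandbox Proxy; a subdomain label is required.
    if host.endswith(sandbox_suffix) and len(host) > len(sandbox_suffix):
        return "sandbox"

    # /api/* goes to the API server.
    if path.startswith("/api/"):
        return "api"

    # Everything else goes to the Frontend.
    return "frontend"


def uses_litellm(model: str) -> bool:
    """Models with the dn/ prefix are served through LiteLLM."""
    return model.startswith("dn/")
```

For example, `route_request("app.example.com", "/api/v1/projects")` returns `"api"`, `route_request("abc123.sandbox.example.com", "/")` returns `"sandbox"`, and `uses_litellm("dn/some-model")` returns `True`.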