NativeLink on-prem

Self-host NativeLink on hardware you control, with the operational checklist a team rollout actually needs.

The source-available release of NativeLink is designed to run on your own infrastructure. This page covers what changes between the 10-minute setup and a deployment that serves your team without 3 AM pages.

When to self-host

Pick on-prem when one of these is true:

Data residency. Build artifacts can carry source code, toolchains, even credentials. Self-hosting keeps them in whatever region or boundary you need (EU GDPR, US FedRAMP boundaries, air-gapped corp networks).
Specialised hardware. GPU workers, ARM cross-compile fleets, in-house silicon. On-prem lets you use anything that can run a Linux binary.
You already operate stateful services. If your team already runs Kubernetes, Postgres, or S3-compatible storage, adding NativeLink is marginal work.

If you'd rather not run it yourself, Enterprise offers a managed deployment.

What ships in the box

A NativeLink deployment is composed of four roles. The same binary serves all of them; the JSON5 config decides which subset to run.

Role	What it does	Statefulness
CAS server	Stores and serves content-addressed blobs.	Stateful
AC server	Maps `Action` digests to `ActionResult`s.	Stateful
Scheduler	Receives `Execute` calls, dispatches to workers.	Stateless
Worker	Runs the action in a sandbox, uploads outputs to CAS.	Stateless

One process can run all four roles, which is fine for a single developer. For anything larger, run each role as a separate process so you can scale and restart each one independently.
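To make the split concrete, here is a minimal Python sketch that models a process layout and flags which processes need durable storage, using the Statefulness column from the roles table above. The names `ROLE_IS_STATEFUL` and `PROCESS_PLAN` and the three-process layout are illustrative assumptions, not NativeLink's JSON5 configuration.

```python
# Statefulness per role, taken from the roles table.
ROLE_IS_STATEFUL = {
    "cas": True,
    "ac": True,
    "scheduler": False,
    "worker": False,
}

# Hypothetical layout: every process runs the same binary, and its config
# enables a subset of roles. Process names are illustrative only.
PROCESS_PLAN = {
    "storage": {"cas", "ac"},
    "scheduler": {"scheduler"},
    "worker": {"worker"},
}


def needs_durable_storage(roles: set[str]) -> bool:
    """A process needs a persistent volume if any role it runs is stateful."""
    unknown = roles - set(ROLE_IS_STATEFUL)
    if unknown:
        raise ValueError(f"unknown roles: {sorted(unknown)}")
    return any(ROLE_IS_STATEFUL[role] for role in roles)


def freely_restartable(plan: dict[str, set[str]]) -> list[str]:
    """Processes that can be scaled or restarted without touching stored data."""
    return sorted(
        name for name, roles in plan.items() if not needs_durable_storage(roles)
    )


if __name__ == "__main__":
    for name, roles in sorted(PROCESS_PLAN.items()):
        kind = "durable" if needs_durable_storage(roles) else "stateless"
        print(f"{name}: {kind}")
```

The point of the split shows up in `freely_restartable`: once the CAS and AC live in their own process, the scheduler and workers can be rolled or autoscaled without any storage migration, whereas an all-in-one process inherits the CAS's statefulness.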

The rollout checklist

Pick the storage backend. The default in-memory CAS is fine for a 10-minute demo and disastrous in production: it is capped by RAM and loses every blob on restart. Pick a durable backend before rolling out:
- Filesystem: single-node clusters, sub-100 GB caches.
- S3 (or compatible): anything multi-node. R2, MinIO, and GCS (through its S3-interoperable XML API) work via the S3 adapter. Azure Blob Storage has no native S3 API, so it needs an S3-compatible gateway in front of it.
- Redis: hot-key acceleration in front of the durable store. Optional, but a cheap latency win.
Pick the deployment substrate. Most teams land on one of:
- Kubernetes: see Deployment → Kubernetes for a working chart.
- Bare VMs: systemd units, one binary per role, with a load balancer in front.
- Docker Compose: a reasonable starting point for ≤ 5 developers.
Plan capacity. Heuristics from production clusters:
- CAS storage: 5–20 GB per active developer, depending on language. C++ skews high; Go skews low.
- Worker CPU: 1 vCPU per concurrent action. Headroom matters more than peak.
- Network: cache reads are the hot path. Provision at least 1 Gbps between workers and CAS.
Set up TLS. Use mTLS on every hop. The Configuration → Production guide has the certificate layout.
Wire up metrics. NativeLink emits Prometheus metrics out of the box. Scrape them with Prometheus and point Grafana at that Prometheus instance. See Deployment → Metrics.
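The storage and capacity heuristics in this checklist can be turned into a quick back-of-the-envelope planner. This is a minimal Python sketch, assuming the thresholds stated above (filesystem only for a single node under about 100 GB, 5–20 GB of CAS per active developer, 1 vCPU per concurrent action, at least 1 Gbps to the CAS). The `headroom` multiplier, its 1.25 default, and all function and class names are hypothetical additions, not NativeLink settings.

```python
import math
from dataclasses import dataclass

# Heuristics from the rollout checklist; tune them for your own cluster.
CAS_GB_PER_DEV_LOW = 5    # Go-heavy repos skew toward this end
CAS_GB_PER_DEV_HIGH = 20  # C++-heavy repos skew toward this end
FILESYSTEM_MAX_GB = 100   # filesystem backend: single node, sub-100 GB


@dataclass
class RolloutPlan:
    backend: str
    cas_gb_low: int
    cas_gb_high: int
    worker_vcpus: int


def plan_rollout(active_devs: int, nodes: int, concurrent_actions: int,
                 headroom: float = 1.25) -> RolloutPlan:
    """Rough sizing. `headroom` is a hypothetical safety multiplier."""
    low = active_devs * CAS_GB_PER_DEV_LOW
    high = active_devs * CAS_GB_PER_DEV_HIGH
    # Size against the high end so the filesystem choice survives growth.
    if nodes == 1 and high < FILESYSTEM_MAX_GB:
        backend = "filesystem"
    else:
        backend = "s3"  # multi-node, or past the ~100 GB filesystem guideline
    # 1 vCPU per concurrent action, plus headroom, rounded up.
    vcpus = math.ceil(concurrent_actions * headroom)
    return RolloutPlan(backend, low, high, vcpus)


def transfer_seconds(gigabytes: float, link_gbps: float = 1.0) -> float:
    """Ideal time to move decimal GB over a link, ignoring protocol overhead."""
    return gigabytes * 8 / link_gbps
```

For example, four developers on one node running 16 concurrent actions land on the filesystem backend with 20–80 GB of CAS and 20 worker vCPUs, and `transfer_seconds(10)` shows that pulling 10 GB of cache takes at least 80 seconds on a 1 Gbps link. Redis is left out of the decision because the checklist treats it as optional acceleration rather than a primary store.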

Container registry

Official images are published to GitHub Container Registry. Pull the version your config was written for: `latest` works, but pinning avoids surprise upgrades.

```shell
docker pull ghcr.io/tracemachina/nativelink:v1.6.1
```

The /pkgs/container/nativelink page lists every published tag.
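If image references are templated into Helm values or Compose files, a small pre-deploy check can enforce pinning. This is a minimal Python sketch, assuming references of the form `registry/path:tag` or `registry/path@sha256:...`. The `is_pinned` name and the policy of rejecting untagged and `latest` references are illustrative choices, not something NativeLink enforces.

```python
import sys


def is_pinned(image_ref: str) -> bool:
    """True if the reference names a digest or an explicit non-`latest` tag."""
    if "@sha256:" in image_ref:
        return True  # digest references cannot move
    # Only the last path segment can carry a tag; this keeps a registry
    # port such as "localhost:5000/..." from being read as a tag.
    last_segment = image_ref.rsplit("/", 1)[-1]
    if ":" not in last_segment:
        return False  # untagged references resolve to "latest"
    tag = last_segment.rsplit(":", 1)[1]
    return tag not in ("", "latest")


if __name__ == "__main__":
    # Hypothetical usage: pass image references as arguments before deploying.
    unpinned = [ref for ref in sys.argv[1:] if not is_pinned(ref)]
    for ref in unpinned:
        print(f"not pinned: {ref}")
    sys.exit(1 if unpinned else 0)
```

A version tag such as `v1.6.1` can in principle be re-pushed, so teams that need strict reproducibility can go one step further and pin by digest, which this check also accepts.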

Backups & recovery

The CAS is the only stateful piece you can't trivially rebuild from clients. Snapshot strategy depends on the backend:

Filesystem: rsync or your filesystem's snapshot facility (ZFS, Btrfs). Restore by stopping the CAS, swapping in the snapshot directory, and starting the CAS again.
S3-compatible: bucket versioning plus lifecycle policies protect the primary copy. For disaster recovery, add cross-region replication.
Redis: treat as ephemeral. Losing it means cache misses, not data loss.
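The filesystem restore (stop the CAS, swap the directory, start again) is easy to script. Here is a minimal Python sketch of the swap step, assuming the CAS process is already stopped and that the live directory and the snapshot sit on the same filesystem, where each rename is atomic on POSIX. The `swap_in_snapshot` name, the paths, and the `.pre-restore` suffix are hypothetical.

```python
from pathlib import Path


def swap_in_snapshot(live_dir: Path, snapshot_dir: Path) -> Path:
    """Move snapshot_dir into place as live_dir, keeping the old data aside.

    Assumes the CAS is stopped and both paths share one filesystem.
    Returns the path the previous data was moved to (for rollback).
    """
    if not snapshot_dir.is_dir():
        raise FileNotFoundError(f"snapshot missing: {snapshot_dir}")
    rollback = live_dir.with_name(live_dir.name + ".pre-restore")
    if rollback.exists():
        raise FileExistsError(f"remove the earlier rollback copy first: {rollback}")
    if live_dir.exists():
        live_dir.rename(rollback)  # step 1: move the current data aside
    snapshot_dir.rename(live_dir)  # step 2: put the snapshot in its place
    return rollback
```

Keeping the previous directory instead of deleting it means a bad snapshot can be backed out with the same two renames in reverse; delete the `.pre-restore` copy once the restored CAS has served traffic cleanly.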

The Action Cache can be wiped without data loss; you'll re-execute everything until it warms back up.

FAQ

Is self-hosting free?

What am I actually paying for when I self-host?

Does the free version limit cache or team size?

What's next

Configuration → Production: the JSON5 shape for a real cluster.
Deployment → Kubernetes: a working Helm chart.
Deployment → Metrics: Prometheus and Grafana, ready to go.