// architecture overview //
how nix2gpu works under the hood.
// the big picture //
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ nix flake │────│ Nimi │────│ OCI image │
│ definition │ │ mkContainerImage│ │ (layered) │
└─────────────────┘ └─────────────────┘ └─────────────────┘
│ │ │
▼ ▼ ▼
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ container config│ │ startup script │ │ docker/podman │
│ (nix modules) │ │ + Nimi runtime │ │ runtime │
└─────────────────┘ └─────────────────┘ └─────────────────┘
nix2gpu transforms declarative nix configurations into reproducible GPU containers through a multi-stage build process.
// build pipeline //
1. nix evaluation
perSystem.nix2gpu."my-container" = {
cuda.packages = pkgs.cudaPackages_12_8;
tailscale.enable = true;
services."api" = {
process.argv = [ (lib.getExe pkgs.my-api) "--port" "8080" ];
};
};
The nix module system processes your configuration, applying defaults, validating options, and computing the final container specification.
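To sanity-check what the module system computed, you can evaluate the configuration without building anything. A minimal sketch, assuming nix2gpu exposes the evaluated config under a flake attribute like the one below (the exact attribute path is an assumption):
# print the computed container specification (attribute path is hypothetical)
nix eval --json .#nix2gpu.my-container | jq .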
2. dependency resolution
nix build .#my-container
Nix builds the entire dependency graph:
- Base system packages (bash, coreutils, etc.)
- CUDA toolkit and drivers
- Your application packages
- Service configurations
- Startup scripts
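To inspect the resulting graph after a build, standard nix tooling works on the same flake attribute:
# list every store path in the image closure, with per-path and closure sizes
nix path-info --recursive --size --closure-size --human-readable .#my-container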
3. image assembly
nimi.mkContainerImage {
name = "my-container";
copyToRoot = [ baseSystem cudaPackages userPackages ];
}
Nimi builds the OCI image (via nix2container) with:
- Layered filesystem for efficient caching
- Only necessary dependencies included
- Reproducible layer ordering
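nix2container images are usually loaded or pushed with small helper apps rather than docker load. A sketch, assuming nix2gpu passes the standard nix2container helpers through (the attribute path is an assumption):
# stream the image into the local docker daemon without a tarball on disk
nix run .#my-container.copyToDockerDaemon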
4. container execution
docker run --gpus all my-container:latest
The startup script orchestrates initialization, then Nimi runs services.
// filesystem layout //
/
├── nix/
│ └── store/ # immutable package store
│ ├── cuda-* # CUDA toolkit
│ ├── startup-* # initialization script
│ └── packages-* # your applications
├── etc/
│ ├── ssh/ # SSH daemon config
│ └── ld.so.conf.d/ # library search paths
├── run/
│ └── secrets/ # mounted secret files
├── workspace/ # default working directory
└── tmp/ # temporary files
Key principles:
- Immutable system: /nix/store contains all software, never modified at runtime
- Mutable state: /workspace, /tmp, /run for runtime data
- Secrets: mounted at /run/secrets from external sources
- Library paths: dynamic loader configured for both nix store and host-mounted drivers
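These principles show up directly in how the container is launched. An illustrative invocation (the host-side paths are assumptions):
# keep mutable state on the host and inject secrets read-only
docker run --gpus all \
  -v "$PWD/workspace:/workspace" \
  -v /run/secrets:/run/secrets:ro \
  my-container:latest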
// startup sequence //
The container initialization follows a precise sequence:
1. environment setup
# startup.sh
export PATH="/nix/store/.../bin:$PATH"
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/lib/x86_64-linux-gnu"
export CUDA_PATH="/nix/store/...-cuda-toolkit"
- Sets up PATH for nix store binaries
- Configures library search for both nix store and host-mounted NVIDIA drivers
- Establishes CUDA environment
2. runtime detection
if [[ -d "/lib/x86_64-linux-gnu" ]]; then
echo "vast.ai runtime detected"
# patch nvidia utilities for host drivers
elif [[ -n "$RUNPOD_POD_ID" ]]; then
echo "runpod runtime detected"
# configure network volumes
else
echo "bare-metal/docker runtime detected"
fi
Adapts configuration based on detected cloud provider or bare-metal environment.
3. GPU initialization
# link host drivers to expected locations
ldconfig
# test GPU access
nvidia-smi || echo "GPU not available"
Ensures GPU toolchain works with both nix store CUDA and host-mounted drivers.
4. network setup
# tailscale daemon (if enabled)
if [[ -n "$TAILSCALE_AUTHKEY_FILE" ]]; then
tailscaled --statedir=/tmp/tailscale &
tailscale up --authkey="$(cat "$TAILSCALE_AUTHKEY_FILE")"
fi
# SSH daemon (sshd must be invoked via an absolute path to re-exec)
mkdir -p /var/empty /var/log
"$(command -v sshd)"
Starts networking services: Tailscale for mesh networking, SSH for remote access.
5. service orchestration
# Nimi is the container entrypoint
nimi --config /nix/store/.../nimi.json
Nimi runs the startup hook and then launches your modular services.
// service management //
Nimi
nix2gpu uses Nimi, a tiny process manager for NixOS modular services (NixOS 25.11):
services."api" = {
process.argv = [ (lib.getExe pkgs.my-api) "--port" "8080" ];
};
nimiSettings.restart.mode = "up-to-count";
Benefits over systemd:
- No init system complexity
- Modular service definitions
- JSON config generated by Nix
- Predictable restart behavior
service lifecycle
- Dependency resolution: services start in the correct order
- Health monitoring: automatic restart on failure
- Log aggregation: all service logs go to stdout for docker logs (see the example below)
- Graceful shutdown: proper signal handling when the container stops
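Since everything multiplexes to stdout, ordinary container tooling covers log access (assuming the container was started with --name my-container):
# follow the aggregated logs of all services
docker logs -f my-container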
// networking architecture //
standard mode (docker/podman)
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ host │────│ container │────│ service │
│ localhost:* │ │ bridge │ │ localhost:* │
└─────────────┘ └─────────────┘ └─────────────┘
Standard container networking with port forwarding.
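Reaching a service means publishing its port. A sketch using the api service from earlier; the request is illustrative:
# publish the service port, then probe it from the host
docker run --gpus all -p 8080:8080 my-container:latest
curl http://localhost:8080/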
tailscale mode (mesh networking)
┌─────────────┐ ┌─────────────────┐ ┌─────────────┐
│ host-a │ │ tailscale │ │ host-b │
│ container-a ├────┤ mesh network ├────┤ container-b │
│10.0.0.100:22│ │ │ │10.0.0.101:22│
└─────────────┘ └─────────────────┘ └─────────────┘
Direct container-to-container communication across hosts via Tailscale.
Key advantages:
- No port forwarding needed
- Works across clouds and networks
- End-to-end encryption
- DNS-based service discovery
- ACL-based access control
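Concretely, the SSH daemon in one container is reachable from another without any published ports. A sketch using the addresses from the diagram above; the hostname form assumes MagicDNS is enabled on the tailnet:
# from container-a: connect to container-b across hosts
ssh root@10.0.0.101      # by tailnet IP, as in the diagram
ssh root@container-b     # or by MagicDNS hostname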
// GPU integration //
driver compatibility
# nix store CUDA toolkit
/nix/store/...-cuda-toolkit/
├── bin/nvcc
├── lib/libcuda.so # stub library
└── include/cuda.h
# host-mounted real drivers
/lib/x86_64-linux-gnu/
├── libcuda.so.1 # actual GPU driver
├── libnvidia-ml.so.1
└── libnvidia-encode.so.1
The challenge: CUDA applications need both:
- CUDA toolkit (development headers, nvcc compiler) from nix store
- Actual GPU drivers from the host system
The solution: dynamic library path configuration
export LD_LIBRARY_PATH="/nix/store/...-cuda/lib:${LD_LIBRARY_PATH}:/lib/x86_64-linux-gnu"
ldconfig
This allows nix store CUDA to find host-mounted drivers at runtime.
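A quick way to confirm the loader picks up the real host driver rather than the stub (assuming nvidia-smi is on PATH):
# the nvidia libraries should resolve to host-mounted paths, not the stub
ldd "$(command -v nvidia-smi)" | grep nvidia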
cloud provider adaptations
vast.ai: NVIDIA drivers mounted at /lib/x86_64-linux-gnu
# startup.sh detects vast.ai and configures paths
patchelf --set-rpath /lib/x86_64-linux-gnu /nix/store/.../nvidia-smi
runpod: Standard nvidia-docker integration
# uses nvidia-container-toolkit mounts
# drivers available via standard paths
bare-metal: Host nvidia-docker setup
# relies on proper nvidia-container-toolkit configuration
# GPU access via device mounts: --gpus all
// secret management //
security principles
- Secrets never enter nix store (nix store is world-readable)
- Runtime-only access (secrets mounted at container start)
- File-based injection (not environment variables)
- Minimal exposure (secrets only accessible to specific processes)
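In practice, a process consumes a secret by reading its mounted file at start, mirroring the startup script above (the path matches the agenix example below):
# file-based injection: the key never appears in the image or in `env`
tailscale up --authkey="$(cat /run/secrets/ts-key)"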
agenix integration
nix2gpu."my-container" = {
age.enable = true;
age.secrets.tailscale-key = {
file = ./secrets/ts-key.age;
path = "/run/secrets/ts-key";
};
tailscale.authKeyFile = config.age.secrets.tailscale-key.path;
};
Flow:
- Host system decrypts secrets to /run/secrets/
- Container mounts /run/secrets as a volume
- Container references secrets by path, never by value
// caching & performance //
layer optimization
# nix2container creates efficient layers
[
layer-01-base-system # coreutils, bash, etc.
layer-02-cuda-toolkit # large but stable
layer-03-python-packages # frequently changing
layer-04-app-code # most frequently changing
]
Frequently changing components go in higher layers to maximize cache hits.
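Once the image is loaded, the layering is visible with standard tooling:
# inspect layer ordering and sizes; stable layers sit at the bottom
docker history my-container:latest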
build caching
# first build: downloads everything
nix build .#my-container # ~15 minutes
# subsequent builds: only changed layers
nix build .#my-container # ~30 seconds
Nix’s content-addressed store ensures perfect reproducibility with efficient incremental builds.
registry layer sharing
# multiple containers share base layers
my-container:v1 # 2GB total (8 layers)
my-container:v2 # +100MB (only top layer changed)
other-container:v1 # +500MB (shares 6 bottom layers)
OCI registries deduplicate shared layers across images.
// extending the system //
custom services
# services/my-service.nix
{ lib, pkgs, ... }:
{ config, ... }:
let
inherit (lib) mkOption types;
cfg = config.myService;
in
{
_class = "service";
options.myService = {
port = mkOption { type = types.port; default = 8080; };
# ... other options
};
config.process.argv = [
(lib.getExe pkgs.my-service)
"--port"
(toString cfg.port)
];
}
custom cloud targets
# modules/container/scripts/copy-to-my-cloud.nix
{
perSystem = { pkgs, self', ... }: {
perContainer = { container, ... }: {
scripts.copy-to-my-cloud = pkgs.writeShellApplication {
name = "copy-to-my-cloud";
text = ''
# implement your cloud's container registry push
'';
};
};
};
}
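How the script surfaces to users depends on nix2gpu's flake wiring; one plausible shape is a runnable flake app (the attribute path is hypothetical):
# push the image with the custom target
nix run .#my-container.copy-to-my-cloud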
The modular architecture makes it straightforward to add new cloud providers or service types.