Platform Engineering · Capability

Firecracker MicroVM Isolation.

Per-tenant isolation at VM speed. For teams running untrusted code — CI runners, user-supplied notebooks, sandboxed inference — Firecracker is the right primitive and we build the control plane around it.

Scope

What we do

Design and operate Firecracker-based isolation layers (jailer, rootfs, network).
Build orchestrators on top of Firecracker (Ignite-style ephemeral runners).
Integrate with Kubernetes via a Kata-style runtime or a bespoke shim.
Audit and measure blast radius across microVMs.

Practical

Exercises we run

Small, repeatable drills we use on engagements and teach in workshops. Each has a lab setup, step-by-step outline, and measurable output.

Per-tenant microVM sandbox in 200 LOC
Minimal jailer + rootfs + API shim; boot a microVM per incoming request and tear it down on completion.

Ignite + Firecracker for CI runner isolation
GitHub-Actions-style ephemeral runner pool with Firecracker-backed executors.

microVM blast-radius audit
Measure what a compromised microVM can reach; harden the jailer accordingly.
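The core loop of the sandbox drill is driving Firecracker's REST API over its Unix socket. The sketch below uses only the Python standard library and assumes Firecracker (or the jailer) is already running with its API socket at a path you pass in. The kernel, rootfs and TAP names are placeholders, the drive and interface ids (`rootfs`, `eth0`) are arbitrary, and under the jailer the file paths in the request bodies are resolved inside the VM's chroot. Teardown (killing the VMM process and removing its chroot) is left out.

```python
import http.client
import json
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTP over a Unix domain socket (the transport Firecracker's API uses)."""

    def __init__(self, socket_path: str, timeout: float = 5.0):
        # "localhost" only fills the Host header; no TCP connection is made.
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self) -> None:
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock


def boot_plan(kernel: str, rootfs: str, tap: str, vcpus: int = 1, mem_mib: int = 256):
    """Ordered (method, path, body) calls that configure and start one microVM."""
    return [
        ("PUT", "/boot-source", {
            "kernel_image_path": kernel,
            "boot_args": "console=ttyS0 reboot=k panic=1 pci=off",
        }),
        ("PUT", "/drives/rootfs", {
            "drive_id": "rootfs",
            "path_on_host": rootfs,
            "is_root_device": True,
            "is_read_only": False,
        }),
        ("PUT", "/machine-config", {"vcpu_count": vcpus, "mem_size_mib": mem_mib}),
        ("PUT", "/network-interfaces/eth0", {"iface_id": "eth0", "host_dev_name": tap}),
        # InstanceStart goes last; everything before it is pre-boot configuration.
        ("PUT", "/actions", {"action_type": "InstanceStart"}),
    ]


def apply_plan(socket_path: str, plan) -> None:
    """Send each call in order and stop at the first non-2xx response."""
    for method, path, body in plan:
        conn = UnixHTTPConnection(socket_path)
        try:
            conn.request(method, path, body=json.dumps(body),
                         headers={"Content-Type": "application/json",
                                  "Accept": "application/json"})
            resp = conn.getresponse()
            detail = resp.read().decode(errors="replace")
            if not 200 <= resp.status < 300:
                raise RuntimeError(f"{method} {path} failed: {resp.status} {detail}")
        finally:
            conn.close()
```

Usage, with a hypothetical socket path: `apply_plan("/tmp/fc-vm0001.sock", boot_plan("vmlinux", "rootfs.ext4", "tap-vm0001"))`. Keeping the plan as plain data makes it easy to log, diff and test without a running VMM.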

Shipped · v0.1

150 microVMs, coordinated on NATS

150 Firecracker microVMs — each spawning in ~60ms, making a single call to Azure OpenAI, and disappearing. Every slot visible in a live browser grid. The entire system — spawners, LLM streams, UI, acks — runs over one JetStream topic.

Firecracker v0.1 cluster — 150-slot live grid with NATS CLI output and streaming LLM responses

The worst bug had nothing to do with VMs

Every guest failed its HTTPS call at exactly 5040ms — 100% failure, cluster-wide. Same node, same API key, curl from the pod: fine. The culprit was DNS. Under concurrent load, UDP replies were getting lost on Azure's outbound SNAT path. Fix: pre-resolve the endpoint at rootfs build time and bake the IP into /etc/hosts. The guest never sends a DNS packet. Zero failures since.
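The /etc/hosts fix is a few lines of build tooling. A minimal sketch, assuming the rootfs is mounted or unpacked at a directory the build can write to, and that the guest's /etc/nsswitch.conf consults `files` before `dns` (the usual default). The trade-off is that the pinned address is frozen into the image, so the image has to be rebuilt if the endpoint's IP changes.

```python
import socket
from pathlib import Path
from typing import List


def resolve_ipv4(hostname: str, port: int = 443) -> List[str]:
    """Resolve once, at build time; unique IPv4 addresses in resolver order."""
    infos = socket.getaddrinfo(hostname, port, family=socket.AF_INET, type=socket.SOCK_STREAM)
    addrs: List[str] = []
    for _family, _type, _proto, _canon, sockaddr in infos:
        if sockaddr[0] not in addrs:
            addrs.append(sockaddr[0])
    return addrs


def bake_hosts(rootfs: Path, hostname: str, ip: str) -> Path:
    """Pin hostname -> ip in <rootfs>/etc/hosts so the guest never queries DNS for it."""
    hosts = rootfs / "etc" / "hosts"
    hosts.parent.mkdir(parents=True, exist_ok=True)
    lines = hosts.read_text().splitlines() if hosts.exists() else ["127.0.0.1\tlocalhost"]
    # Drop any earlier pin for this hostname (ignoring comments) so rebuilds stay idempotent.
    kept = [ln for ln in lines if hostname not in ln.split("#", 1)[0].split()[1:]]
    kept.append(f"{ip}\t{hostname}")
    hosts.write_text("\n".join(kept) + "\n")
    return hosts
```

Usage, with a placeholder hostname: `bake_hosts(Path("/mnt/rootfs"), host, resolve_ipv4(host)[0])`. Only one address is pinned, because with several entries for the same name the resolver typically just returns the first.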

NATS carried everything — including the live UI

At full pool: 150 concurrent LLM calls, each streaming SSE chunks back through JetStream in real time; ack state tracked across every in-flight message; telemetry heartbeats from every node every 2 seconds. No REST, no polling, no intermediary. The browser was a first-class NATS client on the same subject space, and a new node joined the pool simply by pulling its first message. The only bottleneck was rendering 150 slots on a canvas at once; NATS itself never blinked.
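Pull delivery is what makes scaling out cheap: a node does not register anywhere, it just asks for the next message, and anything it fails to ack comes back. The toy model below is not the NATS client API. It only mimics JetStream's explicit-ack and ack-wait redelivery semantics in memory, with a caller-supplied clock so the behaviour is deterministic; the class and method names are illustrative.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class PullQueue:
    """Toy model of a pull consumer with explicit acks and an ack-wait deadline."""
    ack_wait: float
    pending: List[str] = field(default_factory=list)
    in_flight: Dict[str, Tuple[str, float]] = field(default_factory=dict)  # id -> (worker, deadline)
    acked: Set[str] = field(default_factory=set)

    def publish(self, msg_id: str) -> None:
        self.pending.append(msg_id)

    def pull(self, worker: str, now: float) -> Optional[str]:
        """Deliver the next message to whichever worker asks; no registration step."""
        # Anything not acked within ack_wait goes back to the front for redelivery.
        expired = [m for m, (_, deadline) in self.in_flight.items() if deadline <= now]
        for m in expired:
            del self.in_flight[m]
        self.pending[:0] = expired
        if not self.pending:
            return None
        msg_id = self.pending.pop(0)
        self.in_flight[msg_id] = (worker, now + self.ack_wait)
        return msg_id

    def ack(self, msg_id: str) -> None:
        """Mark a message done so it is never redelivered."""
        self.in_flight.pop(msg_id, None)
        if msg_id in self.pending:
            self.pending.remove(msg_id)
        self.acked.add(msg_id)
```

A node that crashes mid-request simply never acks; its message reappears for the next puller once `ack_wait` elapses, with no separate failure detector.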

The numbers

3 × Azure AKS D8s_v6 nodes (32 GB RAM), 50 microVM slots each = 150 total. All 150 materialise within 60ms. On gpt-4.1-mini: single-word answers end-to-end in 400–500ms (covering warm image restore, network init, Azure auth, inference). Three-sentence responses: first token at 400–500ms, full answer in 1.1–1.6s. The workload is I/O-bound — 95% of each VM's life is waiting on Azure SSE.
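A quick sanity check on the sizing. These are assumptions, not measurements: a D8s_v6 has 8 vCPUs alongside its 32 GB, each microVM gets one vCPU, the 5% of a VM's life not spent waiting is fully CPU-bound, and host, kubelet and Firecracker overhead are ignored.

```python
from dataclasses import dataclass


@dataclass
class NodeBudget:
    ram_per_slot_gb: float   # upper bound before any host overhead
    avg_busy_vcpus: float    # mean number of vCPUs doing work at any instant
    cpu_utilisation: float   # avg_busy_vcpus as a fraction of the node's vCPUs


def node_budget(ram_gb: float, vcpus: int, slots: int, busy_fraction: float,
                vcpus_per_vm: int = 1) -> NodeBudget:
    """Average-case per-node budget for `slots` mostly idle microVMs."""
    busy = slots * vcpus_per_vm * busy_fraction
    return NodeBudget(ram_gb / slots, busy, busy / vcpus)
```

With the figures above, `node_budget(32, 8, 50, 0.05)` gives 0.64 GB per slot and about 2.5 busy vCPUs per node (roughly 31% of the node). These are averages only; bursts where many VMs are processing at once will exceed them.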

NATS JetStream · Firecracker + snapshot restore · Kubernetes (AKS) · Envoy Gateway · Azure OpenAI

Watch the walkthrough (8 min) →

Threat model

What a compromised microVM can and cannot reach

Firecracker's isolation story is stronger than containers and weaker than full VMs. Here's what that actually means for a tenant that has achieved full code execution inside its guest.

Reachable by a compromised microVM

Its own guest kernel and memory (by definition — it is root in its own box).
Its ext4 rootfs image, attached as a virtio-block drive from inside the jailer's chroot; read-write unless the drive is configured read-only.
One virtio-net TAP interface (egress policed by iptables / nftables on the host).
Its single virtio-vsock device (if configured), backed by a host-side Unix socket inside its own chroot; never another guest's sockets.
The CPUID features and MSRs that KVM exposes to the guest, as filtered by Firecracker (KVM-mediated).
Side-channel signals (cache timing, branch predictor) — mitigations depend on host kernel + microcode.

NOT reachable (properly configured)

The host kernel's filesystem outside the jailer's chroot: the jailer pivots root and drops to an unprivileged uid/gid, and Firecracker then installs its default seccomp filters.
Other microVMs' memory, rootfs, TAPs, or vsocks — each Firecracker process is a separate chroot + seccomp + cgroup.
The control-plane API socket of another microVM (jailer owns per-VM sockets with uid-scoped permissions).
Arbitrary host device nodes — Firecracker exposes a minimal virtio device surface; no passthrough by default.
IaaS metadata endpoints, unless you explicitly routed the TAP network to them (we don't).
Persistent host state between reboots — rootfs is ephemeral; stateful data lives in an attached, per-tenant volume.
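The blast-radius audit boils down to running a reachability probe from inside a guest and comparing the result with the two lists above. A minimal sketch using plain TCP connects; the target list (including the 172.16.0.1 TAP gateway address and the endpoint hostname) is hypothetical and should mirror your own network layout. Azure's instance metadata service listens on 169.254.169.254, port 80. A TCP probe covers TCP only; UDP, vsock and side channels need their own checks.

```python
import socket
from typing import Dict, Tuple


def can_connect(host: str, port: int, timeout: float = 1.0) -> bool:
    """True if a TCP handshake to host:port completes within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, timed out (filtered) or unresolvable: not reachable.
        return False


def audit(targets: Dict[str, Tuple[str, int]], timeout: float = 1.0) -> Dict[str, bool]:
    """Probe every target once; run this from inside the guest."""
    return {name: can_connect(host, port, timeout) for name, (host, port) in targets.items()}


# Hypothetical targets: adjust to your own TAP addressing and allowed egress.
TARGETS = {
    "instance metadata (expect False)": ("169.254.169.254", 80),
    "host side of the TAP, SSH (expect False)": ("172.16.0.1", 22),
    "allowed LLM endpoint (expect True)": ("your-resource.openai.azure.com", 443),
}
```

Run `audit(TARGETS)` in a guest and diff the result against the expectations in the labels; any `True` for a target on the not-reachable list is a finding.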

All claims above assume Firecracker is launched through the jailer with its default hardening (pivot_root into a per-VM chroot, privilege drop, per-VM cgroup), that Firecracker's default seccomp filters are left on, and that the host kernel is patched. A misconfigured Firecracker that skips the jailer, disables seccomp, or shares a TAP across tenants is out of scope: that's a container with extra steps.
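Launching through the jailer is mostly a matter of building the right command line. A sketch assuming the flags documented for Firecracker's jailer (`--id`, `--exec-file`, `--uid`, `--gid`, `--chroot-base-dir`, `--netns`); the default paths, the uid range and the one-uid-per-VM helper are hypothetical. Giving each VM its own uid is what makes per-VM socket permissions meaningful, and nothing here disables Firecracker's default seccomp filters.

```python
from pathlib import PurePosixPath
from typing import Dict, Iterable, List, Optional


def jailer_argv(vm_id: str, uid: int, gid: int,
                exec_file: str = "/usr/bin/firecracker",
                chroot_base: str = "/srv/jailer",
                netns: Optional[str] = None) -> List[str]:
    """Command line for one jailed Firecracker process."""
    argv = ["jailer", "--id", vm_id, "--exec-file", exec_file,
            "--uid", str(uid), "--gid", str(gid),
            "--chroot-base-dir", chroot_base]
    if netns is not None:
        # Optional network namespace, e.g. one holding only this VM's TAP.
        argv += ["--netns", netns]
    return argv


def chroot_root(vm_id: str, exec_file: str = "/usr/bin/firecracker",
                chroot_base: str = "/srv/jailer") -> PurePosixPath:
    """Where the jailer builds this VM's root: <base>/<exec name>/<id>/root."""
    return PurePosixPath(chroot_base) / PurePosixPath(exec_file).name / vm_id / "root"


def allocate_uids(vm_ids: Iterable[str], base_uid: int = 100_000) -> Dict[str, int]:
    """Hypothetical helper: one unprivileged uid per VM, never shared."""
    return {vm_id: base_uid + i for i, vm_id in enumerate(vm_ids)}
```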

Hands-on: Firecracker runners — 1-day workshop

Packaged engagement — we scope, build, and hand over with runbooks, against a specific SLA. Add to cart to request delivery; no price is billed up-front.


Neux Ltd

AI Infrastructure · Platform Engineering · London.
Since 2014.

Contact

Also from Neux

neux.ai — AI consultancy

styk.tv — podcast

Legal
