Platform Engineering · Capability
Firecracker MicroVM Isolation.
Per-tenant isolation at VM speed. For teams running untrusted code — CI runners, user-supplied notebooks, sandboxed inference — Firecracker is the right primitive and we build the control plane around it.
Scope
What we do
- Design and operate Firecracker-based isolation layers (jailer, rootfs, network).
- Build orchestrators on top of Firecracker (Ignite-style ephemeral runners).
- Integrate with Kubernetes via a Kata-style runtime or a bespoke shim.
- Audit and measure blast radius across microVMs.
Practical
Exercises we run
Small, repeatable drills we use on engagements and teach in workshops. Each has a lab setup, a step-by-step outline, and a measurable output.
Shipped · v0.1
150 microVMs, coordinated on NATS
150 Firecracker microVMs, each spawning in ~60ms, making a single call to Azure OpenAI, and disappearing. Every slot is visible in a live browser grid. The entire system (spawners, LLM streams, UI, acks) runs over one JetStream subject space.
The worst bug had nothing to do with VMs
Every guest failed its HTTPS call at exactly 5040ms — 100% failure, cluster-wide. Same node, same API key, curl from the pod: fine. The culprit was DNS. Under concurrent load, UDP replies were getting lost on Azure's outbound SNAT path. Fix: pre-resolve the endpoint at rootfs build time and bake the IP into /etc/hosts. The guest never sends a DNS packet. Zero failures since.
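A minimal sketch of the build-time pinning step, assuming the rootfs is available as an unpacked directory on the build host. The function name `bake_hosts_entry` and the injectable `resolve` parameter are illustrative and are not taken from the original build scripts.

```python
import socket
from pathlib import Path
from typing import Callable


def bake_hosts_entry(
    rootfs_dir: Path,
    hostname: str,
    resolve: Callable[[str], str] = socket.gethostbyname,
) -> str:
    """Resolve `hostname` once on the build host and pin it in the guest's /etc/hosts."""
    ip = resolve(hostname)  # runs at image build time, never inside the guest
    hosts = rootfs_dir / "etc" / "hosts"
    hosts.parent.mkdir(parents=True, exist_ok=True)
    if hosts.exists():
        existing = hosts.read_text().splitlines()
    else:
        existing = ["127.0.0.1 localhost"]
    # Drop any earlier pin for the same hostname so rebuilds stay idempotent.
    kept = [line for line in existing if hostname not in line.split()[1:]]
    kept.append(f"{ip} {hostname}")
    hosts.write_text("\n".join(kept) + "\n")
    return ip
```

Calling `bake_hosts_entry(Path("rootfs"), "<your-endpoint-host>")` during the image build would leave one pinned line in `rootfs/etc/hosts`. The trade-off is that the pin goes stale if the endpoint's address changes, so the image has to be rebuilt whenever that happens.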
NATS carried everything — including the live UI
At full pool: 150 concurrent LLM calls, each streaming SSE chunks back through JetStream in real time; ack state tracked across every in-flight message; telemetry heartbeats from every node every 2 seconds. No REST, no polling, no intermediary. The browser was a first-class NATS client on the same subject space, and a new node joined the pool simply by pulling its first message. The only bottleneck was rendering 150 slots on a canvas at once; NATS itself never blinked.
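To illustrate how one subject space can serve spawners, streams, and the browser at once, here is a small sketch of NATS-style subject matching, where `*` matches exactly one token and `>` matches one or more trailing tokens. The subject names (`pool.slot.<n>.tokens`, `pool.node.<name>.heartbeat`) are hypothetical; the layout actually used in the system is not documented here.

```python
def subject_matches(pattern: str, subject: str) -> bool:
    """NATS-style match: '*' is one token, '>' is one or more trailing tokens."""
    p_tokens = pattern.split(".")
    s_tokens = subject.split(".")
    for i, p in enumerate(p_tokens):
        if p == ">":
            # '>' must be the last pattern token and needs at least one token left.
            return i == len(p_tokens) - 1 and len(s_tokens) > i
        if i >= len(s_tokens):
            return False
        if p != "*" and p != s_tokens[i]:
            return False
    return len(p_tokens) == len(s_tokens)


# Hypothetical subject layout, for illustration only.
def tokens_subject(slot: int) -> str:
    return f"pool.slot.{slot}.tokens"


def heartbeat_subject(node: str) -> str:
    return f"pool.node.{node}.heartbeat"
```

Under this assumed layout, a browser grid could subscribe to `pool.slot.*.tokens` to receive every slot's stream, while a monitoring view could subscribe to `pool.>` and see heartbeats as well.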
The numbers
3 × Azure AKS D8s_v6 nodes (32 GB RAM), 50 microVM slots each = 150 total. All 150 materialise within 60ms. On gpt-4.1-mini: single-word answers end-to-end in 400–500ms (covering warm image restore, network init, Azure auth, inference). Three-sentence responses: first token at 400–500ms, full answer in 1.1–1.6s. The workload is I/O-bound — 95% of each VM's life is waiting on Azure SSE.
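The capacity figures can be checked with a few lines of arithmetic. This is a rough sketch: the vCPU count is an assumption read off the D8s SKU name, the RAM ceiling ignores host and VMM overhead, and the busy-vCPU estimate assumes the non-waiting 5% of each VM's life is fully CPU-bound.

```python
NODES = 3
SLOTS_PER_NODE = 50
NODE_RAM_GB = 32
NODE_VCPUS = 8         # assumption: inferred from the D8s SKU name, not stated in the text
WAIT_FRACTION = 0.95   # share of each VM's life spent waiting on SSE

total_slots = NODES * SLOTS_PER_NODE  # 150

# Upper bound on memory per slot, ignoring host kernel and VMM overhead.
ram_ceiling_mb = NODE_RAM_GB * 1024 / SLOTS_PER_NODE

# Oversubscription ratio: slots sharing each vCPU.
slots_per_vcpu = SLOTS_PER_NODE / NODE_VCPUS

# Rough average of vCPUs kept busy per node when every slot is occupied.
busy_vcpus = SLOTS_PER_NODE * (1 - WAIT_FRACTION)

print(total_slots, round(ram_ceiling_mb, 2), slots_per_vcpu, round(busy_vcpus, 2))
```

Under these assumptions each slot has at most about 655 MB of RAM available, and a full node keeps roughly 2.5 vCPUs busy on average, which is consistent with 50 slots fitting on one node when the workload is I/O-bound.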
Threat model
What a compromised microVM can and cannot reach
Firecracker's isolation story is stronger than containers and weaker than full VMs. Here's what that actually means for a tenant that has achieved full code execution inside its guest.
Reachable by a compromised microVM
- Its own guest kernel and memory (by definition — it is root in its own box).
- Its ext4 rootfs image, attached as a virtio-block drive from inside the jailer's chroot, read-write unless explicitly marked read-only.
- One virtio-net TAP interface (egress policed by iptables / nftables on the host).
- The single virtio-vsock or API socket the jailer exposed to it — no other guests' sockets.
- The host CPU's virtualization extensions and MSRs that Firecracker exposes (KVM-mediated, filtered).
- Side-channel signals (cache timing, branch predictor) — mitigations depend on host kernel + microcode.
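As a sketch of what policing egress on the host can look like, the helper below renders an nftables ruleset for one TAP: listed destinations are accepted and everything else arriving from that TAP is dropped. The table name, the function name, and the link-local guard are illustrative assumptions, not the production ruleset.

```python
import ipaddress


def egress_ruleset(tap: str, allowed: list[tuple[str, int]]) -> str:
    """Render nftables rules letting `tap` reach only the given (ip, tcp port) pairs."""
    lines = [
        "table inet microvm_egress {",
        "  chain forward {",
        "    type filter hook forward priority 0; policy accept;",
    ]
    for ip, port in allowed:
        addr = ipaddress.ip_address(ip)  # raises ValueError on malformed input
        if addr.is_link_local:
            # Link-local covers 169.254.169.254, the usual cloud metadata address.
            raise ValueError(f"refusing link-local destination {addr}")
        family = "ip" if addr.version == 4 else "ip6"
        lines.append(
            f'    iifname "{tap}" {family} daddr {addr} tcp dport {port} accept'
        )
    # Anything else entering from this TAP is dropped.
    lines.append(f'    iifname "{tap}" drop')
    lines += ["  }", "}"]
    return "\n".join(lines) + "\n"
```

The output could be loaded with `nft -f` on the host. Rendering the ruleset from an allow-list keeps the default outcome for a guest's traffic as "drop", so a forgotten destination fails closed.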
NOT reachable (properly configured)
- The host filesystem outside the jailer's chroot: the jailer pivots root and drops privileges before exec'ing the VMM, and Firecracker applies its seccomp filters to the VMM threads.
- Other microVMs' memory, rootfs, TAPs, or vsocks — each Firecracker process is a separate chroot + seccomp + cgroup.
- The control-plane API socket of another microVM (jailer owns per-VM sockets with uid-scoped permissions).
- Arbitrary host device nodes — Firecracker exposes a minimal virtio device surface; no passthrough by default.
- IaaS metadata endpoints, unless you explicitly routed the TAP network to them (we don't).
- Persistent host state between reboots — rootfs is ephemeral; stateful data lives in an attached, per-tenant volume.
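The per-VM separation in the list above depends on each microVM being launched through the jailer with its own ID, uid, network namespace, and cgroup. Below is a minimal sketch of building such a command line. The `Slot` type, the uid scheme, the netns path, and the binary and chroot paths are assumptions for illustration; the jailer flags used are `--id`, `--exec-file`, `--uid`, `--gid`, `--chroot-base-dir`, `--netns`, `--new-pid-ns`, `--cgroup-version`, and `--cgroup`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slot:
    index: int       # hypothetical slot number within a node
    mem_bytes: int   # memory ceiling for this microVM's cgroup


def jailer_argv(
    slot: Slot,
    *,
    uid_base: int = 10_000,                          # assumed uid range for guests
    firecracker_bin: str = "/usr/bin/firecracker",   # assumed install path
    chroot_base: str = "/srv/jailer",
) -> list[str]:
    """Build a jailer command line with a distinct id, uid, netns and cgroup per slot."""
    vm_id = f"slot-{slot.index}"
    uid = str(uid_base + slot.index)  # one unprivileged uid per microVM
    return [
        "jailer",
        "--id", vm_id,
        "--exec-file", firecracker_bin,
        "--uid", uid,
        "--gid", uid,
        "--chroot-base-dir", chroot_base,
        "--netns", f"/var/run/netns/{vm_id}",  # assumes the netns was created beforehand
        "--new-pid-ns",
        "--cgroup-version", "2",
        "--cgroup", f"memory.max={slot.mem_bytes}",
        "--",  # everything after this is passed to Firecracker itself
        "--api-sock", "/run/firecracker.socket",  # path is relative to the chroot
    ]
```

Because every slot gets its own uid, file permissions alone prevent one VMM process from opening another slot's API socket or rootfs, even before the chroot is considered.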
All claims above assume the jailer is running with its default hardening (pivot_root, drop_caps, seccomp filter, per-VM cgroup) and the host kernel is patched. A misconfigured Firecracker that skips the jailer, disables seccomp, or shares a TAP across tenants is out of scope — that's a container with extra steps.
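The three out-of-scope misconfigurations named above can be turned into a simple fleet check. This is a sketch over a hypothetical inventory record (`VmConfig`); how those fields are collected from a real host is left out.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class VmConfig:
    # Hypothetical inventory record; field names are illustrative.
    vm_id: str
    uses_jailer: bool
    seccomp_enabled: bool
    tap: str


def audit(fleet: list[VmConfig]) -> list[str]:
    """Return one finding per violated assumption of the threat model."""
    findings = []
    tap_users = Counter(vm.tap for vm in fleet)
    for vm in fleet:
        if not vm.uses_jailer:
            findings.append(f"{vm.vm_id}: started without the jailer")
        if not vm.seccomp_enabled:
            findings.append(f"{vm.vm_id}: seccomp disabled")
        if tap_users[vm.tap] > 1:
            findings.append(f"{vm.vm_id}: TAP {vm.tap} is shared with another microVM")
    return findings
```

An empty result means none of the three conditions was detected in the inventory; it does not by itself show that the host kernel is patched or that the jailer's other defaults are intact.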
Further reading
More on Firecracker.
Workshops we teach and field notes we're writing, all linked back to what you just read.
Hands-on: Firecracker runners — 1-day workshop
Per-tenant microVM isolation from the 200-LOC primitive through the Ignite-managed CI runner pool. Prove the blast-radius story.
Scheduling soon →
Per-tenant microVM sandbox in 200 LOC
Jailer + rootfs + single-TAP isolation, hand-driven, with a blast-radius audit that mirrors the /firecracker/ threat model.
Draft →
Ignite + Firecracker for CI runner isolation
Ephemeral microVM runner pool for GitHub Actions, with cost break-even vs hosted + concurrent-job leak test.
Draft →
Engagement
Hands-on: Firecracker runners — 1-day workshop
Packaged engagement: we scope, build, and hand over with runbooks, against a specific SLA. Add to cart to request delivery; nothing is billed up front.
Neux Ltd
AI Infrastructure · Platform Engineering · London.
Since 2014.
© 2014–2026 Neux Ltd
Registered in England & Wales.
