Building a C Compiler with a Team of Parallel Claudes
Published: February 5, 2026
Overview
Anthropic researcher Nicholas Carlini tasked 16 Claude Opus 4.6 instances with autonomously building a Rust-based C compiler capable of compiling the Linux kernel. Over nearly 2,000 Claude Code sessions and approximately $20,000 in API costs, the agent team produced a 100,000-line compiler supporting x86, ARM, and RISC-V architectures.
Enabling Long-Running Claudes
Existing agent scaffolds such as Claude Code expect an operator to be available to work alongside the model. To enable autonomous operation, Carlini wrapped Claude in a simple loop:
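```bash
#!/bin/bash
# Restart Claude Code forever; each session's output goes to a log file
# named after the commit the session started from.
mkdir -p agent_logs

while true; do
    COMMIT=$(git rev-parse --short=6 HEAD)
    LOGFILE="agent_logs/agent_${COMMIT}.log"

    # --dangerously-skip-permissions disables permission prompts, so run this
    # only inside a sandboxed container. claude-opus-X-Y is a placeholder.
    claude --dangerously-skip-permissions \
           -p "$(cat AGENT_PROMPT.md)" \
           --model claude-opus-X-Y &> "$LOGFILE"
done
```

Carlini recommends running this in a container rather than on your actual machine. The agent prompt tells Claude what problem to solve and asks it to break the problem into small pieces, track what it is working on, decide what to work on next, and keep going until the job is done. "The loop runs forever," though Carlini notes that in one instance Claude accidentally killed its own process, ending the loop.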
Running Claude in Parallel
Multiple instances address key weaknesses of single-agent systems:
- Sequential limitations: a single Claude Code session can do only one thing at a time; as a project grows, parallel agents can debug many issues at once.
- Specialization: dedicated agents can handle documentation, code quality, and other specialized subtasks while the rest work on the main problem.
Synchronization Mechanism
The implementation uses a straightforward git-based approach:
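- Each agent runs in its own Docker container with a clone of a shared upstream repository
- Agents claim a task by writing a text file to the current_tasks/ directory; git's synchronization prevents duplicate work, because when two agents claim the same task, the second is forced to pick a different one
- After finishing, an agent pulls upstream changes, merges them with its own, pushes the result, and removes its lock file
- An infinite loop then spawns a new Claude Code session in a fresh container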
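The claiming step can be sketched in shell as follows. Only the current_tasks/ directory comes from the description above; the function names, the AGENT_ID variable, the commit messages, and the three-attempt retry are assumptions for illustration, not Carlini's harness. The property the sketch tries to capture is that a claim only counts once it has been pushed: if the push is rejected because upstream moved, the agent drops its claim commit, pulls again, and re-checks whether another agent now holds the lock.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of git-based task locking. Assumes the current directory
# is a clone of the shared upstream repo whose branch tracks "origin".

# Try to claim a task; returns 0 on success, 1 if someone else holds it.
claim_task() {
  local task="$1" attempt
  local lock="current_tasks/${task}.txt"
  for attempt in 1 2 3; do
    git pull --quiet --rebase || return 1
    # Lock file already present upstream: another agent owns this task.
    [ -e "$lock" ] && return 1
    mkdir -p current_tasks
    echo "claimed by ${AGENT_ID:-unknown}" > "$lock"
    git add "$lock" && git commit --quiet -m "Claim task: ${task}" || return 1
    # The claim only counts once it reaches upstream.
    git push --quiet && return 0
    # Push rejected (upstream moved): drop our claim commit and retry,
    # which re-checks whether the lock now exists.
    git reset --quiet --hard HEAD~1
  done
  return 1
}

# Remove our lock once the work is merged and pushed.
release_task() {
  local task="$1"
  git pull --quiet --rebase &&
    git rm --quiet "current_tasks/${task}.txt" &&
    git commit --quiet -m "Release task: ${task}" &&
    git push --quiet
}
```

For example, an agent might run `AGENT_ID=agent-3 claim_task parse_if_statement || echo "pick another task"`; a lost race simply sends it back to choose something else.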
Design Lessons for Agent Teams
Write Extremely High-Quality Tests
"Claude will work autonomously to solve whatever problem I give it. So it's important that the task verifier is nearly perfect, otherwise Claude will solve the wrong problem." Carlini built a continuous integration pipeline and enforced stricter checks so that new commits could not break existing functionality.
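One way to enforce "new commits must not break existing functionality" is a gate that compares the set of passing tests against a baseline. This is a minimal sketch, not Carlini's CI: the function name `check_no_regressions` and the input format (one passing test name per line) are assumptions made for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical regression gate: fail if any test that passed in the baseline
# is missing from the current set of passing tests. Both files are assumed
# to list one passing test name per line.
check_no_regressions() {
  local baseline="$1" current="$2" regressions
  # comm -23 prints lines found only in the first sorted input, i.e. tests
  # that used to pass but no longer appear among the passing tests.
  regressions=$(comm -23 <(sort -u "$baseline") <(sort -u "$current"))
  if [ -n "$regressions" ]; then
    # One grep-able ERROR line per regressed test.
    sed 's/^/ERROR regression: /' <<< "$regressions"
    return 1
  fi
  echo "OK: no regressions ($(sort -u "$current" | wc -l | tr -d ' ') tests passing)"
}
```

A CI job could run this after the test suite and reject the commit on a non-zero exit status, then update the baseline only when the gate passes.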
Put Yourself in Claude's Shoes
Each agent is dropped into a fresh container with no context and needs significant time to orient itself. Carlini instructed Claude to maintain extensive READMEs and progress files, updated frequently with the current status.
Key limitations requiring design workarounds:
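Context window pollution: the test harness should print only a few lines of output and log everything important to files Claude can consult when needed. Errors should be logged with "ERROR" and the reason on the same line so that grep finds them.
Time blindness: Claude cannot tell how much time has passed and, left alone, will spend hours running tests instead of making progress. The harness therefore offers a --fast option that runs a 1% or 10% random sample of the tests; the sample is deterministic for each agent but differs across agents, so collectively the agents still exercise the whole suite.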
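Both workarounds can be combined in a small harness. The sketch below is an assumption-laden illustration, not Carlini's code: the names `select_tests` and `run_tests`, and the use of `cksum` to hash "agent:test" into 100 buckets, are choices made here. Hashing with the agent ID keeps each agent's sample stable from run to run while giving different agents different samples.

```bash
#!/usr/bin/env bash
# Hypothetical harness sketch: deterministic per-agent sampling (--fast) plus
# quiet output that logs details to a file with grep-able ERROR lines.

# Print the test names (read from stdin) that fall in this agent's sample.
# $1 = agent ID, $2 = sample percentage (0-100).
select_tests() {
  local agent="$1" pct="$2" t h
  while IFS= read -r t; do
    # cksum gives a stable CRC of "agent:test"; mod 100 picks a bucket.
    h=$(printf '%s:%s' "$agent" "$t" | cksum | cut -d' ' -f1)
    if [ $((h % 100)) -lt "$pct" ]; then printf '%s\n' "$t"; fi
  done
  return 0
}

# Run each test named on stdin as "$runner <name>", logging to $log.
# Prints a single summary line; returns non-zero if any test failed.
run_tests() {
  local runner="$1" log="$2" t rc pass=0 fail=0
  : > "$log"
  while IFS= read -r t; do
    # Test output goes only to the log; stdin is detached so the runner
    # cannot swallow the remaining test names.
    "$runner" "$t" < /dev/null >> "$log" 2>&1
    rc=$?
    if [ "$rc" -eq 0 ]; then
      pass=$((pass + 1))
    else
      fail=$((fail + 1))
      # One line per failure, with the reason, so `grep ERROR` finds it.
      echo "ERROR $t: exit status $rc" >> "$log"
    fi
  done
  # The only output that reaches the agent's context window.
  echo "passed=$pass failed=$fail (details: $log; grep ERROR)"
  [ "$fail" -eq 0 ]
}
```

A fast run might look like `ls tests | select_tests "$AGENT_ID" 10 | run_tests ./run_one.sh logs/tests.log`, where `tests/` and `run_one.sh` are hypothetical; a full run would simply skip `select_tests`.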
Make Parallelism Easy
When many distinct tests are failing, parallelism is easy: each agent picks a different failing test. Once the test suite reached a 99% pass rate, each agent worked on getting a different small open-source project (SQLite, Redis, libjpeg, MQuickJS, Lua) to compile. Compiling the Linux kernel, however, is one giant task: every agent hit the same bug, fixed it, and overwrote the others' changes.
The fix was to use GCC as a known-good reference compiler, an oracle. A new harness compiled a random majority of the kernel with GCC and only the remaining files with Claude's compiler. If the resulting kernel worked, the bug was not in Claude's subset; if it broke, the harness could narrow things down by recompiling some of those files with GCC. This let agents debug different files in parallel until Claude's compiler could build the entire kernel.
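The oracle idea can be sketched with two helpers. Everything here is hypothetical: `pick_claude_subset` chooses which files go to Claude's compiler, and `find_culprit` bisects a failing subset using a caller-supplied predicate that would build the kernel (Claude's compiler for the listed files, GCC for the rest) and succeed if it boots. That predicate is not shown. The bisection assumes a single file reproduces the failure on its own; failures that appear only when particular files are combined would need something like delta debugging.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the GCC-oracle workflow.

# Pick roughly pct% of the files (read from stdin) for Claude's compiler;
# the rest would be compiled with GCC.
pick_claude_subset() {
  local pct="$1" f
  while IFS= read -r f; do
    if [ $((RANDOM % 100)) -lt "$pct" ]; then printf '%s\n' "$f"; fi
  done
}

# Narrow a failing subset down to one file by halving.
# $1 = predicate command: given files to build with Claude's compiler, it
#      returns 0 if the resulting kernel works.
# Remaining args = a subset of files known to produce a broken kernel.
find_culprit() {
  local check="$1"; shift
  local files=("$@") half
  while [ "${#files[@]}" -gt 1 ]; do
    half=$(( ${#files[@]} / 2 ))
    # If the kernel works with only the first half built by Claude's
    # compiler, the culprit must be in the second half, and vice versa.
    if "$check" "${files[@]:0:half}"; then
      files=("${files[@]:half}")
    else
      files=("${files[@]:0:half}")
    fi
  done
  printf '%s\n' "${files[@]}"
}
```

Because each agent draws its own random subset, different agents tend to land on different broken files, which is what lets the kernel work proceed in parallel.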
Multiple Agent Roles
Parallelism enabled specialization:
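- One agent coalesced duplicate code
- Another improved the performance of the compiler itself
- A third made the compiled output more efficient
- Another critiqued the project's design from the perspective of a Rust developer and made structural changes to improve code quality
- Another handled documentation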
Stress Testing Results
Evaluation Metrics
Over nearly 2,000 Claude Code sessions across two weeks, Opus 4.6 consumed 2 billion input tokens and generated 140 million output tokens, at a total cost of just under $20,000. The compiler:
- Builds bootable Linux 6.9 on x86, ARM, and RISC-V
- Compiles QEMU, FFmpeg, SQLite, PostgreSQL, and Redis
- Achieves a 99% pass rate on most compiler test suites, including the GCC torture test suite
- Passes the developer's "ultimate litmus test: it can compile and run Doom"
Limitations
The compiler remains incomplete:
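- Lacks the 16-bit x86 code generator needed to boot Linux from real mode; it calls out to GCC for this step (the 32- and 64-bit x86 compilers are its own)
- Its own assembler and linker were the last pieces Claude started on and are still somewhat buggy
- Builds many projects but not all, so it is not yet a drop-in replacement for a production compiler
- Generates inefficient code: even with all optimizations enabled, its output is less efficient than GCC's with all optimizations disabled
- Rust code quality is reasonable but well below what an expert Rust programmer would produce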
One particularly challenging failure: Opus could not implement the 16-bit x86 code generator needed to boot into real mode. The compiler can emit correct 16-bit x86 code using the 66/67 operand-size and address-size prefixes, but the output exceeded Linux's 32KB code limit, so on x86 Claude's compiler calls out to GCC for this phase. On ARM and RISC-V, Claude's compiler builds the kernel entirely on its own.
Looking Forward
Agent teams show that entire complex projects can be implemented autonomously. "This approach dramatically expands the scope of what's achievable with LLM agents," enabling users to pursue more ambitious goals.
However, autonomous development carries real risks. Without a human reviewing the work as it is written, quality assurance becomes difficult. Carlini, who previously worked in penetration testing, voices this concern: "the thought of programmers deploying software they've never personally verified is a real concern."
"Building this compiler has been some of the most fun I've had recently, but I did not expect this to be anywhere near possible so early in 2026." Rapid progress in language models and in the scaffolds used to interact with them makes it possible to write enormous amounts of new code, but it also calls for new strategies to navigate that world safely.
Acknowledgments
Nicholas Carlini thanks Josef Bacik, Edwin Chen, Bernardo Meurer Costa, Jake Eaton, Dan Kelley, Felix Klock, Jannet Park, Steve Weis, and numerous other Anthropic contributors.