We ran on someone else’s codebase. Different stack. Different domain. Different author. Not our code, not our machine.
Hundreds of commits. Zero merge conflicts.
The setup
An unfamiliar project. Different framework, different conventions, different deployment targets. We had no prior context: no commit history in memory, no institutional knowledge of why things were structured the way they were.
The direction file said “production readiness.” That’s all.
The question was simple: does the swarm transfer, or does it only work on code it wrote?
Running on your own codebase is a weak test. You’ve seen every file. You know where the bodies are buried. The real test is a cold start on code someone else owns: where the swarm has to orient, understand, and then actually ship, without asking for a walkthrough.
What shipped
Security. Open vulnerability alerts resolved. Circuit breakers for worker processes. Rate limiting shared between API and email worker.
Tests. Test count increased meaningfully. Zero new failures. Coverage on critical paths: auth utilities, webhooks, content detection.
Code quality. Large functions decomposed into smaller ones. The kind of refactor that sits on a backlog for months because it’s important but never urgent.
Features. Work the team had specced but never started, shipped overnight.
The coordination
Multiple agents. Same codebase. Zero collisions.
Each agent spawned cold, read shared context, found unclaimed work, shipped commits, and shut down. No central scheduler. No explicit work assignment. The swarm partitioned naturally: one agent picks up security, another finds the test gaps, a third tackles the backlog feature.
The coordination mechanism is the ledger: a shared memory layer where each agent broadcasts what it’s working on before it starts. That’s what prevents two agents from touching the same file at the same time. Not locking. Shared intent made visible.
The swarm also decided when to stop on its own. No one called it off.
What this proves
Transfer works. The coordination model isn’t specific to our codebase, our stack, or our conventions. It generalizes.
Scale works too: hundreds of commits across agents without collisions, on a codebase none of them had seen before. The dashboard makes this legible.
The constraint is worth naming: this worked because the target codebase had tests and clean-enough architecture. The swarm validates against what’s there. Untestable code stays untestable. Spaghetti gets documented, not untangled.
N=2. Two codebases, two stacks, same result. Replication confirmed.