Wednesday night: an agent rewrites the rate limiter. Thursday morning: a different agent writes tests for the old rate limiter. Two hours later: a third agent fixes a bug in logic that no longer exists.
That’s what a stateless swarm looks like. Each agent cold-boots with only the current code to read. No session knows what any previous session decided.
what gets lost every session
When a stateless agent starts, it reads the codebase. It doesn’t know:
- That you decided last week to defer auth until after the demo
- That the pattern in
services/is intentional, not tech debt - That someone already tried extracting that utility and reverted it for a reason
- That the test suite is slow because of a known fixture issue, not a problem to fix
It will infer what it can. It will miss what requires history. It will confidently fix things that don’t need fixing and leave the things that do.
the coordination problem gets worse with more agents
Add a second agent and you have two fresh starts, not one twice as capable.
Without shared memory, they’ll duplicate work, contradict decisions, and step on each other. Not because they’re bad tools. There’s no shared record of what’s been tried, settled, or ruled out.
Memory is what turns a fleet of agents into a swarm. One agent’s decision to defer a refactor becomes context for the next. A finding about a brittle integration pattern propagates to all subsequent sessions. The codebase knowledge compounds instead of resetting.
what compounding looks like
Six months of commits. Tens of thousands of agent sessions. The failure modes shifted.
Early on: agents duplicated work constantly. They’d extract the same utility three different ways in a week because no session knew what the others had tried. The style was inconsistent: each agent had its own read on the codebase conventions.
By month two: agents had enough commit history to recognize what was settled. They stopped proposing changes to stable files. They’d seen the failed extractions in the git log and implicitly knew to avoid that direction.
By month four: the mistakes were more interesting. Correct logic that violated an unstated convention. A fix that improved one module and silently degraded the caller. Not stateless errors. Informed errors. The agent understood enough context to fail in more sophisticated ways.
The mistakes getting more interesting is the closest thing to improvement you can measure without unfreezing the weights.
the memory a swarm needs
Not everything. The wrong memory is noise: chat logs, status updates, every decision ever made. That’s context pollution.
The right memory is durable signal: architectural decisions and their rationale, recurring failure patterns, what the founder wants this week (and what they’ve deprioritized), what the codebase conventions are and why.
One file per agent with the things worth knowing cold. A shared knowledge base with the patterns that outlast any single session. The coordination ledger that records what’s been decided so the next agent doesn’t relitigate it.
That’s the structure. It doesn’t require exotic tooling. Just the discipline to write things down and read them.
what you get from the memory side
You stop explaining the same things twice. Agents that woke up last week already know your preferred error handling style. They know which dependency you’re planning to replace. They know the test coverage is intentional, not an oversight.
You stop seeing the same class of mistake. Not because agents got smarter. They have context about what’s been tried and why it didn’t work.
The work compounds. Today’s commits make next week’s agents smarter. Not in theory. In the git log.
That’s what a teammate does. A tool starts over every time.