What Claude Code's Source Code Actually Says About Subagents

A post making the rounds claims that Claude Code subagents share a prompt cache, making parallelism "basically free." It says you can spin up five agents and pay barely more than one. It lists three execution models — fork, teammate, and worktree — and says they all share the cache. Analysis of the source code reveals that some of this is true, but most of it isn't — or at least, isn't true in the way the post suggests.

h/t to @swyx for sharing this post as part of his excellent coverage of the Claude Code source leak in yesterday's AI News.

Fork mode exists, but you probably don't have it

The cache-sharing behavior the post describes is real. It lives in a subagent execution model called "fork," where a child agent inherits the parent's full conversation context and system prompt. The code goes to considerable lengths to keep the API request prefix identical across fork children so they hit the Anthropic API's prompt cache. The parent's rendered system prompt is threaded directly to forks rather than recomputed, specifically to prevent cache busting from feature flag state changes between turns.¹ Fork children receive the parent's exact tool array rather than building their own.²

The problem is that fork mode is gated behind feature('FORK_SUBAGENT'), a compile-time flag resolved by the Bun bundler.³ When the flag is off, the code is eliminated from the build entirely. There is no user-facing setting, environment variable, or runtime toggle to enable it. If your build doesn't have it compiled in, it doesn't exist.

When fork is disabled, omitting subagent_type on the Agent tool falls back to a general-purpose agent with no parent context and no cache sharing.⁴ This is the default subagent experience in Claude Code. (In case you're wondering, there is no /fork slash command in that build either; the name is registered as an alias of /branch instead.⁵)

The cost math doesn't add up

Even when fork mode is active, "5 agents cost barely more than 1" overstates things.

The cost model tracks input tokens, output tokens, cache read tokens, and cache write tokens separately.⁶ Cache reads cost about 10% of regular input. But the first fork pays a 25% write premium to populate the cache (about 1.25x the regular input rate for the cached prefix), and every fork pays full price for its own output tokens, its unique task directive, and all the tool calls and results it generates while doing its work.

The savings are proportional to how large the shared prefix is relative to total token usage. For agents doing real work — reading files, running commands, writing code — output and tool interaction tokens add up quickly. The shared prefix helps, but it doesn't make parallelism free.
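To make the proportions concrete, here is a rough back-of-the-envelope sketch. Only the 0.1x cache-read and 1.25x cache-write multipliers come from the discussion above. The output multiplier, the token counts, and every name in the block are illustrative assumptions, and the model ignores multi-turn effects inside each fork.

```typescript
// Costs are expressed relative to the price of one regular input token.
// The output multiplier and all token counts are assumptions for illustration.
interface Usage {
  inputTokens: number;      // uncached input: task directive, tool results
  outputTokens: number;
  cacheReadTokens: number;  // shared prefix served from cache
  cacheWriteTokens: number; // shared prefix written to cache
}

const RATE = { input: 1.0, output: 5.0, cacheRead: 0.1, cacheWrite: 1.25 };

function relativeCost(u: Usage): number {
  return (
    u.inputTokens * RATE.input +
    u.outputTokens * RATE.output +
    u.cacheReadTokens * RATE.cacheRead +
    u.cacheWriteTokens * RATE.cacheWrite
  );
}

const PREFIX = 50_000;     // shared parent context (hypothetical size)
const OWN_INPUT = 20_000;  // per-agent unique input (hypothetical)
const OWN_OUTPUT = 4_000;  // per-agent output (hypothetical)

// One agent with no caching pays the full input rate for the prefix.
const single = relativeCost({
  inputTokens: PREFIX + OWN_INPUT,
  outputTokens: OWN_OUTPUT,
  cacheReadTokens: 0,
  cacheWriteTokens: 0,
});

// The first fork writes the prefix to the cache; later forks read it.
const firstFork = relativeCost({
  inputTokens: OWN_INPUT,
  outputTokens: OWN_OUTPUT,
  cacheReadTokens: 0,
  cacheWriteTokens: PREFIX,
});
const laterFork = relativeCost({
  inputTokens: OWN_INPUT,
  outputTokens: OWN_OUTPUT,
  cacheReadTokens: PREFIX,
  cacheWriteTokens: 0,
});

const fiveForks = firstFork + 4 * laterFork;
const fiveUncached = 5 * single;

console.log({ single, fiveForks, fiveUncached, ratio: fiveForks / single });
```

With these made-up numbers, five forks cost roughly 3.1 times as much as one uncached agent and about 37% less than five uncached agents. The cache helps, but the per-agent work dominates.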

This may not matter to Claude Max subscribers, but it certainly matters if you have "extra usage" enabled.

There aren't three execution models. There are four, and worktree isn't one of them.

The post lists fork, teammate, and worktree as three execution models. The routing logic in the source tells a different story.

Teammate is an independent worker spawned with a team_name and name. Teammates do not inherit the parent's conversation. The parent's messages are explicitly zeroed out at spawn time.⁷ They build their own history from scratch.

Fresh specialized is triggered by setting subagent_type. The agent gets its own system prompt, its own tool pool, and no parent context.

Fork is triggered by omitting subagent_type when the fork gate is enabled. It inherits parent context and is cache-optimized, as described above.

General-purpose is the fallback when subagent_type is omitted and fork is disabled. It behaves like a fresh specialized agent using a default agent definition.

Worktree is not an execution model. It's an isolation modifier that can be combined with any of the four models above.¹⁰ It creates an isolated git worktree so the agent's file operations don't touch the parent's working copy. A fork agent with worktree isolation still inherits context. A fresh specialized agent with worktree isolation still doesn't. Worktree changes where file operations land, not how context is constructed.
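The routing described above can be sketched as a small decision function. This is a simplified illustration, not the source's code: the type names, field names, and the order in which the checks run are assumptions. The only behavior it tries to capture is that the execution model and the isolation setting are resolved independently.

```typescript
// Hypothetical types; the real Agent tool schema is not reproduced here.
type ExecutionModel =
  | "teammate"
  | "fresh-specialized"
  | "fork"
  | "general-purpose";

interface SpawnRequest {
  teamName?: string;
  name?: string;
  subagentType?: string;
  isolation?: "worktree";
}

interface SpawnPlan {
  model: ExecutionModel;
  inheritsParentContext: boolean;
  isolation: "worktree" | "none";
}

function planSpawn(req: SpawnRequest, forkEnabled: boolean): SpawnPlan {
  let model: ExecutionModel;
  if (req.teamName !== undefined && req.name !== undefined) {
    model = "teammate"; // independent worker, starts with empty history
  } else if (req.subagentType !== undefined) {
    model = "fresh-specialized"; // own system prompt and tool pool
  } else {
    // subagent_type omitted: fork if the gate is on, otherwise the default agent
    model = forkEnabled ? "fork" : "general-purpose";
  }
  return {
    model,
    inheritsParentContext: model === "fork", // only fork carries parent context
    isolation: req.isolation ?? "none",      // modifier, independent of the model
  };
}

console.log(planSpawn({ isolation: "worktree" }, true));
console.log(planSpawn({ subagentType: "reviewer", isolation: "worktree" }, true));
```

In this sketch, adding worktree isolation changes only the isolation field. Whether the agent inherits context depends solely on the model that was chosen.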

Not all subagents share the cache

The post's central claim is that all subagent types share the prompt cache. This is wrong.

Fresh specialized agents build a different system prompt, assemble a different tool pool, and carry no parent conversation history. Different prefix, no cache sharing. Teammates have their messages explicitly emptied.⁷ They share nothing with the parent.

Fork children can't fork further, either. The code detects a boilerplate tag in conversation history and rejects recursive fork attempts.⁸ Cache sharing is one level deep.
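A guard of this kind might look like the sketch below. The isInForkChild name comes from the reference for this claim. The tag text, the message shape, and the spawnFork helper are hypothetical stand-ins, since the real boilerplate tag is not quoted here.

```typescript
// Hypothetical marker text; the actual tag in the source is not shown in this post.
const FORK_TAG = "<fork-child>";

interface Message {
  role: "user" | "assistant";
  content: string;
}

// Returns true if any message in the history carries the fork marker.
function isInForkChild(history: Message[]): boolean {
  return history.some((m) => m.content.includes(FORK_TAG));
}

// Hypothetical helper: builds a fork child's history, refusing to nest forks.
function spawnFork(history: Message[], directive: string): Message[] {
  if (isInForkChild(history)) {
    throw new Error("fork children cannot fork again");
  }
  // The child keeps the parent's history and gets a tagged directive appended.
  return [...history, { role: "user", content: `${FORK_TAG}\n${directive}` }];
}

const parentHistory: Message[] = [
  { role: "user", content: "Refactor the parser." },
];
const childHistory = spawnFork(parentHistory, "Update the tests.");
console.log(isInForkChild(parentHistory), isInForkChild(childHistory)); // false true
```

Because the child's history contains the tagged directive, a second spawnFork call on childHistory would throw. That is the one-level limit in miniature.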

Teammates don't all use file-based mailboxes

The post says teammates communicate via file-based mailbox. This depends on which backend spawns them.

In-process teammates use an in-memory Mailbox class with a queue-and-waiters async pattern.⁹ No files involved. Teammates spawned in separate tmux or iTerm panes do use file-based mechanisms for initial instruction delivery, but that's a different execution path.
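For readers unfamiliar with the queue-and-waiters pattern, here is a minimal sketch of an in-memory mailbox. The method names send(), poll(), and receive() match the ones listed in the reference. The internals are an assumption about how such a class is typically written, not a copy of the source.

```typescript
class Mailbox<T> {
  private queue: T[] = [];                       // messages with no receiver yet
  private waiters: Array<(msg: T) => void> = []; // receivers with no message yet

  // Deliver to a waiting receiver if there is one; otherwise enqueue.
  send(msg: T): void {
    const waiter = this.waiters.shift();
    if (waiter) {
      waiter(msg);
    } else {
      this.queue.push(msg);
    }
  }

  // Non-blocking read: returns the next queued message or undefined.
  poll(): T | undefined {
    return this.queue.shift();
  }

  // Async read: resolves immediately if a message is queued, else waits for send().
  receive(): Promise<T> {
    if (this.queue.length > 0) {
      return Promise.resolve(this.queue.shift() as T);
    }
    return new Promise<T>((resolve) => {
      this.waiters.push(resolve);
    });
  }
}

const inbox = new Mailbox<string>();
inbox.receive().then((msg) => console.log("teammate got:", msg));
inbox.send("run the tests");
```

Everything lives in process memory: a message either goes straight to a pending receive() call or waits in the array. Nothing touches the filesystem.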

What the source code actually tells us

The interesting story here isn't the one the post tells. The source reveals a team thinking carefully about API-level cache mechanics, building explicit cost tracking around them, and keeping the feature gated while they validate it. The CacheSafeParams type, the byte-exact system prompt threading, the identical placeholder tool results, the analytics tracking cache hit rates: this is deliberate, measured engineering work.

But it's behind an experiment gate. Most users interact with fresh specialized or general-purpose agents that share nothing with each other or the parent. That's worth knowing before you reorganize your workflow around a capability you may not have.

Feature Image Prompt:

Generate an image. The aesthetic should be cyberpunk with colors of neon pink, blue and purple. Do not add any people. A glowing central node pulses at the center of a dark circuit-board cityscape, splitting into three distinct branching pathways made of light. The first branch is a perfect mirror-copy of the central node, trailing an identical stream of data behind it — a fork. The second branch launches a fresh, smaller node with its own clean trajectory and no trailing data — a specialized agent. The third branch creates an independent floating terminal window connected back to the hub only by a thin mailbox-style data link — a teammate. Each pathway illuminates a different sector of the city below. A translucent hexagonal grid overlays everything, representing the shared prompt cache, but only the fork branch glows where it intersects the grid. Tiny flowing particles of data stream along the branches. In the background, isolated floating platforms with their own miniature cityscapes represent worktree isolation. The overall composition suggests parallel orchestration — multiple autonomous systems radiating from a single point of origin.

References

All references are to the Claude Code source as leaked March 31, 2026. Someone who isn't me obtained it and performed this analysis.

¹ forkSubagent.ts:54-58 — Comment: "Reconstructing by re-calling getSystemPrompt() can diverge (GrowthBook cold→warm) and bust the prompt cache; threading the rendered bytes is byte-exact."
² AgentTool.tsx:627 — availableTools: isForkPath ? toolUseContext.options.tools : workerTools
³ forkSubagent.ts:32-39 — isForkSubagentEnabled() checks feature('FORK_SUBAGENT'), then excludes coordinator mode and non-interactive sessions.
⁴ AgentTool.tsx:322 — const effectiveType = subagent_type ?? (isForkSubagentEnabled() ? undefined : GENERAL_PURPOSE_AGENT.agentType)
⁵ commands/branch/index.ts:8 — aliases: feature('FORK_SUBAGENT') ? [] : ['fork']
⁶ modelCost.ts:131-138 — tokensToUSDCost() sums input_tokens, output_tokens, cache_read_input_tokens, and cache_creation_input_tokens at different rates.
⁷ spawnMultiAgent.ts:927-931 — Comment: "Strip messages: the teammate never reads toolUseContext.messages." Code: toolUseContext: { ...context, messages: [] }
⁸ forkSubagent.ts:73-87 — isInForkChild() scans conversation history for the fork boilerplate tag and rejects recursive fork attempts.
⁹ mailbox.ts:19-73 — In-memory Mailbox class with send(), poll(), and receive() methods using a queue-and-waiters pattern.
¹⁰ AgentTool.tsx:431 — const effectiveIsolation = isolation ?? selectedAgent.isolation — resolved independently of execution model routing.

Learning Machine
