Retrieval Subagent

Split context retrieval into a focused agent that returns exact evidence. The main agent should receive selected files, ranges, and facts after the search noise has been discarded.

The Problem This Solves

Coding agents spend too much of their early work searching: they grep, read, follow imports, inspect tests, open neighboring files, and slowly fill the main context with everything they touched along the way.

That search trail is useful for finding evidence, but it is a bad working context. Tool output accumulates before the agent knows what matters, and irrelevant files stay in the window even after the useful line ranges have been found. ContextBench measured this directly: agents explore far more context than they actually use, and better scaffolding only marginally improves retrieval quality. That is wasted context.

The main agent should receive the evidence that survived the search, with the failed paths left behind in the retrieval context where they belong.

How It Works

Create a separate retrieval subagent whose only job is to find task-relevant context. Give it the query, a repository map or search tools, and a strict output contract: files, line ranges, symbols, and short reasons for inclusion. Keep the output factual and inspectable so the parent can verify what was selected; the parent needs evidence it can check, not another agent’s prose about what it thinks it found.
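One way to make the output contract concrete is to give it a type and a validator. The sketch below is illustrative only: the EvidenceItem shape, its field names, and the 160-character cap on reasons are assumptions, not part of any published interface.

```typescript
// Hypothetical shape for one piece of returned evidence; field names are illustrative.
interface EvidenceItem {
  file: string;      // repository-relative path
  startLine: number; // 1-indexed, inclusive
  endLine: number;   // 1-indexed, inclusive
  symbols: string[]; // functions, classes, or constants inside the range
  reason: string;    // one short sentence on why this range was selected
}

// Assumed cap that keeps reasons from growing into free-form prose.
const MAX_REASON_LENGTH = 160;

// True for whole numbers >= 1 (rejects NaN, Infinity, and fractions).
function isPositiveInt(n: number): boolean {
  return n >= 1 && n % 1 === 0;
}

// Returns a list of contract violations; an empty list means the item is acceptable.
function validateEvidence(item: EvidenceItem): string[] {
  const problems: string[] = [];
  if (item.file.trim() === "") problems.push("file is empty");
  if (!isPositiveInt(item.startLine)) problems.push("startLine must be a positive integer");
  if (!isPositiveInt(item.endLine) || item.endLine < item.startLine) {
    problems.push("endLine must be an integer >= startLine");
  }
  if (item.reason.trim() === "") problems.push("reason is empty");
  if (item.reason.length > MAX_REASON_LENGTH) problems.push("reason is too long");
  return problems;
}

// Keep only items that satisfy the contract before anything reaches the parent.
function acceptEvidence(items: EvidenceItem[]): EvidenceItem[] {
  return items.filter((item) => validateEvidence(item).length === 0);
}
```

Rejecting malformed items at the boundary means the parent never has to interpret a vague or unverifiable result.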

The parent agent keeps the goal, plan, and reasoning context. The retrieval subagent spends its own context budget on exploration: it reads broadly, then discards the path it took to get there. When it finishes, the parent receives a compact evidence set it can actually reason over. That boundary is what the pattern protects: search noise stays on one side, and only evidence crosses.
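The boundary can be sketched with a toy in-memory repository. Everything here is hypothetical: the Repo type, the grep helper, the file contents, and the isRelevant callback, which stands in for the judgment a real subagent would get from a model. The point is only that the search trail is a local variable and never reaches the parent.

```typescript
type Repo = Record<string, string[]>; // hypothetical in-memory repo: path -> lines

interface Hit { file: string; line: number; text: string; }
interface Evidence { file: string; startLine: number; endLine: number; reason: string; }

// Hypothetical search tool: returns every line that contains the term.
function grep(repo: Repo, term: string): Hit[] {
  const hits: Hit[] = [];
  for (const file of Object.keys(repo)) {
    repo[file].forEach((text, i) => {
      if (text.indexOf(term) !== -1) hits.push({ file, line: i + 1, text });
    });
  }
  return hits;
}

// The search trail lives in `trail`, a local that goes out of scope on return.
function retrievalSubagent(
  repo: Repo,
  terms: string[],
  isRelevant: (hit: Hit) => boolean, // stand-in for model judgment
): Evidence[] {
  const trail: Hit[] = [];
  for (const term of terms) trail.push(...grep(repo, term));

  const seen: Record<string, true> = {}; // avoid returning the same line twice
  const evidence: Evidence[] = [];
  for (const hit of trail) {
    const key = `${hit.file}:${hit.line}`;
    if (seen[key] || !isRelevant(hit)) continue;
    seen[key] = true;
    evidence.push({
      file: hit.file,
      startLine: hit.line,
      endLine: hit.line,
      reason: `matches: ${hit.text.trim()}`,
    });
  }
  return evidence;
}

// Parent side: the context holds the goal plus whatever evidence comes back.
const repo: Repo = {
  "billing/invoice_totals.ts": ["const total = sum(lines);", "return roundHalfUp(total);"],
  "tax/helpers.ts": ["export function roundTax(x: number) { return x; }"],
  "db/migrations/001.ts": ["// round-trip check for legacy rows"],
};

const parentContext: string[] = ["Goal: fix invoice rounding"];
const evidence = retrievalSubagent(repo, ["round"], (hit) => hit.file.indexOf("billing/") === 0);
for (const e of evidence) {
  parentContext.push(`${e.file}:${e.startLine}-${e.endLine} ${e.reason}`);
}
```

The search touches three files, but the parent context grows by a single line; the two dead ends exist only inside the subagent call.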

Cognition’s SWE-grep is the clearest production version of this pattern. Their coding-agent traces were spending more than 60% of the first turn retrieving context, so they trained a fast subagent that can make up to 8 parallel tool calls per turn and return files with line ranges. The reward function prioritizes file and line precision because polluted context hurts the main agent more than a recoverable omission.
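To see why a precision-weighted reward behaves this way, consider a line-level F-beta score, a standard metric where beta below 1 weights precision above recall. This is a sketch of the general idea, not SWE-grep's actual reward; the beta value of 0.5, the LineRange type, and the helper names are assumptions chosen for illustration.

```typescript
interface LineRange { file: string; startLine: number; endLine: number; }

// F-beta score: beta < 1 weights precision more heavily than recall.
function fBeta(precision: number, recall: number, beta: number): number {
  const b2 = beta * beta;
  const denom = b2 * precision + recall;
  return denom === 0 ? 0 : ((1 + b2) * precision * recall) / denom;
}

// Expand ranges into a lookup of "file:line" keys.
function lineKeys(ranges: LineRange[]): Record<string, true> {
  const keys: Record<string, true> = {};
  for (const r of ranges) {
    for (let line = r.startLine; line <= r.endLine; line++) {
      keys[`${r.file}:${line}`] = true;
    }
  }
  return keys;
}

// Score returned ranges against the ranges that were actually needed.
function lineScore(returned: LineRange[], gold: LineRange[], beta = 0.5): number {
  const got = Object.keys(lineKeys(returned));
  const want = lineKeys(gold);
  const wantCount = Object.keys(want).length;
  if (got.length === 0 || wantCount === 0) return 0;
  const overlap = got.filter((k) => want[k] === true).length;
  return fBeta(overlap / got.length, overlap / wantCount, beta);
}

const gold: LineRange[] = [{ file: "a.ts", startLine: 1, endLine: 10 }];

// Precise but incomplete: half of the needed lines, nothing else.
const precise: LineRange[] = [{ file: "a.ts", startLine: 1, endLine: 5 }];

// Complete but polluted: all needed lines plus ten irrelevant ones.
const polluted: LineRange[] = [
  { file: "a.ts", startLine: 1, endLine: 10 },
  { file: "b.ts", startLine: 1, endLine: 10 },
];

const preciseScore = lineScore(precise, gold);   // precision 1.0, recall 0.5
const pollutedScore = lineScore(polluted, gold); // precision 0.5, recall 1.0
```

With beta set to 1 the two results would score the same; with beta at 0.5 the precise result scores higher, which matches the stated preference for a recoverable omission over polluted context.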

Example

A coding agent needs to fix a bug in invoice rounding.

Without a retrieval subagent, the main agent searches for “round”, reads invoice models, opens unrelated tax helpers, checks old migrations, inspects billing tests, and carries most of that output forward. By the time it edits the failing function, its context contains the two relevant files plus several dead ends that mention money, tax, or decimals.

With a retrieval subagent, the parent asks for evidence related to invoice rounding behavior. The subagent searches broadly, but returns only:

File                             Range    Reason
billing/invoice_totals.ts        42-88    Rounding happens after tax aggregation
billing/invoice_totals.test.ts   113-151  Failing expectations for half-cent values
billing/currency.ts              10-36    Shared decimal precision helper

The main agent now has a small evidence set and can reason about the fix without inheriting the search history. The edit is local again.
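As a rough sketch, the evidence set from this example can be held as data and used to load only the selected ranges into the parent's context. The EvidenceRow type and the renderEvidence and totalLines helpers are hypothetical names introduced for illustration; the file contents are supplied by the caller.

```typescript
interface EvidenceRow { file: string; startLine: number; endLine: number; reason: string; }

// The three rows returned by the retrieval subagent in the example.
const evidenceSet: EvidenceRow[] = [
  { file: "billing/invoice_totals.ts", startLine: 42, endLine: 88, reason: "Rounding happens after tax aggregation" },
  { file: "billing/invoice_totals.test.ts", startLine: 113, endLine: 151, reason: "Failing expectations for half-cent values" },
  { file: "billing/currency.ts", startLine: 10, endLine: 36, reason: "Shared decimal precision helper" },
];

// Total number of source lines the parent has to read (ranges are inclusive).
function totalLines(rows: EvidenceRow[]): number {
  return rows.reduce((sum, r) => sum + (r.endLine - r.startLine + 1), 0);
}

// Build the context entry for one row: a header line followed by only the selected lines.
function renderEvidence(row: EvidenceRow, source: string): string {
  const lines = source.split("\n").slice(row.startLine - 1, row.endLine);
  return [`${row.file}:${row.startLine}-${row.endLine} (${row.reason})`, ...lines].join("\n");
}

const linesForParent = totalLines(evidenceSet); // 47 + 39 + 27 = 113
```

The parent reads 113 lines across three files, regardless of how many files the subagent opened to find them.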

When to Use

  • Large codebases where finding the right files is a substantial part of the task
  • Agent flows where search output routinely pollutes the main reasoning context
  • Tasks where retrieval can be graded by exact artifacts: files, line ranges, clauses, records, or source snippets
  • Systems that can afford extra tool calls to protect the main context window

When Not to Use

  • Small tasks where the main agent can inspect the relevant file directly
  • Exploratory work where the question itself is still changing every turn
  • Retrieval tasks that require judgment the parent agent cannot delegate safely
  • Cases where a simple index plus Progressive Disclosure gives enough control

Related Patterns

  • Isolate is the broader pattern: the retrieval subagent gets its own focused context instead of sharing the parent’s window
  • Retrieval as Context Curation defines the selection standard the subagent applies before anything reaches the parent
  • Select, Don’t Dump applies to the subagent output; every returned range must earn its place
  • Progressive Disclosure is the simpler alternative when an index and on-demand reads are enough