About Context Engineering
The Core Idea
Every time you call an LLM, you send it a context window: a structured bundle of text that includes system instructions, conversation history, tool outputs, retrieved documents, and whatever else your application assembled for that call. Context engineering is the discipline of deciding what goes into that window, how it's structured, and when to change it.
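The assembly step described above can be sketched in a few lines. This is an illustrative sketch, not a real API: the function name, the part labels, and the ordering are all assumptions made for the example.

```python
# Minimal sketch of programmatic context assembly: the application, not a
# human, decides what goes into the window and in what order. The labels
# ("[doc]", "[tool]") and the ordering are illustrative assumptions.
def assemble_context(system: str, history: list[str],
                     tool_outputs: list[str], retrieved: list[str]) -> str:
    parts = [system]                                  # system instructions first
    parts += [f"[doc] {d}" for d in retrieved]        # retrieved documents
    parts += [f"[tool] {t}" for t in tool_outputs]    # tool outputs from this turn
    parts += history                                  # prior conversation turns
    return "\n\n".join(parts)
```

Every decision in that function body, which parts to include, how to label them, where to place them, is a context-engineering decision.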
The context window is everything the model knows at inference time. It has no access to your codebase, your database, or your documentation unless you put it there. What you include determines what the model can do. What you leave out determines where it fails.
Why It Matters
Early LLM applications were single-turn: you wrote a prompt, the model responded, done. In that world, the prompt was the context. Phrasing mattered because there was nothing else.
Production LLM applications look different. A coding agent accumulates dozens of tool calls, file reads, and error messages over a session. A RAG pipeline retrieves documents, re-ranks them, and assembles them alongside instructions. A multi-agent system passes context between parent and child agents, each with different needs. In all of these, the original prompt is a small fraction of what the model actually sees.
After a few turns of a multi-turn agent conversation, the system prompt might account for 5% of the tokens in the window. The other 95% is accumulated conversation history, tool outputs, and retrieved context that the application assembled programmatically. Nobody is hand-crafting that 95%. It's built by code, which means it's an engineering problem.
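The arithmetic behind that 5%/95% split is simple to make concrete. The token counts below are made-up stand-ins for the example; a real application would measure them with its model's tokenizer.

```python
# Illustrative arithmetic for the system prompt's share of the window.
# The counts (2,000 and 38,000 tokens) are invented for the example.
def system_prompt_share(system_tokens: int, accumulated_tokens: int) -> float:
    return system_tokens / (system_tokens + accumulated_tokens)

share = system_prompt_share(system_tokens=2_000, accumulated_tokens=38_000)
# 2,000 of 40,000 tokens: the hand-written prompt is 5% of the window
```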
What Makes It Hard
Context windows have finite capacity, and quality degrades well before you hit the limit. The NoLiMa benchmark found that 11 of 13 leading models dropped to half their baseline performance at just 32k tokens. Not at the edge of their advertised window. At a fraction of it. Filling the window with everything you have is worse than a careful selection, because irrelevant information actively degrades the model's attention to the information that matters.
This creates a set of engineering tradeoffs that don't exist in traditional software. Selecting which information to include and which to leave out, knowing that both over-inclusion and under-inclusion cause failures. Structuring that information so the model attends to the right parts. Managing the lifecycle of context across turns, compressing or evicting stale content before it crowds out fresh information. All of it happens programmatically, at runtime, which means it's an engineering problem with engineering solutions.
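One of the lifecycle tradeoffs above, evicting stale content before it crowds out fresh information, can be sketched as a token-budget policy. This is one possible policy under stated assumptions, not a canonical algorithm: the function name, the whitespace-based token count, and the keep-newest rule are all choices made for the example.

```python
# Sketch of a lifecycle policy: keep the window under a token budget by
# retaining the newest items and dropping everything older once the budget
# is exhausted. The whitespace token count is a crude stand-in for a real
# tokenizer; names and policy are illustrative assumptions.
def fit_to_budget(system: str, items: list[tuple[str, str]], budget: int,
                  count_tokens=lambda s: len(s.split())) -> list[tuple[str, str]]:
    """items: (kind, text) pairs, oldest first. Returns the retained suffix."""
    used = count_tokens(system)        # the system prompt is always kept
    kept = []
    for kind, text in reversed(items):  # walk newest first
        cost = count_tokens(text)
        if used + cost > budget:
            break                       # evict this item and everything older
        used += cost
        kept.append((kind, text))
    kept.reverse()                      # restore chronological order
    return kept
```

Real systems use smarter policies, such as summarizing evicted content rather than discarding it, but the shape of the problem is the same: a selection decision made by code, at runtime.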
How It Differs from Prompt Engineering
Prompt engineering focuses on phrasing: how you word the instruction, what few-shot examples you include, how you format the expected output. These things still matter, but they operate on a small slice of the window. Context engineering operates on the rest: what documents get retrieved, how conversation history gets managed, what information flows between agents, when to compress and restart. Prompt engineering is writing a good question. Context engineering is building the information environment the question lives in.
What This Site Covers
Context Patterns catalogs recurring solutions to context engineering problems. The pattern catalog covers core techniques: selecting relevant information, structuring it for attention, compressing accumulated context, persisting state outside the window, isolating contexts across agents, and more. Each pattern describes the problem it solves, how it works, and when to use it.
The guides apply these patterns to specific domains: RAG pipelines, code generation, coding agents, multi-agent frameworks. The research page collects the papers, benchmarks, and practitioner writing that the patterns draw from.
The audience is engineers building LLM applications in production. The patterns assume you're past "how do I call the API" and working on "how do I make this reliable at scale."