Guides

Applied context engineering. Deep dives into specific patterns, framework guides, and case studies from production systems.

Deep Dives

Extended analysis of individual patterns: implementation details, data, and comparisons.

Memory Architectures for AI Agents

Compares memory implementations across systems: flat files, structured databases, vector stores, and hybrid approaches. Maps MemGPT, Claude, ChatGPT, and coding agents onto episodic, semantic, and procedural memory concepts.

LangChain: Context Engineering for Agents, Letta: Context-Bench, 0xeb/TheBigPromptLibrary (MIT)

Context Rot Across Models

Data-driven comparison of how different models handle long context. NoLiMa and RULER benchmarks reveal which models maintain quality and which degrade fastest across GPT-4o, Claude, Gemini, Llama, and Mistral.

NoLiMa Benchmark (arXiv), RULER: What's the Real Context Size?, Lost in the Middle (Liu et al.), Claude 3.5 Sonnet Context Window, OpenAI GPT-4o Documentation

Recursive Delegation in Swarm, CrewAI, and LangGraph

How OpenAI Swarm, CrewAI, and LangGraph implement recursive delegation. Each framework handles context passing, result aggregation, and agent spawning differently.

OpenAI Swarm Documentation, CrewAI Documentation, LangGraph Documentation, Anthropic: Multi-Agent Research System

System Prompt Engineering

System prompts accumulate. Instructions get added, constraints pile up, examples get appended. Most production system prompts are longer than they need to be, ordered worse than they could be, and maintained less rigorously than the code they govern.

Taskade: Analysis of 120+ Leaked System Prompts, Anthropic: Prompting Best Practices, OpenAI: Prompt Engineering Guide, 0xeb/TheBigPromptLibrary (MIT)

Context Window Economics

Token costs are not a billing footnote; they are the constraint that forces every other context engineering decision. Understanding the actual cost structure, broken down by input versus output and cached versus fresh tokens, changes how you design systems.

Anthropic: Prompt Caching Documentation, Artificial Analysis: Prompt Caching Cost Analysis, RAG or Long-Context LLMs? (arXiv), RouteLLM: Cost-Effective LLM Routing (LMSYS)
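The cached-versus-fresh distinction is easiest to see as arithmetic. The sketch below compares one request that re-sends a large system prompt fresh on every turn against one that serves it from cache; the per-million-token prices are illustrative placeholders, not current vendor rates.

```python
# Sketch: per-request cost with and without prompt caching.
# Prices below are assumed placeholders, not real vendor pricing.
PRICE_PER_MTOK = {
    "input_fresh": 3.00,   # assumed $/million fresh input tokens
    "input_cached": 0.30,  # assumed cache reads at ~10% of fresh
    "output": 15.00,       # assumed $/million output tokens
}

def request_cost(fresh_in: int, cached_in: int, out: int) -> float:
    """Dollar cost of one request, given token counts per category."""
    return (
        fresh_in * PRICE_PER_MTOK["input_fresh"]
        + cached_in * PRICE_PER_MTOK["input_cached"]
        + out * PRICE_PER_MTOK["output"]
    ) / 1_000_000

# A 50k-token prompt re-sent fresh every turn vs mostly served from cache:
uncached = request_cost(fresh_in=50_000, cached_in=0, out=1_000)
cached = request_cost(fresh_in=1_000, cached_in=49_000, out=1_000)
print(f"uncached: ${uncached:.4f}  cached: ${cached:.4f}")
```

At these assumed rates the cached request costs roughly a fifth of the uncached one; the input line item, not output, dominates once prompts get large.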

Anatomy of a Production System Prompt

Real system prompts from Claude, ChatGPT, and GitHub Copilot, annotated against context engineering patterns. What they get right, where they break their own rules, and what the structure tells you about each product's priorities.

0xeb/TheBigPromptLibrary (MIT), Anthropic: Effective Context Engineering for AI Agents, Taskade: Analysis of 120+ Leaked System Prompts

System Prompt Growth Over Time

Claude's system prompt grew 23x in 18 months. ChatGPT's doubled. Dated snapshots from real products show the accumulation problem playing out in public, with timestamps.

0xeb/TheBigPromptLibrary (MIT), Anthropic: System Prompts Release Notes

MCP and A2A as Context Engineering Infrastructure

The Model Context Protocol and Agent-to-Agent protocol don't just transport context; they force you to make context engineering decisions at the protocol level. Tool descriptions become context. Resource endpoints become progressive disclosure. Agent cards become handoff contracts.

Model Context Protocol Specification, Google A2A Protocol

Context Engineering at the Gateway Layer

LLM gateways and routers make context engineering decisions before the application even sees the request. Model selection, context compression, cache routing, and cost optimization all happen at this layer, and most teams don't think of them as context engineering.

Anthropic: Prompt Caching Documentation, Context Tax: Latency vs. Accuracy at Scale

Guides

How to apply context patterns with specific frameworks, domains, and use cases.

Context Engineering for RAG Pipelines

Most RAG implementations fail not because retrieval is bad, but because nobody thought about what happens after retrieval. Bad chunking, no re-ranking, and no context budgeting waste the tokens you spent retrieving.

Chroma Research: Evaluating Chunking, Anthropic: Contextual Retrieval, 0xeb/TheBigPromptLibrary (MIT)
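Context budgeting, the third failure named above, is the simplest of the three to sketch. This is a minimal greedy packer over already-re-ranked chunks; the whitespace token count is a stand-in for a real tokenizer, and the function names are illustrative, not from any particular framework.

```python
def pack_context(chunks, budget_tokens, count_tokens=lambda t: len(t.split())):
    """Greedy context budgeting: take re-ranked (score, text) chunks
    in descending score order until the token budget is spent.
    Token counting here is a naive whitespace approximation;
    swap in a real tokenizer for production use."""
    packed, used = [], 0
    for score, text in sorted(chunks, key=lambda c: c[0], reverse=True):
        cost = count_tokens(text)
        if used + cost > budget_tokens:
            continue  # chunk would blow the budget; skip it
        packed.append(text)
        used += cost
    return packed

retrieved = [(0.9, "a b c"), (0.5, "d e"), (0.8, "f g h i")]
print(pack_context(retrieved, budget_tokens=7))
```

The point is not the packing heuristic itself but that *some* explicit budget exists: without one, retrieval quietly fills the window until the model starts ignoring the middle of it.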

Context Engineering for Coding Agents

Configure Claude Code, Cursor, and Windsurf for better results. Structure your AGENTS.md and .cursorrules files to provide the right context at the right time.

Claude Code Documentation, Cursor: Context and Codebase Understanding, Windsurf Documentation

Context Engineering for Code Generation

Include types, interfaces, and existing patterns in your context. Without them, the model generates code that matches its training data instead of your codebase.

GitHub Copilot: How It Works, OpenAI: Best Practices for Code Generation
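One cheap way to get interfaces into context without pasting whole files is to extract just the signature lines. A minimal sketch using the stdlib ast module (the module text and function name are illustrative):

```python
import ast

def extract_signatures(source: str) -> list[str]:
    """Return the source line of each top-level function and class
    definition in a Python module, so a code-generation prompt can
    carry your interfaces without the full bodies. Minimal sketch:
    a real tool would also walk nested definitions and docstrings."""
    lines = source.splitlines()
    sigs = []
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            sigs.append(lines[node.lineno - 1].rstrip())
    return sigs

# Illustrative module text:
module = '''
class Invoice:
    def total(self) -> float: ...

def parse_invoice(raw: str) -> "Invoice":
    return Invoice()
'''
print(extract_signatures(module))
```

Prepending a few hundred such lines is usually enough to keep generated code calling *your* `parse_invoice` instead of inventing one from training data.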

Context Engineering vs Prompt Engineering

Prompt engineering is about phrasing one request well. Context engineering is about assembling the information environment that makes the model capable of doing the work at all. They sound similar but they solve different problems, and confusing them is why most agent systems degrade after a few turns.

Andrej Karpathy on Context Engineering, Birgitta Böckeler: What Is Context Engineering?, Anthropic: Effective Context Engineering for AI Agents

Agentic Context Efficiency: A Benchmark

Four models ran the same 90-turn agentic task. The one that front-loaded all source reads hit 100% cache utilization; the one that read on demand consumed 10,000x more fresh input tokens.

Anthropic: Prompt Caching, contextpatterns.com benchmark

Context Engineering for Customer Support Bots

Customer support is the most common production LLM use case and the one most likely to go wrong in ways that visibly damage trust. Wrong return windows, hallucinated policies, contradictions across turns: these are context problems, not model problems.

Anthropic: Contextual Retrieval, AWS: Managing Chat History and Context at Scale in Generative AI Chatbots, Intercom: Fin AI Agent

Evaluating and Observing Context Quality

Most teams have no idea whether their context engineering is actually working. They ship a RAG pipeline, check that it returns answers, and call it done. Here is how to measure what is actually happening inside the context window.

RAGAS: Evaluation Framework for RAG Pipelines, Chroma Research: Context Rot, LangSmith: LLM Observability Platform
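A concrete example of "measuring what is inside the window": context precision, the fraction of retrieved chunks that were actually relevant to the query. This is a deliberately simplified version of the RAGAS-style metric, assuming you have relevance labels (from human judgment or an LLM judge):

```python
def context_precision(retrieved_ids, relevant_ids):
    """Fraction of retrieved chunk IDs that appear in the relevant set.
    Simplified sketch of a context-quality metric: low precision means
    you are spending window space (and model attention) on noise."""
    if not retrieved_ids:
        return 0.0
    relevant = set(relevant_ids)
    return sum(1 for c in retrieved_ids if c in relevant) / len(retrieved_ids)

print(context_precision(["c1", "c2", "c3", "c4"], relevant_ids=["c1", "c3"]))
```

Even this crude number, tracked per query over time, tells you more than "the pipeline returned an answer."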

Context Engineering for Multi-Turn Conversations

Conversation history is the context problem most applications have and fewest teams think about. It grows unbounded, degrades quality silently, and fails in predictable ways that a small amount of engineering prevents.

LLMs Get Lost In Multi-Turn Conversation (arXiv), Microsoft Semantic Kernel: Managing Chat History for LLMs, mem0: LLM Chat History Summarization
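The small amount of engineering can be as simple as a sliding window over the message list. A minimal sketch, assuming the common role/content message shape; a fuller implementation would summarize the dropped turns instead of discarding them:

```python
def trim_history(messages, max_turns=20):
    """Sliding-window history trim: keep system messages plus the
    most recent max_turns non-system messages. Older turns are simply
    dropped here; a fuller version would summarize them first."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    return system + rest[-max_turns:]

history = [{"role": "system", "content": "You are a support agent."}]
history += [{"role": "user", "content": str(i)} for i in range(30)]
print(len(trim_history(history)))  # system message + last 20 turns
```

The key property is that the bound is enforced *every* turn, so quality degrades at a known point rather than silently as the window fills.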

Context Engineering Anti-Patterns

Most context bugs don't look like bugs. The model produces an answer; it just isn't the right one. Here are the failure modes that cause this, and how to recognize which one you're hitting.

Drew Breunig: How Long Contexts Fail, Drew Breunig: How to Fix Your Context, NoLiMa Benchmark

Context Engineering for Data Extraction

Extracting structured data from documents is one of the highest-value LLM use cases in production, and also where poor context engineering shows up most visibly: missing fields, wrong values, and silent failures that corrupt downstream systems.

Anthropic: Structured Outputs, Azure OpenAI: Best Practices for Structured Extraction from Documents, OpenAI: Structured Outputs
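The cheapest defense against silent extraction failures is validating every record against a schema before it reaches downstream systems. A minimal sketch (the schema and field names are illustrative; production code would use JSON Schema or a library like pydantic):

```python
def validate_extraction(record: dict, schema: dict) -> list[str]:
    """Check an extracted record against a required-fields schema and
    return a list of problems instead of failing silently.
    schema maps field name -> expected Python type."""
    problems = []
    for field, expected in schema.items():
        if field not in record or record[field] is None:
            problems.append(f"missing: {field}")
        elif not isinstance(record[field], expected):
            problems.append(f"wrong type: {field}")
    return problems

# Illustrative schema for an invoice-extraction task:
invoice_schema = {"invoice_id": str, "total": float, "currency": str}
print(validate_extraction({"invoice_id": "INV-7", "total": "89.50"}, invoice_schema))
```

A record that fails validation gets routed back for re-extraction or human review; one that passes still isn't guaranteed correct, but the two most common failure modes, missing fields and type drift, are caught at the boundary.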

Context Engineering for Search and Recommendations

Search and recommendation systems are context engineering problems disguised as retrieval problems. The user's history, intent, and session state all compete for space in the context window, and most teams include too much of the wrong signal.

Anthropic: Contextual Retrieval, Chroma Research: Evaluating Chunking

Context Engineering for Legal and Compliance

Legal document analysis demands context engineering that most domains don't: every claim must be traceable to a specific clause, hallucinated content creates liability, and the documents themselves are longer than most models can reliably process.

NoLiMa Benchmark: Long-Context Evaluation, Anthropic: Structured Outputs

Context Engineering for Autonomous Agents

Autonomous agents face context challenges that chatbots and coding assistants don't: open-ended exploration, unpredictable tool outputs, sessions that run for hundreds of turns, and no human in the loop to course-correct when the context degrades.

LOCA-bench: Long-Running Agent Context Rot, Anthropic: How We Built a Multi-Agent Research System

Case Studies

How teams apply context engineering in production.