Context Engineering for Code Generation
Include types, interfaces, and existing patterns in your context. Without them, the model generates code that matches its training data instead of your codebase.
The Code Generation Context Problem
Most code generation fails because the model gets a function signature and nothing else. That works for toy examples. In production, the gap between a signature and working code is everything the model can’t see: your types, your interfaces, the way your codebase handles errors, the patterns your team actually uses.
The model doesn’t need to be smarter. It needs to see what you see when you write that function.
What Context the Model Needs
A function signature is necessary but nowhere near sufficient. What else you include depends on the task (the sketch after these checklists shows one way to encode them).
New function implementation:
- Function signature and type annotations
- Related types and interfaces
- Existing functions that call or are called by this function
- Error handling patterns in the codebase
Refactoring:
- Current implementation
- Target interface or signature
- Usage sites that must be updated
- Test cases that must continue to pass
Bug fix:
- Failing test or error message
- Code that produces the error
- Correct handling of similar cases elsewhere
- Type definitions that constrain the fix
Test generation:
- Function or module under test
- Existing test patterns in the codebase
- Edge cases and error conditions to cover
- Mocking or fixture patterns
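One way to keep these checklists actionable is to encode them as data that a context-assembly script can walk. A minimal sketch in TypeScript; the task names and element labels are illustrative, not a fixed taxonomy:

```typescript
// Minimal sketch: the per-task checklists above, encoded as data that a
// context-assembly script can iterate over. Labels are illustrative only.
type TaskType = 'implement' | 'refactor' | 'bugfix' | 'test';

const REQUIRED_CONTEXT: Record<TaskType, string[]> = {
  implement: ['signature and types', 'related interfaces', 'callers and callees', 'error handling pattern'],
  refactor: ['current implementation', 'target signature', 'usage sites', 'tests that must keep passing'],
  bugfix: ['failing test or error message', 'code producing the error', 'correct handling elsewhere', 'constraining types'],
  test: ['code under test', 'existing test patterns', 'edge cases to cover', 'mock and fixture patterns'],
};
```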
The Priority Hierarchy
When you’re constrained on tokens, cut from the bottom:
1. Immediate code context: The file or function being modified
2. Type definitions: Interfaces, types, and schemas the code must satisfy
3. Related implementations: Similar functions or patterns in the codebase
4. Conventions: Style guides and architectural patterns
5. Documentation: External references and API docs
Most people skip #2 and include too much of #5. Types are cheap in tokens and high in signal; documentation is the opposite. Put documentation last and trim it aggressively.
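When the budget is tight, the cutting can be mechanical. A minimal sketch, assuming a rough characters-per-token estimate rather than a real tokenizer:

```typescript
// Sketch: assemble context in priority order and drop whatever no longer fits.
// chars / 4 is a rough token estimate, not a real tokenizer.
interface ContextElement {
  name: string;     // e.g. 'immediate code', 'types', 'related code', 'conventions', 'docs'
  priority: number; // 1 = immediate code context ... 5 = documentation
  text: string;
}

function assembleContext(elements: ContextElement[], tokenBudget: number): string {
  const estimateTokens = (s: string) => Math.ceil(s.length / 4);
  const kept: ContextElement[] = [];
  let used = 0;
  // Consider high-priority elements first; low-priority ones are the first to be cut.
  for (const el of [...elements].sort((a, b) => a.priority - b.priority)) {
    const cost = estimateTokens(el.text);
    if (used + cost > tokenBudget) continue; // doesn't fit: cut it
    kept.push(el);
    used += cost;
  }
  return kept.map((el) => `## ${el.name}\n${el.text}`).join('\n\n');
}
```

Dropping whole elements keeps each surviving one intact and readable; shaving a little off everything tends to damage all of them.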
Assembling Context for a New Function
A new function needs three things: what types it must satisfy, how similar functions in the codebase look, and what error handling conventions apply. Anything beyond that dilutes attention.
```
Types from this file's imports:
[imported type definitions]

Related functions in this file:
[2-3 functions that interact with the new one]

Usage examples from elsewhere:
[up to 3 call sites showing how similar functions are used]

Error handling pattern in this file:
[one example of how this file handles errors]
```
That is Select, Don’t Dump: minimum viable context, not maximum available context.
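If you assemble this template in code, the structure maps directly onto a small builder. A sketch; the field names mirror the sections above, and how you extract each snippet (AST query, grep, manual copy) is left to you:

```typescript
// Sketch: build the new-function prompt from pre-extracted snippets. How you
// extract each piece (AST query, grep, manual copy) is outside this sketch.
interface NewFunctionContext {
  typeDefinitions: string;  // imported type definitions
  relatedFunctions: string; // 2-3 functions that interact with the new one
  usageExamples: string;    // up to 3 call sites of similar functions
  errorPattern: string;     // one example of this file's error handling
  task: string;             // what to generate
}

function buildPrompt(ctx: NewFunctionContext): string {
  return [
    `Types from this file's imports:\n${ctx.typeDefinitions}`,
    `Related functions in this file:\n${ctx.relatedFunctions}`,
    `Usage examples from elsewhere:\n${ctx.usageExamples}`,
    `Error handling pattern in this file:\n${ctx.errorPattern}`,
    `Task: ${ctx.task}`,
  ].join('\n\n');
}
```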
Using Existing Code as Few-Shot Examples
The most underused technique in code generation: show the model 2-3 similar functions from your own codebase. Few-shot examples work better for code generation than for almost any other domain, because code style is consistent within a project and the model picks up on it immediately.
```
Task: [description of what to generate]

Examples from this codebase:

Example 1: createUser in user-service.ts
[function implementation]

Example 2: createTeam in team-service.ts
[function implementation]

Generate code following these patterns.
```
Two examples are usually enough. Three if the pattern has significant variation. More than that and you’re spending tokens on diminishing returns.
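Example selection can also be automated with a crude similarity heuristic. A sketch, assuming you already have candidate functions extracted as name/source pairs; the scoring is deliberately naive:

```typescript
// Sketch: pick the k most similar existing functions to use as few-shot examples.
// Scoring by shared name tokens is naive, but within one codebase it is often enough.
interface CandidateExample {
  name: string;   // e.g. 'createTeam'
  file: string;   // e.g. 'team-service.ts'
  source: string; // full function text to paste into the prompt
}

function pickExamples(taskName: string, candidates: CandidateExample[], k = 2): CandidateExample[] {
  const tokens = (s: string) =>
    s.split(/(?=[A-Z])|[\s_-]+/).map((t) => t.toLowerCase()).filter(Boolean);
  const taskTokens = new Set(tokens(taskName));
  const score = (c: CandidateExample) =>
    tokens(c.name).filter((t) => taskTokens.has(t)).length;
  return [...candidates].sort((a, b) => score(b) - score(a)).slice(0, k);
}
```

Called with a task name like `createProject`, this would surface `createUser` and `createTeam` ahead of unrelated helpers, which is usually all you need.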
Schema Steering for Type-Safe Generation
Include the interface the generated code must implement. This is the single most effective context you can add, because the Schema Steering pattern turns a “generate something reasonable” task into a “satisfy this contract” task:
```typescript
interface UserService {
  findById(id: string): Promise<User | null>;
  create(data: CreateUserInput): Promise<User>;
  update(id: string, data: UpdateUserInput): Promise<User>;
}

interface CreateUserInput {
  email: string;
  name: string;
  role: 'admin' | 'user';
}

// Generate the update method implementation
```
Without the interface, the model invents its own types. With it, type errors drop dramatically.
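To make the contrast concrete, here is one plausible implementation the contract steers toward. The `User` shape, the `UpdateUserInput` alias, and the `db` helper below are assumptions added so the sketch typechecks; the signature, the not-found check, and the return type come straight from the interface:

```typescript
// Assumed shapes, added so this sketch typechecks; the example above
// references User and UpdateUserInput without defining them.
interface User {
  id: string;
  email: string;
  name: string;
  role: 'admin' | 'user';
}
type UpdateUserInput = Partial<CreateUserInput>;

// Hypothetical data-access layer, declared only for illustration.
declare const db: {
  users: {
    findById(id: string): Promise<User | null>;
    save(user: User): Promise<void>;
  };
};

// One plausible `update` implementation steered by the contract: the signature,
// the not-found check, and the return type all come from the types above.
async function update(id: string, data: UpdateUserInput): Promise<User> {
  const existing = await db.users.findById(id);
  if (!existing) {
    throw new Error(`User ${id} not found`);
  }
  const updated: User = { ...existing, ...data };
  await db.users.save(updated);
  return updated;
}
```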
Context for Refactoring
Refactoring is where most code generation context falls apart. The model needs both the code being changed and the code that depends on it, and people almost always forget the second part:
```
Current implementation:
[the function being refactored]

Usage sites that must be updated (5 of 23 total):
[first 5 call sites showing how the function is currently called]

Target signature:
[the new signature the function should have]

Similar refactors in codebase:
[1-2 examples of similar signature changes and how callers were updated]
```
Without the usage sites, the model refactors the function perfectly and breaks every caller. Include at least the first 5 call sites; if there are dozens, summarize the pattern.
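A hypothetical before/after, invented for illustration, of what the target signature and an updated call site look like side by side:

```typescript
// Hypothetical refactor, invented for illustration: the signature widens from a
// bare id to an options object, so every call site has to change in lockstep.
interface User {
  id: string;
  email: string;
}
interface FindUserOptions {
  id: string;
  includeDeleted?: boolean;
}

// Target signature (after the refactor); body elided because it isn't the point.
async function findUser(options: FindUserOptions): Promise<User | null> {
  return null;
}

// Updated call site; before the refactor this line was `findUser(order.userId)`.
async function loadOrderOwner(order: { userId: string }): Promise<User | null> {
  return findUser({ id: order.userId });
}
```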
Context for Bug Fixes
Bug fixes need the failure, the code that produces it, and a correct example from elsewhere:
```
Failing test:
[the test case that fails]

Error:
[stack trace or error message]

Code to fix:
[the function producing the error]

Correct handling elsewhere:
[a similar function that handles this case correctly]
```
The correct example matters more than people think. Without it, the model fixes the bug but introduces a new error handling pattern that doesn’t match the rest of the codebase.
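A hypothetical illustration of that "correct handling elsewhere" slot: a sibling function showing the codebase's convention (missing records return null rather than throwing), which is the shape the fix should reproduce:

```typescript
// Hypothetical illustration of the "correct handling elsewhere" slot.
// findTeam shows the codebase convention -- missing records return null
// rather than throwing -- which is the shape the bug fix should follow.
interface Team {
  id: string;
  name: string;
}

// Stand-in for whatever query helper the codebase actually uses.
declare function queryOne(sql: string, params: unknown[]): Promise<unknown | null>;

// Correct handling elsewhere: a missing team is a null, not an exception.
async function findTeam(id: string): Promise<Team | null> {
  const row = await queryOne('SELECT * FROM teams WHERE id = ?', [id]);
  return row ? (row as Team) : null;
}
```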
Managing Context Size
Code generation contexts grow fast, especially for refactoring tasks that touch many files. Two approaches keep them under control.
Hierarchical inclusion. Start minimal. If compilation fails, expand (a sketch of the loop follows the list):
1. Generate with minimal context (types + immediate code)
2. If compilation fails, expand context (add related functions)
3. If still failing, add conventions and full file context
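A sketch of that escalation loop; `buildContext`, `generate`, and `compiles` are stand-ins you supply, not a specific API:

```typescript
// Sketch of hierarchical inclusion as an escalation loop. `buildContext`,
// `generate`, and `compiles` are stand-ins for your own context assembly,
// model call, and typechecker -- not a specific library API.
type ContextLevel = 'minimal' | 'expanded' | 'full';
const LEVELS: ContextLevel[] = ['minimal', 'expanded', 'full'];

async function generateWithEscalation(
  buildContext: (level: ContextLevel) => string,
  generate: (prompt: string) => Promise<string>,
  compiles: (code: string) => Promise<boolean>,
): Promise<string> {
  let attempt = '';
  for (const level of LEVELS) {
    attempt = await generate(buildContext(level));
    if (await compiles(attempt)) return attempt; // stop as soon as it builds
  }
  return attempt; // still failing after full context: hand back for human review
}
```

Each escalation costs another generation call, so this only pays off if the minimal level succeeds most of the time.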
Token budgets. Allocate tokens explicitly:
| Element | Budget (tokens) |
|---|---|
| Type definitions | 1500 |
| Related functions | 1000 |
| Examples | 800 |
| Task description | 300 |
| Output reserve | 2000 |
Total: 5600 tokens. Leave headroom; generated code can easily consume 2000+ tokens for a complex function.
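A sketch of enforcing those budgets per element; the element names mirror the table, and the characters-to-tokens conversion is a rough heuristic, not a real tokenizer:

```typescript
// Sketch: cap each context element at its budget from the table above.
// chars ≈ tokens * 4 is a rough heuristic, not a real tokenizer.
const BUDGETS: Record<string, number> = {
  typeDefinitions: 1500,
  relatedFunctions: 1000,
  examples: 800,
  taskDescription: 300,
};

function enforceBudgets(elements: Record<string, string>): Record<string, string> {
  const capped: Record<string, string> = {};
  for (const [name, text] of Object.entries(elements)) {
    const maxChars = (BUDGETS[name] ?? 500) * 4; // default budget for unlisted elements
    capped[name] =
      text.length <= maxChars ? text : text.slice(0, maxChars) + '\n/* truncated to fit budget */';
  }
  return capped;
}
```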
Common Mistakes
Omitting type definitions. This is the most common and most damaging mistake, because without types the model doesn’t fail loudly; it invents plausible-looking return types, parameter types, and error shapes that compile until you actually run them.
Including entire files instead of extracting functions. A 500-line file where 40 lines are relevant wastes 92% of the context budget on noise. Extract the function, add a comment noting the source file. The model doesn’t need to read the whole thing to understand the part you care about.
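For illustration, the extracted form might look like this in the prompt (the function and path are hypothetical); the other ~460 lines never enter the context:

```typescript
// Extracted from src/services/user-service.ts (hypothetical path).
// userRepo and NotFoundError come from that file's imports; the rest of the
// 500-line file is omitted because only this function is relevant to the task.
export async function deactivateUser(id: string): Promise<void> {
  const user = await userRepo.findById(id);
  if (!user) throw new NotFoundError(`User ${id} not found`);
  await userRepo.update(id, { active: false });
}
```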
No error handling context. Left to its own devices, the model reaches for generic try/catch patterns. Show one correct example from your codebase and it matches the pattern consistently. One example is enough.
Skipping test patterns for test generation. This one is subtle because the output still looks correct. If your tests use a specific fixture setup, assertion style, or mocking approach and you don’t show that, the model generates structurally valid tests that fail your linter and that nobody on your team will recognize as belonging to the same suite.