Context Engineering for Code Generation
Include types, interfaces, and existing patterns in your context. Without them, the model generates code that matches its training data instead of your codebase.
The Code Generation Context Problem
Most code generation fails because the model gets a function signature and nothing else. That works for toy examples. In production, the gap between a signature and working code is everything the model can’t see: your types, your interfaces, the way your codebase handles errors, the patterns your team actually uses.
The model doesn’t need to be smarter. It needs to see what you see when you write that function.
What Context the Model Needs
A function signature is necessary but nowhere near sufficient. What else you include depends on the task (the sketch after these checklists shows one way to encode them).
New function implementation:
- Function signature and type annotations
- Related types and interfaces
- Existing functions that call or are called by this function
- Error handling patterns in the codebase
Refactoring:
- Current implementation
- Target interface or signature
- Usage sites that must be updated
- Test cases that must continue to pass
Bug fix:
- Failing test or error message
- Code that produces the error
- Correct handling of similar cases elsewhere
- Type definitions that constrain the fix
Test generation:
- Function or module under test
- Existing test patterns in the codebase
- Edge cases and error conditions to cover
- Mocking or fixture patterns
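One way to keep these checklists actionable is to encode them as data that a context-assembly script can walk. A minimal sketch in TypeScript; the task names and element labels are illustrative, not a fixed taxonomy:

```typescript
// Minimal sketch: the per-task checklists above, encoded as data that a
// context-assembly script can iterate over. Labels are illustrative only.
type TaskType = 'implement' | 'refactor' | 'bugfix' | 'test';

const REQUIRED_CONTEXT: Record<TaskType, string[]> = {
  implement: ['signature and types', 'related interfaces', 'callers and callees', 'error handling pattern'],
  refactor: ['current implementation', 'target signature', 'usage sites', 'tests that must keep passing'],
  bugfix: ['failing test or error message', 'code producing the error', 'correct handling elsewhere', 'constraining types'],
  test: ['code under test', 'existing test patterns', 'edge cases to cover', 'mock and fixture patterns'],
};
```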
The Priority Hierarchy
When you’re constrained on tokens, cut from the bottom:
1. Immediate code context: The file or function being modified
2. Type definitions: Interfaces, types, and schemas the code must satisfy
3. Related implementations: Similar functions or patterns in the codebase
4. Conventions: Style guides and architectural patterns
5. Documentation: External references and API docs
Most people skip #2 and include too much of #5. Types are cheap in tokens and high in signal; documentation is the opposite. Put documentation last and trim it aggressively.
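When the budget is tight, the cutting can be mechanical. A minimal sketch, assuming a rough characters-per-token estimate rather than a real tokenizer:

```typescript
// Sketch: assemble context in priority order and drop whatever no longer fits.
// chars / 4 is a rough token estimate, not a real tokenizer.
interface ContextElement {
  name: string;     // e.g. 'immediate code', 'types', 'related code', 'conventions', 'docs'
  priority: number; // 1 = immediate code context ... 5 = documentation
  text: string;
}

function assembleContext(elements: ContextElement[], tokenBudget: number): string {
  const estimateTokens = (s: string) => Math.ceil(s.length / 4);
  const kept: ContextElement[] = [];
  let used = 0;
  // Consider high-priority elements first; low-priority ones are the first to be cut.
  for (const el of [...elements].sort((a, b) => a.priority - b.priority)) {
    const cost = estimateTokens(el.text);
    if (used + cost > tokenBudget) continue; // doesn't fit: cut it
    kept.push(el);
    used += cost;
  }
  return kept.map((el) => `## ${el.name}\n${el.text}`).join('\n\n');
}
```

Dropping whole elements keeps each surviving one intact and readable; shaving a little off everything tends to damage all of them.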
Assembling Context for a New Function
A new function needs three things: what types it must satisfy, how similar functions in the codebase look, and what error handling conventions apply. Anything beyond that dilutes attention.
```
Types from this file's imports:
[imported type definitions]

Related functions in this file:
[2-3 functions that interact with the new one]

Usage examples from elsewhere:
[up to 3 call sites showing how similar functions are used]

Error handling pattern in this file:
[one example of how this file handles errors]
```
That is Select, Don’t Dump: minimum viable context, not maximum available context.
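If you assemble this template in code, the structure maps directly onto a small builder. A sketch; the field names mirror the sections above, and how you extract each snippet (AST query, grep, manual copy) is left to you:

```typescript
// Sketch: build the new-function prompt from pre-extracted snippets. How you
// extract each piece (AST query, grep, manual copy) is outside this sketch.
interface NewFunctionContext {
  typeDefinitions: string;  // imported type definitions
  relatedFunctions: string; // 2-3 functions that interact with the new one
  usageExamples: string;    // up to 3 call sites of similar functions
  errorPattern: string;     // one example of this file's error handling
  task: string;             // what to generate
}

function buildPrompt(ctx: NewFunctionContext): string {
  return [
    `Types from this file's imports:\n${ctx.typeDefinitions}`,
    `Related functions in this file:\n${ctx.relatedFunctions}`,
    `Usage examples from elsewhere:\n${ctx.usageExamples}`,
    `Error handling pattern in this file:\n${ctx.errorPattern}`,
    `Task: ${ctx.task}`,
  ].join('\n\n');
}
```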
Using Existing Code as Few-Shot Examples
The most underused technique in code generation: show the model 2-3 similar functions from your own codebase. Few-shot examples work better for code generation than for almost any other domain, because code style is consistent within a project and the model picks up on it immediately.
```
Task: [description of what to generate]

Examples from this codebase:

Example 1: createUser in user-service.ts
[function implementation]

Example 2: createTeam in team-service.ts
[function implementation]

Generate code following these patterns.
```
Two examples are usually enough. Three if the pattern has significant variation. More than that and you’re spending tokens on diminishing returns.
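Example selection can also be automated with a crude similarity heuristic. A sketch, assuming you already have candidate functions extracted as name/source pairs; the scoring is deliberately naive:

```typescript
// Sketch: pick the k most similar existing functions to use as few-shot examples.
// Scoring by shared name tokens is naive, but within one codebase it is often enough.
interface CandidateExample {
  name: string;   // e.g. 'createTeam'
  file: string;   // e.g. 'team-service.ts'
  source: string; // full function text to paste into the prompt
}

function pickExamples(taskName: string, candidates: CandidateExample[], k = 2): CandidateExample[] {
  const tokens = (s: string) =>
    s.split(/(?=[A-Z])|[\s_-]+/).map((t) => t.toLowerCase()).filter(Boolean);
  const taskTokens = new Set(tokens(taskName));
  const score = (c: CandidateExample) =>
    tokens(c.name).filter((t) => taskTokens.has(t)).length;
  return [...candidates].sort((a, b) => score(b) - score(a)).slice(0, k);
}
```

Called with a task name like `createProject`, this would surface `createUser` and `createTeam` ahead of unrelated helpers, which is usually all you need.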
Schema Steering for Type-Safe Generation
Include the interface the generated code must implement. This is the single most effective context you can add, because the Schema Steering pattern turns a “generate something reasonable” task into a “satisfy this contract” task:
```typescript
interface UserService {
  findById(id: string): Promise<User | null>;
  create(data: CreateUserInput): Promise<User>;
  update(id: string, data: UpdateUserInput): Promise<User>;
}

interface CreateUserInput {
  email: string;
  name: string;
  role: 'admin' | 'user';
}

// Generate the update method implementation
```
Without the interface, the model invents its own types. With it, type errors drop dramatically.
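To make the contrast concrete, here is one plausible implementation the contract steers toward. The `User` shape, the `UpdateUserInput` alias, and the `db` helper below are assumptions added so the sketch typechecks; the signature, the not-found check, and the return type come straight from the interface:

```typescript
// Assumed shapes, added so this sketch typechecks; the example above
// references User and UpdateUserInput without defining them.
interface User {
  id: string;
  email: string;
  name: string;
  role: 'admin' | 'user';
}
type UpdateUserInput = Partial<CreateUserInput>;

// Hypothetical data-access layer, declared only for illustration.
declare const db: {
  users: {
    findById(id: string): Promise<User | null>;
    save(user: User): Promise<void>;
  };
};

// One plausible `update` implementation steered by the contract: the signature,
// the not-found check, and the return type all come from the types above.
async function update(id: string, data: UpdateUserInput): Promise<User> {
  const existing = await db.users.findById(id);
  if (!existing) {
    throw new Error(`User ${id} not found`);
  }
  const updated: User = { ...existing, ...data };
  await db.users.save(updated);
  return updated;
}
```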
Context for Refactoring
Refactoring is where most code generation context falls apart. The model needs both the code being changed and the code that depends on it, and people almost always forget the second part:
```
Current implementation:
[the function being refactored]

Usage sites that must be updated (5 of 23 total):
[first 5 call sites showing how the function is currently called]

Target signature:
[the new signature the function should have]

Similar refactors in codebase:
[1-2 examples of similar signature changes and how callers were updated]
```
Without the usage sites, the model refactors the function perfectly and breaks every caller. Include at least the first 5 call sites; if there are dozens, summarize the pattern.
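A hypothetical before/after, invented for illustration, of what the target signature and an updated call site look like side by side:

```typescript
// Hypothetical refactor, invented for illustration: the signature widens from a
// bare id to an options object, so every call site has to change in lockstep.
interface User {
  id: string;
  email: string;
}
interface FindUserOptions {
  id: string;
  includeDeleted?: boolean;
}

// Target signature (after the refactor); body elided because it isn't the point.
async function findUser(options: FindUserOptions): Promise<User | null> {
  return null;
}

// Updated call site; before the refactor this line was `findUser(order.userId)`.
async function loadOrderOwner(order: { userId: string }): Promise<User | null> {
  return findUser({ id: order.userId });
}
```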
Context for Bug Fixes
Bug fixes need the failure, the code that produces it, and a correct example from elsewhere:
```
Failing test:
[the test case that fails]

Error:
[stack trace or error message]

Code to fix:
[the function producing the error]

Correct handling elsewhere:
[a similar function that handles this case correctly]
```
The correct example matters more than people think. Without it, the model fixes the bug but introduces a new error handling pattern that doesn’t match the rest of the codebase.
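A hypothetical illustration of that "correct handling elsewhere" slot: a sibling function showing the codebase's convention (missing records return null rather than throwing), which is the shape the fix should reproduce:

```typescript
// Hypothetical illustration of the "correct handling elsewhere" slot.
// findTeam shows the codebase convention -- missing records return null
// rather than throwing -- which is the shape the bug fix should follow.
interface Team {
  id: string;
  name: string;
}

// Stand-in for whatever query helper the codebase actually uses.
declare function queryOne(sql: string, params: unknown[]): Promise<unknown | null>;

// Correct handling elsewhere: a missing team is a null, not an exception.
async function findTeam(id: string): Promise<Team | null> {
  const row = await queryOne('SELECT * FROM teams WHERE id = ?', [id]);
  return row ? (row as Team) : null;
}
```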
Managing Context Size
Code generation contexts grow fast, especially for refactoring tasks that touch many files. Two approaches keep them under control.
Hierarchical inclusion. Start minimal. If compilation fails, expand (a sketch of the loop follows the list):
1. Generate with minimal context (types + immediate code)
2. If compilation fails, expand context (add related functions)
3. If still failing, add conventions and full file context
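A sketch of that escalation loop; `buildContext`, `generate`, and `compiles` are stand-ins you supply, not a specific API:

```typescript
// Sketch of hierarchical inclusion as an escalation loop. `buildContext`,
// `generate`, and `compiles` are stand-ins for your own context assembly,
// model call, and typechecker -- not a specific library API.
type ContextLevel = 'minimal' | 'expanded' | 'full';
const LEVELS: ContextLevel[] = ['minimal', 'expanded', 'full'];

async function generateWithEscalation(
  buildContext: (level: ContextLevel) => string,
  generate: (prompt: string) => Promise<string>,
  compiles: (code: string) => Promise<boolean>,
): Promise<string> {
  let attempt = '';
  for (const level of LEVELS) {
    attempt = await generate(buildContext(level));
    if (await compiles(attempt)) return attempt; // stop as soon as it builds
  }
  return attempt; // still failing after full context: hand back for human review
}
```

Each escalation costs another generation call, so this only pays off if the minimal level succeeds most of the time.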
Token budgets. Allocate tokens explicitly:
| Element | Budget (tokens) |
|---|---|
| Type definitions | 1500 |
| Related functions | 1000 |
| Examples | 800 |
| Task description | 300 |
| Output reserve | 2000 |
Total: 5600 tokens. Leave headroom; generated code can easily consume 2000+ tokens for a complex function.
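A sketch of enforcing those budgets per element; the element names mirror the table, and the characters-to-tokens conversion is a rough heuristic, not a real tokenizer:

```typescript
// Sketch: cap each context element at its budget from the table above.
// chars ≈ tokens * 4 is a rough heuristic, not a real tokenizer.
const BUDGETS: Record<string, number> = {
  typeDefinitions: 1500,
  relatedFunctions: 1000,
  examples: 800,
  taskDescription: 300,
};

function enforceBudgets(elements: Record<string, string>): Record<string, string> {
  const capped: Record<string, string> = {};
  for (const [name, text] of Object.entries(elements)) {
    const maxChars = (BUDGETS[name] ?? 500) * 4; // default budget for unlisted elements
    capped[name] =
      text.length <= maxChars ? text : text.slice(0, maxChars) + '\n/* truncated to fit budget */';
  }
  return capped;
}
```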
Common Mistakes
Omitting type definitions. This is the most common and most damaging mistake, because without types the model doesn’t fail loudly; it invents plausible-looking return types, parameter types, and error shapes that compile until you actually run them.
Including entire files instead of extracting functions. A 500-line file where 40 lines are relevant wastes 92% of the context budget on noise. Extract the function, add a comment noting the source file. The model doesn’t need to read the whole thing to understand the part you care about.
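For illustration, the extracted form might look like this in the prompt (the function and path are hypothetical); the other ~460 lines never enter the context:

```typescript
// Extracted from src/services/user-service.ts (hypothetical path).
// userRepo and NotFoundError come from that file's imports; the rest of the
// 500-line file is omitted because only this function is relevant to the task.
export async function deactivateUser(id: string): Promise<void> {
  const user = await userRepo.findById(id);
  if (!user) throw new NotFoundError(`User ${id} not found`);
  await userRepo.update(id, { active: false });
}
```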
No error handling context. Left to its own devices, the model reaches for generic try/catch patterns. Show one correct example from your codebase and it matches the pattern consistently. One example is enough.
Skipping test patterns for test generation. This one is subtle because the output still looks correct. If your tests use a specific fixture setup, assertion style, or mocking approach and you don’t show that, the model generates structurally valid tests that fail your linter and that nobody on your team will recognize as belonging to the same suite.