Context Engineering for Search and Recommendations
Search and recommendation systems are context engineering problems disguised as retrieval problems. The user's history, intent, and session state all compete for space in the context window, and most teams include too much of the wrong signal.
The Search Context Problem
When an LLM powers search or recommendations, the temptation is to pack the context window with everything you know about the user: their full purchase history, every page they viewed, their preferences profile, their demographic data, their current cart, their search history. The model has a 128k window, so why not use it?
Because most of that context is noise for the current query. A user searching for “wireless headphones” doesn’t benefit from their purchase history of kitchen appliances from six months ago, yet that history still competes for attention with the signals that actually matter: their recent electronics browsing, their price sensitivity based on past purchases in the same category, and whether they’ve returned audio products before.
The context engineering challenge in search is selecting which user signals are relevant to this query specifically.
User Context: What to Include
User profiles are large. A returning customer on an e-commerce platform might have hundreds of past orders, thousands of page views, and a preferences profile built over years. Including all of it is wasteful; including none of it produces generic results. The question is which slice of the user’s history earns its place in the context window for this specific request.
Category-relevant history: filter purchase and browsing history to the same product category as the current query, plus one level up in the taxonomy. A search for “running shoes” should pull in athletic footwear purchases and browsing from the last 90 days, leaving the rest of the order history out.
Recency-weighted signals: apply Temporal Decay aggressively. A product viewed yesterday is far more relevant than one viewed three months ago. Weight recent signals heavily and drop older ones entirely rather than including them at reduced weight; the model can’t meaningfully use a purchase from 2023 alongside one from last week.
Negative signals: what the user has returned, what they’ve viewed and not purchased, what they’ve explicitly filtered out. These are often more informative than positive signals because they narrow the space. A user who returned wireless earbuds for poor noise cancellation is telling you something specific about their next audio purchase.
Session context: what the user has already done in this session matters more than their long-term profile. If they’ve been comparing two specific products, the context for their next search should reflect that comparison mindset.
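The four signal classes above can be combined into a single selection pass. This is a minimal sketch: the event shape (`category`, `parent_category`, `timestamp`, `kind`) and the 14-day half-life are illustrative assumptions, not a prescribed schema.

```python
from datetime import datetime, timedelta

RECENCY_WINDOW = timedelta(days=90)
HALF_LIFE_DAYS = 14

def select_user_signals(user_events, query_category, now=None, limit=20):
    """Category-filtered, recency-weighted signal selection.

    `user_events` is assumed to be a list of dicts with keys
    'category', 'parent_category', 'timestamp', and 'kind'
    ('purchase', 'view', or 'return') -- a hypothetical event shape.
    """
    now = now or datetime.now()
    scored = []
    for event in user_events:
        age = now - event["timestamp"]
        if age > RECENCY_WINDOW:
            continue  # drop stale signals entirely, don't down-weight them
        if query_category not in (event["category"], event["parent_category"]):
            continue  # keep the query's category, or one level up
        # Exponential temporal decay: half weight every 14 days.
        weight = 0.5 ** (age.days / HALF_LIFE_DAYS)
        if event["kind"] == "return":
            weight *= 2.0  # negative signals narrow the space; boost them
        scored.append((weight, event))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [event for _, event in scored[:limit]]
```

The hard recency cutoff (rather than a decaying tail) reflects the point above: old signals are dropped entirely instead of being included at reduced weight.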
Query Intent as Context Engineering
The same query means different things depending on the surrounding context, and assembling the right context around a query is where most search systems under-invest.
A search for “apple” from a user browsing electronics means something completely different from the same query in a grocery context. Most teams handle this through category routing or query classification, but the context engineering approach is more flexible: include enough session and behavioral context that the model can disambiguate intent without explicit routing logic.
```python
def build_search_context(query, user, session, token_budget=4000):
    context = {
        "query": query,
        "session_searches": session.recent_searches[-5:],
        "session_views": session.recent_views[-3:],
        "category_purchases": user.purchases_in_category(
            inferred_category(query), limit=5
        ),
        "returns_in_category": user.returns_in_category(
            inferred_category(query), limit=3
        ),
    }
    return truncate_to_budget(context, token_budget)
```
The budget constraint forces prioritization. Session searches and views go first because they’re the strongest intent signal. Category-specific purchase history comes next. Returns come last, but they are high-value signals per token because they encode strong negative preferences in just a few fields.
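A minimal `truncate_to_budget` consistent with that priority order might keep whole fields rather than trimming within them. The 4-characters-per-token count is a rough stand-in for a real tokenizer, and the field list mirrors the context dict above.

```python
def truncate_to_budget(context, token_budget,
                       count_tokens=lambda s: len(s) // 4):
    """Keep whole fields in priority order until the budget runs out.

    `count_tokens` is a rough 4-characters-per-token heuristic;
    swap in an exact tokenizer count in production.
    """
    priority = [
        "query",                # always first
        "session_searches",     # strongest intent signal
        "session_views",
        "category_purchases",
        "returns_in_category",  # last in order, but high value per token
    ]
    kept, used = {}, 0
    for field in priority:
        if field not in context:
            continue
        cost = count_tokens(str(context[field]))
        if used + cost > token_budget:
            break  # everything after this field is lower priority
        kept[field] = context[field]
        used += cost
    return kept
```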
Catalog Context
Recommendation systems need product information in context, and including full product descriptions for every candidate is a fast way to blow through the context budget. A catalog of 50 candidate products with full descriptions, reviews, and specifications might consume 30k tokens before the model has seen the user’s query or profile.
Structured summaries over full descriptions: for each candidate product, include a structured block with the fields that matter for ranking: name, price, category, average rating, key specs. Save the full description for the final 3-5 products that make it to the response.
Two-stage context assembly: first pass includes structured summaries for all candidates with the user context and query; let the model rank them. Second pass includes full details for the top results only, re-running the model with rich context for just the finalists. This is Progressive Disclosure applied to product data.
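The two stages can be sketched as follows, assuming a hypothetical `model.rank(...)` wrapper that takes a query, user context, and product dicts and returns product ids in ranked order; the summary fields are the ones named above.

```python
def summarize_candidate(product):
    """Structured ranking summary: only the fields that matter."""
    return {
        "id": product["id"],
        "name": product["name"],
        "price": product["price"],
        "category": product["category"],
        "rating": product["avg_rating"],
        "key_specs": product["specs"][:3],  # first few specs only
    }

def two_stage_rank(model, query, user_context, candidates, top_k=5):
    # Stage 1: cheap structured summaries for every candidate.
    summaries = [summarize_candidate(p) for p in candidates]
    ranked_ids = model.rank(query=query, user=user_context, products=summaries)
    # Stage 2: full product details for the finalists only.
    finalist_ids = set(ranked_ids[:top_k])
    finalists = [p for p in candidates if p["id"] in finalist_ids]
    return model.rank(query=query, user=user_context, products=finalists)
```

Stage 1 sees many candidates cheaply; stage 2 spends the token budget on rich detail for only the handful that survive.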
Common Mistakes
Including the full user profile: a user with 500 past orders doesn’t need all 500 in context. Filter to category-relevant, recency-weighted signals. 10-20 relevant data points outperform 500 unfiltered ones.
Ignoring session context: long-term preferences matter less than what the user is doing right now. A user who has been comparing two laptops for the last 15 minutes is in a very specific decision mode; their general “likes electronics” profile doesn’t help.
Static context assembly: using the same context template for every query regardless of intent type. A broad discovery query (“gift ideas for dad”) needs different context than a specific product search (“Sony WH-1000XM5 price”). The context assembly should adapt to query type as well as query content.
No budget enforcement: letting the user profile, catalog data, and query context expand to fill whatever window space is available. Set explicit budgets: 1k tokens for user context, 500 for session, 2k for product candidates, 500 for instructions. The budget forces you to prioritize, and the model performs better with the constraint than without it.
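Those per-section budgets can be enforced mechanically. This sketch hard-truncates each section independently using the same rough 4-characters-per-token heuristic; a real system would count with the model’s tokenizer, and the section names are illustrative.

```python
SECTION_BUDGETS = {
    "instructions": 500,
    "user_context": 1000,
    "session": 500,
    "candidates": 2000,
}

def assemble_with_budgets(sections, budgets=SECTION_BUDGETS):
    """Hard-cap each section at its own token budget.

    With the 4-characters-per-token heuristic, a section's character
    cap is budget * 4. No section can crowd out the others.
    """
    parts = []
    for name, budget in budgets.items():
        text = sections.get(name, "")
        parts.append(f"## {name}\n{text[: budget * 4]}")
    return "\n\n".join(parts)
```

Because each section is capped independently, an oversized user profile degrades only its own slice of the prompt instead of consuming the candidates’ share.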