Temporal Decay

Weight recent context more heavily and systematically age out old information. Not all context is equally relevant forever. Recent messages, tool results, and decisions matter more than things from 50 turns ago. Implement this intuition explicitly.

The Problem This Solves

In a long conversation, context accumulates. Every message, every tool result, every intermediate decision stays in the window unless something removes it. After 30 turns the context is full of things that happened when the conversation was about something else entirely.

The problem is not just volume. It is relevance by age. A decision made in turn 3 may have been superseded by a different decision in turn 20. A file the model read early in an agent loop may no longer exist. The user said they wanted blue buttons, then changed their mind and said green; both messages are in context, and the model does not automatically know which takes precedence.

Compress & Restart handles this by summarizing and discarding old content. Temporal decay is a different approach: keep the information but reduce how much it competes with recent context. The two are complementary, since decay manages relevance within a window while compression manages the size of it.

How It Works

The core intuition is that recency is a strong signal for relevance: what happened in the last five turns is almost always more relevant than what happened 30 turns ago. Temporal decay makes that intuition explicit in how you construct context.

Window-based selection. Keep only the last N turns in the active context, dropping or archiving everything older. This is the simplest implementation and often sufficient. LangChain’s ConversationBufferWindowMemory does exactly this.
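The same behavior takes only a few lines without a framework; a minimal sketch, where the message-list shape and the `window_size` parameter are assumptions:

```python
def windowed_context(messages, window_size=5):
    """Keep only the most recent turns in active context.

    Treating one turn as a (user, assistant) message pair, this keeps
    the last 2 * window_size messages and drops everything older.
    """
    return messages[-2 * window_size:]
```

Everything older simply falls off the end; pair this with an archive step if old turns might be needed later.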

Tiered context. Keep the last five turns verbatim, summarize the fifteen turns before them into a compact paragraph, and discard everything older. The model sees full-fidelity recent context and a compressed record of the earlier conversation. This combines temporal decay with Compress & Restart: maximum detail where recency matters most, compression where it matters less.

Semantic recency. Instead of time-based cutoffs, retrieve against conversation history using the current query. Recent turns that are relevant surface at high weight; old turns that are relevant also appear, but irrelevant old turns drop out entirely. This is more expensive, since it requires embedding the history, but it handles cases where genuinely important older context should not be lost.
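A minimal sketch of the scoring idea, with two loudly labeled simplifications: relevance is a toy word-overlap count standing in for embedding similarity, and recency is an assumed exponential decay with a `half_life` parameter:

```python
def score_turns(turns, query_words, half_life=10):
    """Score each past turn by relevance times a recency decay.

    Relevance here is a toy word-overlap count; a real system would
    use embedding similarity instead. The decay halves a turn's
    weight every `half_life` turns of age.
    """
    newest = len(turns) - 1
    scored = []
    for i, text in enumerate(turns):
        overlap = len(query_words & set(text.lower().split()))
        decay = 0.5 ** ((newest - i) / half_life)
        scored.append((overlap * decay, i, text))
    return sorted(scored, reverse=True)

def select_context(turns, query, k=3):
    """Pick the k highest-scoring turns, then restore chronological order."""
    top = score_turns(turns, set(query.lower().split()))[:k]
    return [text for _, _, text in sorted(top, key=lambda t: t[1])]
```

Note that relevant old turns survive (their score is reduced, not zeroed), while irrelevant turns of any age score zero and drop out.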

Explicit timestamps. When context items carry timestamps, the model can reason about recency itself. “User preference stated 3 hours ago” is more useful than the raw statement when the model needs to weigh it against something said 30 seconds ago. This works best in agent contexts where decisions have clear timestamps attached.
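One way to surface that information is to prefix each item with its relative age when building the prompt; a sketch, assuming each context item is a `(timestamp, text)` pair:

```python
import time

def with_relative_age(items, now=None):
    """Render (timestamp, text) context items with an explicit age
    prefix so the model can weigh conflicting statements by recency."""
    if now is None:
        now = time.time()
    lines = []
    for ts, text in items:
        minutes = int(now - ts) // 60
        if minutes >= 60:
            age = f"{minutes // 60}h {minutes % 60}m ago"
        else:
            age = f"{minutes}m ago"
        lines.append(f"[{age}] {text}")
    return lines
```

Given a preference stated three hours ago and a correction from a few minutes ago, the model now sees both statements with their ages and can resolve the conflict itself.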

Example

A coding agent helping a developer debug across multiple files over a long session.

Without temporal decay:

Turn 1-8:   Debug auth.py, fix the token expiry bug (resolved)
Turn 9:     User switches to payments.py
Turn 10-20: Debug payment processor timeout (resolved)
Turn 30:    User asks about a problem in utils.py

At turn 30, the context still contains the full auth.py debug session, including code snippets and decisions that no longer apply to anything the user is working on.

With a sliding window:

from langchain.memory import ConversationBufferWindowMemory

memory = ConversationBufferWindowMemory(k=5, return_messages=True)

At turn 30, the model sees only turns 25-30. The auth.py session is gone from active context, though anything important from it can be written to external storage before it ages out.
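That archiving step can be as simple as appending evicted turns to an external store before truncating; a sketch, where `archive` is assumed to be any append-only, list-like store:

```python
def evict_with_archive(messages, archive, window_size=5):
    """Truncate to the recent window, but write evicted turns to an
    external archive first so nothing is lost permanently."""
    archive.extend(messages[:-window_size])
    return messages[-window_size:]
```

The archive can later back a retrieval step if an aged-out decision becomes relevant again.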

With tiered context:

from langchain_core.messages import SystemMessage

def build_tiered_context(messages, recent_n=5, summary_n=20):
    # Tier 1: the most recent turns, kept verbatim.
    recent = messages[-recent_n:]
    # Tier 2: turns between summary_n and recent_n ago, to be
    # compressed. Anything older than summary_n is discarded.
    middle = messages[-summary_n:-recent_n]

    if middle:
        # `llm` is whatever completion client the application
        # already uses; `format_messages` renders turns as text.
        summary = llm.complete(
            "Summarize this conversation history in 3-5 sentences, "
            "noting any decisions or constraints that still apply:\n"
            + format_messages(middle)
        )
        return [SystemMessage(f"Earlier context:\n{summary}")] + recent

    return recent

The model sees a compact summary of the middle period and full detail of the recent turns. Decisions the summary preserves remain available; the noise from 15 turns of debugging a completely unrelated module does not.

When to Use

When Not to Use