# Memory Architectures for AI Agents

A comparison of memory implementations across systems: flat files, structured databases, vector stores, and hybrid approaches, mapping MemGPT, Claude, ChatGPT, and coding agents onto the cognitive categories of episodic, semantic, and procedural memory.
## The Problem This Solves
The Write Outside the Window pattern establishes that you need persistence. The question is what kind. A flat markdown file, a vector database, a relational store, or some combination? The answer depends on what you’re storing, how you query it, and how much infrastructure complexity you’re willing to take on.
## Memory Taxonomy
The cognitive science framing (episodic, semantic, procedural) maps cleanly to different implementation choices, each needing different storage, retrieval, and update patterns.
### Episodic Memory
What happened. Conversation history, tool executions, task outcomes.
Claude Code uses CLAUDE.md to accumulate lessons learned. Every time Claude discovers something about the project, like a circular import or a test that needs a specific mock, it writes it down. Future sessions read this file and avoid repeating the same mistakes.
```markdown
# Project Memory
- Auth module has circular import; use interface not direct import
- Rate limiter tests fail without Redis mock
- User.email unique constraint not enforced at ORM level
```
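The write side of this pattern is deliberately simple. As a minimal sketch (the `remember` helper and its duplicate check are hypothetical, not Claude Code's actual mechanism), an agent hook that appends lessons might look like this:

```python
from pathlib import Path

MEMORY_FILE = Path("CLAUDE.md")

def remember(lesson: str) -> None:
    """Append a one-line lesson to project memory, skipping duplicates."""
    existing = MEMORY_FILE.read_text() if MEMORY_FILE.exists() else "# Project Memory\n"
    if lesson in existing:
        return  # already recorded; keep the file small and curated
    MEMORY_FILE.write_text(existing.rstrip() + f"\n- {lesson}\n")

remember("Rate limiter tests fail without Redis mock")
```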
### Semantic Memory
What is known. Facts, knowledge, documentation, domain information.
This is where RAG lives. Documents chunked, embedded, and indexed in a vector store. At query time, relevant chunks come back based on semantic similarity.
```python
# `embed_model` and `vector_store` are placeholders for whatever embedding
# model and vector database client you use (e.g. sentence-transformers + Chroma).
def store_fact(fact, metadata):
    embedding = embed_model.encode(fact)
    vector_store.add(embedding, {"text": fact, "meta": metadata})

def retrieve_fact(query):
    query_embedding = embed_model.encode(query)
    results = vector_store.search(query_embedding, k=5)  # top 5 nearest neighbors
    return [r["text"] for r in results]
```
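The chunking step mentioned above happens before `store_fact`. A minimal sketch, assuming fixed-size character chunks with overlap (production systems usually split on sentence or token boundaries instead); the `docs/api.md` input is hypothetical:

```python
def chunk_text(text: str, size: int = 800, overlap: int = 100) -> list[str]:
    """Split a document into overlapping fixed-size chunks before embedding."""
    step = size - overlap
    return [text[start:start + size] for start in range(0, len(text), step)]

for chunk in chunk_text(open("docs/api.md").read()):
    store_fact(chunk, {"source": "docs/api.md"})
```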
### Procedural Memory
How to act. System prompts, agent instructions, behavioral patterns.
This is the most overlooked memory type because people don’t think of system prompts as “memory.” But that’s exactly what they are: persistent instructions that shape every interaction.
```text
You are a code reviewer.
Your process:
1. Read the changed files
2. Check for security issues
3. Check for performance problems
4. Verify test coverage
Output format: JSON with issues array.
```
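In implementation terms, procedural memory is whatever gets prepended to every request. A minimal sketch, assuming an OpenAI-style chat message format and a hypothetical prompt file path:

```python
from pathlib import Path

SYSTEM_PROMPT = Path("prompts/reviewer.txt").read_text()  # the instructions above

def build_messages(user_input: str) -> list[dict]:
    """Procedural memory rides along as the system message on every call."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_input},
    ]
```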
## Architecture Comparison
### Flat File Memory
Used by: Claude Code, Cursor, Windsurf.
Plain text or markdown files in the project root. The simplest possible implementation, and for most projects, the right one.
Pros: Human readable, human editable, version controllable, zero infrastructure.
Cons: Linear search only. Scales poorly past a few thousand lines. No ranking or filtering.
Don’t underestimate flat files. A well-maintained CLAUDE.md with 50 lines of hard-won project knowledge outperforms a vector store full of auto-generated summaries.
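Reading flat file memory requires no retrieval step at all: the whole file goes into context. A sketch; the size guard is an illustrative assumption, not any tool's documented behavior:

```python
from pathlib import Path

def load_project_memory(path: str = "CLAUDE.md", max_chars: int = 20_000) -> str:
    """Flat file memory is loaded in full; it only works while the file stays small."""
    text = Path(path).read_text()
    if len(text) > max_chars:
        # Past this point you need curation (or a different architecture),
        # because everything here lands in the context window verbatim.
        raise ValueError(f"{path} has outgrown flat-file memory")
    return text
```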
### Vector Store Memory
Used by: RAG systems, MemGPT, Letta.
Embeddings stored in a vector database (Pinecone, Weaviate, Chroma, pgvector).
Pros: Semantic search at scale, millions of documents, built-in relevance ranking.
Cons: Requires an embedding model, retrieval is approximate not exact, and metadata management adds complexity that’s easy to underestimate.
Best for: Large document corpora. Overkill for project-level memory where a flat file suffices.
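For concreteness, the vector store path with Chroma's default embedding function looks roughly like this (a sketch; API details vary across versions):

```python
import chromadb

client = chromadb.Client()  # in-memory; use a persistent client in production
collection = client.create_collection("project_memory")

# Chroma embeds documents with its default embedding function on add.
collection.add(
    ids=["fact-1", "fact-2"],
    documents=[
        "User.email unique constraint not enforced at ORM level",
        "Rate limiter tests fail without Redis mock",
    ],
)

results = collection.query(query_texts=["why do rate limiter tests fail?"], n_results=2)
print(results["documents"][0])  # ranked matches for the first query
```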
### Structured Database Memory
Used by: Enterprise systems, LangGraph state.
Relational, document, or graph databases with explicit schemas.
Pros: Exact queries, rich capabilities (joins, aggregations, filters), typed fields.
Cons: Schema design upfront, less flexible for unstructured queries, semantic search needs a separate component.
Best for: When you know the shape of your data and need precise lookups rather than fuzzy similarity.
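A sketch of the same kind of project knowledge in SQLite, where the payoff is exact, typed queries instead of fuzzy similarity (the schema is illustrative):

```python
import sqlite3

conn = sqlite3.connect("agent_memory.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS lessons (
        id INTEGER PRIMARY KEY,
        component TEXT NOT NULL,
        lesson TEXT NOT NULL,
        learned_at TEXT DEFAULT CURRENT_TIMESTAMP
    )
""")
conn.execute(
    "INSERT INTO lessons (component, lesson) VALUES (?, ?)",
    ("rate_limiter", "Tests fail without Redis mock"),
)
conn.commit()

# Exact lookup: every lesson about one component, newest first.
rows = conn.execute(
    "SELECT lesson FROM lessons WHERE component = ? ORDER BY learned_at DESC",
    ("rate_limiter",),
).fetchall()
```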
### Hybrid Approaches
Most production systems combine multiple approaches:
```python
# MessageStore, VectorStore, SystemPrompt, and FlatFile are placeholders for
# the components described above; `combine` concatenates and dedupes results.
class AgentMemory:
    def __init__(self):
        self.episodic = MessageStore()      # what happened: recent conversation
        self.semantic = VectorStore()       # what is known: embedded documents
        self.procedural = SystemPrompt()    # how to act: persistent instructions
        self.flat = FlatFile("CLAUDE.md")   # curated project knowledge

    def read(self, query):
        recent = self.episodic.last_n(10)
        relevant = self.semantic.search(query)
        quick = self.flat.read_all()  # loaded in full; small by design
        return combine(recent, relevant, quick)
```
Build a hybrid when you have genuinely different query patterns. Don’t build one because it seems more sophisticated.
## System Comparisons
### MemGPT / Letta
The most ambitious approach in this space: a full memory hierarchy where the system decides what stays in working memory versus what gets archived. Working memory holds the current conversation and active task. Archival memory holds everything else, searchable on demand. Core memory holds facts that must persist across all interactions.
The design is appealing because it mirrors how humans actually handle memory, offloading things we don’t need right now and retrieving them when relevant. Letta’s own Context-Bench evaluates how well agents maintain facts and context across long interactions, and the results show that even purpose-built memory systems struggle with multi-hop retrieval once archival memory grows large. The problem is that “what to archive” is itself an LLM call. If the model decides to archive something it should have kept, or keep something it should have evicted, you get degraded behavior that’s genuinely hard to debug. You can’t easily inspect why it forgot something; the memory management layer adds a second source of failure on top of the model itself.
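To make that failure mode concrete, here is a toy sketch of the pattern, not Letta's actual implementation: working memory evicts to archival on overflow, and the eviction decision is itself a model call that can be wrong.

```python
class HierarchicalMemory:
    """Toy MemGPT-style hierarchy: core facts, working buffer, archival store."""

    def __init__(self, llm, archive, working_limit: int = 20):
        self.llm = llm                  # placeholder LLM client
        self.core = []                  # facts that persist across all interactions
        self.working = []               # current conversation and active task
        self.archive = archive          # searchable store, e.g. a vector DB
        self.working_limit = working_limit

    def add(self, item: str) -> None:
        self.working.append(item)
        if len(self.working) > self.working_limit:
            # The crux: a model call decides what to evict. If it archives
            # something it should have kept, behavior silently degrades,
            # and there is no easy way to inspect why.
            victim = self.llm.choose_eviction(self.working)  # hypothetical method
            self.working.remove(victim)
            self.archive.store(victim)
```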
### Claude (via AGENTS.md / CLAUDE.md)
Flat file memory with human-in-the-loop curation. The model writes to the file; humans edit it directly. This sounds too simple to be worth comparing to MemGPT, but in practice the human curation step does something automatic systems can’t: it filters out noise. A well-maintained file with 50 carefully selected entries beats a vector store with 5000 auto-generated summaries of mixed quality. The constraint is a feature.
### ChatGPT and Coding Agents
ChatGPT’s persistent memory gives users explicit control (“remember this,” “forget that”), but the mechanism for determining when stored memories are relevant to a given conversation is opaque. You can’t see what the model is drawing on, which makes debugging wrong behavior difficult.
Cursor and Windsurf split the problem: config files handle procedural memory, and optional repository-level semantic indexes handle semantic memory over the codebase. Simpler than MemGPT, more structured than a flat file, and enough for most coding workflows.
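Those config files are procedural memory in the same sense as the reviewer prompt above. An illustrative example of a project-level rules file (Cursor has historically read `.cursorrules`; the contents here are invented):

```text
# .cursorrules (illustrative)
- Use TypeScript strict mode for all new files
- Prefer the repository's existing fetch wrapper over raw fetch
- Run the test suite before proposing a commit
```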
## Choosing an Architecture
| Factor | Flat File | Vector Store | Database | Hybrid |
|---|---|---|---|---|
| Scale | <10k lines | Any | Any | Any |
| Retrieval | Linear | Semantic | Exact | Flexible |
| Complexity | Low | Medium | High | High |
| Infrastructure | None | Embedding model + DB | DB server | Multiple |
| Best for | Projects, teams | Large corpora | Structured data | Production systems |
Start with flat file memory. It’s sufficient for most projects and teams, and the human curation it requires is a feature, not a limitation. Add a vector store when you have a document corpus too large to curate manually. Go hybrid only when you have genuinely distinct query patterns that a single approach can’t serve. Most teams that start with a hybrid architecture would have been better off with a flat file and a month of accumulated knowledge.