Системы памяти агентов¶

~6 минут чтения

URL: Mem0, Letta, Medium, arXiv, Hugging Face, Shaped.ai Тип: agent-memory / mem0 / memgpt / letta / langmem / zep Дата: Февраль 2026 Сбор: Ralph Research ФАЗА 5

Предварительно: LLM-агенты

Зачем это нужно¶

LLM без памяти забывает всё после каждого запроса. Агент, который не помнит предпочтения пользователя, его прошлые вопросы, результаты прошлых действий -- бесполезен для долгосрочного взаимодействия. Mem0 лидирует с 66.9% accuracy и 1.4s latency (на 26% лучше OpenAI Memory, в 10x быстрее). MemGPT (Letta) подходит к проблеме через аналогию с ОС: context window = RAM, archival memory = disk, и агент сам решает что загрузить. Ключевой trade-off: accuracy vs latency vs стоимость хранения.

Part 1: Overview¶

Executive Summary¶

Key Insight:

"Most AI agents forget everything very soon." Mem0 leads benchmarks with 66.9% accuracy and 1.4s latency (26% better than OpenAI Memory, 91% faster). MemGPT (now Letta) treats LLMs as operating systems with hierarchical memory tiers. The 2026 landscape differentiates between "memory tools" and "memory-aware agents."

2026 Agent Memory Landscape:

System	Focus	Accuracy	Latency	Best For
Mem0	Production	66.9%	1.4s	Enterprise
Letta/MemGPT	Agent OS	74.0%*	Varies	Long-term agents
OpenAI Memory	Native	53.1%	15s+	Simple apps
LangMem	LangChain	~60%	~2s	LangChain ecosystem
Zep	Research	~62%	~3s	Research agents

*On filesystem benchmark

Part 2: Memory Architecture Fundamentals¶

Types of Agent Memory¶

Memory Type	Duration	Purpose	Example
Working	Current turn	Immediate context	Current conversation
Episodic	Session	Event sequences	Chat history
Semantic	Long-term	Facts, knowledge	User preferences
Procedural	Long-term	Skills, patterns	Task workflows

Memory Architecture¶

graph TD
    WM["Working Memory<br/>(Context Window)<br/>Текущий turn, reasoning context"] --> STM
    STM["Short-Term Memory<br/>История разговора, sliding window N turns"] --> LTM
    LTM["Long-Term Memory<br/>Предпочтения, факты о пользователе<br/>Persistent across sessions"] --> EXT
    EXT["External Knowledge<br/>Vector DB, Knowledge Graph,<br/>Document stores"]

    style WM fill:#fce4ec,stroke:#c62828
    style STM fill:#fff3e0,stroke:#ef6c00
    style LTM fill:#e8eaf6,stroke:#3f51b5
    style EXT fill:#e8f5e9,stroke:#4caf50

Part 3: Mem0 Deep Dive¶

Overview¶

Aspect	Details
Developer	Mem0.ai
Focus	Production-ready memory
License	Open source + managed
Key feature	Personalization layer

Mem0 Architecture:

graph TD
    subgraph MEM0["Mem0"]
        MS["Memory Store<br/>Vector DB (Qdrant/Pinecone)<br/>+ Graph store for relationships"] --> MM
        MM["Memory Manager<br/>Add/Update/Delete<br/>Consolidation + Conflict resolution"] --> SR
        SR["Search & Retrieval<br/>Semantic search, temporal queries<br/>User-specific filtering"]
    end

    style MS fill:#e8eaf6,stroke:#3f51b5
    style MM fill:#fff3e0,stroke:#ef6c00
    style SR fill:#e8f5e9,stroke:#4caf50

Mem0 Benchmark Results (LOCOMO)¶

Metric	Mem0	OpenAI	LangMem	Letta
Overall accuracy	66.9%	53.1%	~58%	~60%
Single-hop	85%	70%	75%	72%
Multi-hop	62%	45%	50%	55%
Temporal	58%	40%	48%	52%
Latency	1.4s	15s+	2s	3s

Mem0 Key Features¶

Feature	Description
Self-improving	Learns from interactions
Conflict resolution	Handles contradictory memories
Multi-modal	Text, images, code
API-first	REST API + SDKs
User isolation	Per-user memory stores

Part 4: MemGPT / Letta Deep Dive¶

Overview¶

Aspect	Details
Developer	Letta (formerly MemGPT)
Philosophy	LLM as Operating System
License	Apache 2.0
Key feature	Hierarchical memory tiers

OS-Inspired Architecture¶

graph TD
    subgraph LETTA["MemGPT / Letta (OS-аналогия)"]
        MC["Main Context<br/>= CPU registers<br/>Текущий разговор, active task"] --> CM
        CM["Core Memory<br/>= L1 cache<br/>Ключевые факты, system instructions<br/>Всегда в контексте"] --> AM
        AM["Archival Memory<br/>= Disk<br/>Полная история, knowledge base<br/>Retrieved on demand"]
    end

    style MC fill:#fce4ec,stroke:#c62828
    style CM fill:#fff3e0,stroke:#ef6c00
    style AM fill:#e8eaf6,stroke:#3f51b5

### Self-Editing Memory

| Capability | Description |
|------------|-------------|
| **Write** | Agent stores important facts to core memory |
| **Recall** | Agent searches archival memory |
| **Yield** | Agent swaps out context to make room |
| **Update** | Agent modifies existing memories |

### Letta Benchmark (Filesystem)

| Approach | LOCOMO Score |
|----------|--------------|
| **Simple filesystem** | 74.0% |
| **With vector search** | 76.5% |
| **Specialized systems** | 60-70% |

**Key finding:** Simple approaches can outperform complex memory systems.

---

## Part 5: Other Memory Systems

### 1. LangMem (LangChain)

| Aspect | Details |
|--------|---------|
| **Developer** | LangChain |
| **Focus** | LangChain ecosystem |
| **Integration** | Native LangChain |

**LangMem Features:**
- Memory as LangChain runnable
- Multiple memory types
- Easy chain integration

### 2. Zep

| Aspect | Details |
|--------|---------|
| **Developer** | Zep |
| **Focus** | Research-led |
| **Type** | Memory layer |

**Zep Features:**
- Fact extraction
- Memory summarization
- Temporal awareness

### 3. OpenAI Memory

| Aspect | Details |
|--------|---------|
| **Provider** | OpenAI |
| **Focus** | Simple integration |
| **Type** | Native API |

**OpenAI Memory Limitations:**
- Slow latency (15s+)
- Limited accuracy (53.1%)
- Black box behavior
- No customization

### 4. Cognee

| Aspect | Details |
|--------|---------|
| **Focus** | Knowledge graphs |
| **Innovation** | Graph-based memory |

---

## Part 6: Memory System Comparison

### Feature Matrix

| Feature | Mem0 | Letta | LangMem | Zep | OpenAI |
|---------|------|-------|---------|-----|--------|
| **Self-editing** | ⚠️ | ✅ | ❌ | ⚠️ | ❌ |
| **Graph memory** | ✅ | ❌ | ❌ | ✅ | ❌ |
| **Temporal queries** | ✅ | ⚠️ | ❌ | ✅ | ❌ |
| **Open source** | ✅ | ✅ | ✅ | ✅ | ❌ |
| **Multi-modal** | ✅ | ⚠️ | ❌ | ❌ | ⚠️ |
| **Production-ready** | ✅ | ⚠️ | ⚠️ | ⚠️ | ✅ |

### Performance Comparison

| Metric | Mem0 | Letta | LangMem | OpenAI |
|--------|------|-------|---------|--------|
| **Accuracy** | 66.9% | 74%* | ~58% | 53.1% |
| **Latency** | 1.4s | ~3s | ~2s | 15s+ |
| **Scalability** | High | Medium | Medium | Low |
| **Customization** | High | High | Medium | Low |

### Cost Comparison

| System | Open Source | Managed | Notes |
|--------|-------------|---------|-------|
| **Mem0** | Free | $0.05/1K memories | Vector DB costs extra |
| **Letta** | Free | N/A | Self-host only |
| **LangMem** | Free | N/A | LangChain dependency |
| **Zep** | Free | $50+/mo | Good for teams |
| **OpenAI** | N/A | Per API call | Built into Assistants |

---

## Part 7: Selection Guide

### Decision Tree

Start → What's your priority? │ ├──► Production-ready, fast? │ └──► YES → Mem0 │ ├──► Long-term agent autonomy? │ └──► YES → Letta/MemGPT │ ├──► Already using LangChain? │ └──► YES → LangMem │ ├──► Need graph relationships? │ └──► YES → Zep or Mem0 │ ├──► Simple, already on OpenAI? │ └──► YES → OpenAI Memory │ └──► Research/experimentation? └──► YES → Letta (most flexible)

### Use Case Matrix

| Use Case | Recommended | Reason |
|----------|-------------|--------|
| **Customer support bot** | Mem0 | Production-ready, personalization |
| **Personal assistant** | Letta | Long-term context, autonomy |
| **Chatbot with history** | LangMem | LangChain integration |
| **Research agent** | Letta | Memory exploration, reasoning |
| **Enterprise deployment** | Mem0 | Scalability, support |
| **Simple FAQ bot** | OpenAI Memory | Easiest, native |

---

## Part 8: Implementation Patterns

### Mem0 Integration

```python
from mem0 import Memory

m = Memory()

# Add memory
m.add("User prefers dark mode", user_id="user123")

# Search memories
results = m.search("interface preferences", user_id="user123")

# Get all memories
all_memories = m.get_all(user_id="user123")

Letta Integration¶

from letta import create_client

client = create_client()

# Create agent with memory
agent = client.create_agent(
    name="my_agent",
    memory_blocks=["Core memory about user..."]
)

# Send message (agent manages memory automatically)
response = client.send_message(
    agent_id=agent.id,
    message="Hello!",
    role="user"
)

Part 9: Interview-Relevant Numbers¶

LOCOMO Benchmark Scores¶

System	Single-hop	Multi-hop	Temporal	Overall
Mem0	85%	62%	58%	66.9%
OpenAI	70%	45%	40%	53.1%
Letta (fs)	88%	65%	62%	74.0%

Latency Comparison¶

System	Add Memory	Search	Get All
Mem0	0.3s	0.5s	0.2s
OpenAI	5-10s	10-15s	3-5s
LangMem	0.5s	1.0s	0.5s

Memory Capacity¶

System	Max Memories	Storage
Mem0	Unlimited	Vector DB
Letta	Unlimited	Archival
OpenAI	128K tokens	Managed

Cost per 1K Operations¶

System	Cost
Mem0 (managed)	~$0.50
Letta (self-host)	Vector DB costs
OpenAI	~$2-5 (API calls)

Gotchas¶

RAG -- это не память

RAG retrieves знания из документов (статические факты). Память хранит персонализированный контекст: предпочтения пользователя, историю взаимодействий, результаты прошлых действий. RAG отвечает "что такое X", память отвечает "пользователь предпочитает Y". Для полноценного агента нужны оба.

Бенчмарки памяти несравнимы напрямую

Mem0 66.9% на LOCOMO, Letta 74% на filesystem benchmark -- это разные бенчмарки. LOCOMO тестирует single/multi-hop recall в разговорах. Filesystem тестирует file system operations. Сравнивать можно только на одном бенчмарке. Mem0 лидирует по latency (1.4s vs 15s OpenAI), Letta -- по архитектурной гибкости.

Long-term memory = privacy liability

Память агента хранит персональные данные пользователя (предпочтения, факты, историю). Без user isolation один пользователь может получить данные другого. Без TTL и forget mechanisms -- нарушение GDPR/right to be forgotten. Обязательно: per-user memory stores, explicit delete API, data retention policies.

Interview Q&A¶

Q: Какие типы памяти нужны AI-агенту?

Red flag: "Chat history достаточно"

Strong answer: "Четыре типа: (1) Working memory -- текущий turn, ограничена context window. (2) Episodic -- история событий сессии (что агент делал и результаты). (3) Semantic -- долгосрочные факты (предпочтения пользователя, knowledge base). (4) Procedural -- навыки и паттерны (как решать определённые задачи). MemGPT моделирует это через OS-аналогию: working = CPU registers, core = L1 cache, archival = disk."

Q: Сравните Mem0 и MemGPT/Letta.

Strong answer: "Mem0: production-first, API-driven, vector DB + graph store, 66.9% accuracy, 1.4s latency. Лучше для enterprise -- быстрая интеграция, managed service. Letta (MemGPT): agent OS philosophy, иерархическая память (main/core/archival), агент сам управляет что загружать в контекст. Лучше для long-running autonomous agents где агент должен самостоятельно решать что помнить. Key insight: Mem0 = external memory service, Letta = memory-aware agent architecture."

Q: Как оценить качество системы памяти агента?

Strong answer: "Три оси: (1) Accuracy -- single-hop recall (помнит ли факт), multi-hop (связывает ли факты), temporal (помнит ли порядок событий). (2) Latency -- retrieval < 2s для interactive, < 500ms для real-time. (3) Scalability -- как деградирует accuracy с ростом числа memories (100 vs 10K vs 1M). LOCOMO benchmark покрывает все три. На практике ещё важна conflict resolution: что если пользователь сказал 'я живу в Москве' а потом 'я переехал в СПб'."

Sources¶

Mem0 — "AI Memory Benchmark: OpenAI vs LangMem vs MemGPT vs Mem0"
Medium — "LLM as Operating Systems: Agent Memory"
Medium — "AI Memory Wars: Why One System Crushed the Competition"
Shaped.ai — "The 8 Best Tools for AI Agent Memory (2026 Guide)"
Hugging Face — "Mem0: Building Production-Ready AI Agents"
arXiv — "SimpleMem: Efficient Lifelong Memory for LLM Agents" (2601.02553)
Letta Blog — "Filesystem Benchmark Results"
GitHub — "Agent-Memory-Paper-List"