Agent Memory Systems¶
~6 minute read
URL: Mem0, Letta, Medium, arXiv, Hugging Face, Shaped.ai Type: agent-memory / mem0 / memgpt / letta / langmem / zep Date: February 2026 Collected: Ralph Research PHASE 5
Prerequisite: LLM agents
Why It Matters¶
An LLM without memory forgets everything after each request. An agent that does not remember a user's preferences, past questions, or the results of past actions is useless for long-term interaction. Mem0 leads with 66.9% accuracy and 1.4s latency (26% better than OpenAI Memory, ~10x faster). MemGPT (Letta) approaches the problem through an OS analogy: the context window is RAM, archival memory is disk, and the agent itself decides what to load. The key trade-off: accuracy vs latency vs storage cost.
Part 1: Overview¶
Executive Summary¶
Key Insight:
"Most AI agents forget everything very soon." Mem0 leads benchmarks with 66.9% accuracy and 1.4s latency (26% better than OpenAI Memory, 91% faster). MemGPT (now Letta) treats LLMs as operating systems with hierarchical memory tiers. The 2026 landscape differentiates between "memory tools" and "memory-aware agents."
2026 Agent Memory Landscape:
| System | Focus | Accuracy | Latency | Best For |
|---|---|---|---|---|
| Mem0 | Production | 66.9% | 1.4s | Enterprise |
| Letta/MemGPT | Agent OS | 74.0%* | Varies | Long-term agents |
| OpenAI Memory | Native | 53.1% | 15s+ | Simple apps |
| LangMem | LangChain | ~60% | ~2s | LangChain ecosystem |
| Zep | Research | ~62% | ~3s | Research agents |
*On filesystem benchmark
Part 2: Memory Architecture Fundamentals¶
Types of Agent Memory¶
| Memory Type | Duration | Purpose | Example |
|---|---|---|---|
| Working | Current turn | Immediate context | Current conversation |
| Episodic | Session | Event sequences | Chat history |
| Semantic | Long-term | Facts, knowledge | User preferences |
| Procedural | Long-term | Skills, patterns | Task workflows |
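The four memory types above can be modeled as a minimal container; the class and field names here are illustrative, not any particular library's API:

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class AgentMemory:
    """Toy container mirroring the four memory types in the table above."""
    working: str = ""                                    # current turn (context window)
    episodic: deque = field(default_factory=lambda: deque(maxlen=20))  # sliding event window
    semantic: dict = field(default_factory=dict)         # long-term facts (preferences)
    procedural: dict = field(default_factory=dict)       # reusable task patterns

    def observe(self, event: str) -> None:
        """Record an event: it becomes the working context and joins episodic history."""
        self.working = event
        self.episodic.append(event)                      # oldest events fall off the window

    def learn_fact(self, key: str, value: str) -> None:
        """Promote a fact to semantic memory so it persists across sessions."""
        self.semantic[key] = value

mem = AgentMemory()
mem.observe("user asked how to enable dark mode")
mem.learn_fact("theme", "dark")
```

The `maxlen` on the episodic deque is the sliding window from the architecture below: only the last N events stay addressable without an external store.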
Memory Architecture¶
```mermaid
graph TD
WM["Working Memory<br/>(Context Window)<br/>Current turn, reasoning context"] --> STM
STM["Short-Term Memory<br/>Conversation history, sliding window of N turns"] --> LTM
LTM["Long-Term Memory<br/>Preferences, facts about the user<br/>Persistent across sessions"] --> EXT
EXT["External Knowledge<br/>Vector DB, Knowledge Graph,<br/>Document stores"]
style WM fill:#fce4ec,stroke:#c62828
style STM fill:#fff3e0,stroke:#ef6c00
style LTM fill:#e8eaf6,stroke:#3f51b5
style EXT fill:#e8f5e9,stroke:#4caf50
```
Part 3: Mem0 Deep Dive¶
Overview¶
| Aspect | Details |
|---|---|
| Developer | Mem0.ai |
| Focus | Production-ready memory |
| License | Open source + managed |
| Key feature | Personalization layer |
Mem0 Architecture:
```mermaid
graph TD
subgraph MEM0["Mem0"]
MS["Memory Store<br/>Vector DB (Qdrant/Pinecone)<br/>+ Graph store for relationships"] --> MM
MM["Memory Manager<br/>Add/Update/Delete<br/>Consolidation + Conflict resolution"] --> SR
SR["Search & Retrieval<br/>Semantic search, temporal queries<br/>User-specific filtering"]
end
style MS fill:#e8eaf6,stroke:#3f51b5
style MM fill:#fff3e0,stroke:#ef6c00
style SR fill:#e8f5e9,stroke:#4caf50
```
Mem0 Benchmark Results (LOCOMO)¶
| Metric | Mem0 | OpenAI | LangMem | Letta |
|---|---|---|---|---|
| Overall accuracy | 66.9% | 53.1% | ~58% | ~60% |
| Single-hop | 85% | 70% | 75% | 72% |
| Multi-hop | 62% | 45% | 50% | 55% |
| Temporal | 58% | 40% | 48% | 52% |
| Latency | 1.4s | 15s+ | 2s | 3s |
Mem0 Key Features¶
| Feature | Description |
|---|---|
| Self-improving | Learns from interactions |
| Conflict resolution | Handles contradictory memories |
| Multi-modal | Text, images, code |
| API-first | REST API + SDKs |
| User isolation | Per-user memory stores |
Part 4: MemGPT / Letta Deep Dive¶
Overview¶
| Aspect | Details |
|---|---|
| Developer | Letta (formerly MemGPT) |
| Philosophy | LLM as Operating System |
| License | Apache 2.0 |
| Key feature | Hierarchical memory tiers |
OS-Inspired Architecture¶
```mermaid
graph TD
subgraph LETTA["MemGPT / Letta (OS analogy)"]
MC["Main Context<br/>= CPU registers<br/>Current conversation, active task"] --> CM
CM["Core Memory<br/>= L1 cache<br/>Key facts, system instructions<br/>Always in context"] --> AM
AM["Archival Memory<br/>= Disk<br/>Full history, knowledge base<br/>Retrieved on demand"]
end
style MC fill:#fce4ec,stroke:#c62828
style CM fill:#fff3e0,stroke:#ef6c00
style AM fill:#e8eaf6,stroke:#3f51b5
```
Self-Editing Memory¶
| Capability | Description |
|---|---|
| Write | Agent stores important facts to core memory |
| Recall | Agent searches archival memory |
| Yield | Agent swaps out context to make room |
| Update | Agent modifies existing memories |
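The four operations above can be sketched as a toy self-editing store. The method names mirror the table, not the actual Letta tool API, and substring search stands in for embedding search:

```python
class SelfEditingMemory:
    """Sketch of MemGPT-style self-editing memory. The agent would call these
    methods as tools; names are illustrative, not the real Letta tool API."""

    def __init__(self, core_limit: int = 3):
        self.core = {}        # always in context ("L1 cache")
        self.archival = {}    # retrieved on demand ("disk")
        self.core_limit = core_limit

    def write(self, key: str, fact: str) -> None:
        """Store an important fact in core memory, evicting if full."""
        if len(self.core) >= self.core_limit:
            self._yield_oldest()
        self.core[key] = fact

    def recall(self, query: str) -> list:
        """Search archival memory (real systems use embeddings, not substrings)."""
        return [f for f in self.archival.values() if query.lower() in f.lower()]

    def _yield_oldest(self) -> None:
        """Swap the oldest core fact out to archival to make room in context."""
        oldest = next(iter(self.core))
        self.archival[oldest] = self.core.pop(oldest)

    def update(self, key: str, fact: str) -> None:
        """Modify an existing memory wherever it currently lives."""
        store = self.core if key in self.core else self.archival
        store[key] = fact

mem = SelfEditingMemory(core_limit=2)
mem.write("home", "user lives in Boston")
mem.write("drink", "user likes tea")
mem.write("lang", "user codes in Rust")   # core is full, so "home" yields to archival
```

The point of the OS analogy is visible in `_yield_oldest`: the agent keeps the context window small by deciding what gets demoted to disk-tier storage.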
Letta Benchmark (Filesystem)¶
| Approach | LOCOMO Score |
|---|---|
| Simple filesystem | 74.0% |
| With vector search | 76.5% |
| Specialized systems | 60-70% |
Key finding: simple approaches can outperform complex memory systems.
Part 5: Other Memory Systems¶
1. LangMem (LangChain)¶
| Aspect | Details |
|---|---|
| Developer | LangChain |
| Focus | LangChain ecosystem |
| Integration | Native LangChain |
LangMem Features:
- Memory as a LangChain runnable
- Multiple memory types
- Easy chain integration
2. Zep¶
| Aspect | Details |
|---|---|
| Developer | Zep |
| Focus | Research-led |
| Type | Memory layer |
Zep Features:
- Fact extraction
- Memory summarization
- Temporal awareness
3. OpenAI Memory¶
| Aspect | Details |
|---|---|
| Provider | OpenAI |
| Focus | Simple integration |
| Type | Native API |
OpenAI Memory Limitations:
- High latency (15s+)
- Limited accuracy (53.1%)
- Black-box behavior
- No customization
4. Cognee¶
| Aspect | Details |
|---|---|
| Focus | Knowledge graphs |
| Innovation | Graph-based memory |
Part 6: Memory System Comparison¶
Feature Matrix¶
| Feature | Mem0 | Letta | LangMem | Zep | OpenAI |
|---|---|---|---|---|---|
| Self-editing | ⚠️ | ✅ | ❌ | ⚠️ | ❌ |
| Graph memory | ✅ | ❌ | ❌ | ✅ | ❌ |
| Temporal queries | ✅ | ⚠️ | ❌ | ✅ | ❌ |
| Open source | ✅ | ✅ | ✅ | ✅ | ❌ |
| Multi-modal | ✅ | ⚠️ | ❌ | ❌ | ⚠️ |
| Production-ready | ✅ | ⚠️ | ⚠️ | ⚠️ | ✅ |
Performance Comparison¶
| Metric | Mem0 | Letta | LangMem | OpenAI |
|---|---|---|---|---|
| Accuracy | 66.9% | 74%* | ~58% | 53.1% |
| Latency | 1.4s | ~3s | ~2s | 15s+ |
| Scalability | High | Medium | Medium | Low |
| Customization | High | High | Medium | Low |
*On filesystem benchmark
Cost Comparison¶
| System | Open Source | Managed | Notes |
|---|---|---|---|
| Mem0 | Free | $0.05/1K memories | Vector DB costs extra |
| Letta | Free | N/A | Self-host only |
| LangMem | Free | N/A | LangChain dependency |
| Zep | Free | $50+/mo | Good for teams |
| OpenAI | N/A | Per API call | Built into Assistants |
Part 7: Selection Guide¶
Use Case Matrix¶
| Use Case | Recommended | Reason |
|---|---|---|
| Customer support bot | Mem0 | Production-ready, personalization |
| Personal assistant | Letta | Long-term context, autonomy |
| Chatbot with history | LangMem | LangChain integration |
| Research agent | Letta | Memory exploration, reasoning |
| Enterprise deployment | Mem0 | Scalability, support |
| Simple FAQ bot | OpenAI Memory | Easiest, native |
Part 8: Implementation Patterns¶
Mem0 Integration¶
```python
from mem0 import Memory

m = Memory()

# Add a memory
m.add("User prefers dark mode", user_id="user123")

# Search memories
results = m.search("interface preferences", user_id="user123")

# Get all memories for a user
all_memories = m.get_all(user_id="user123")
```
Letta Integration¶
```python
from letta import create_client

client = create_client()

# Create an agent with memory
agent = client.create_agent(
    name="my_agent",
    memory_blocks=["Core memory about user..."]
)

# Send a message (the agent manages memory automatically)
response = client.send_message(
    agent_id=agent.id,
    message="Hello!",
    role="user"
)
```
Part 9: Interview-Relevant Numbers¶
LOCOMO Benchmark Scores¶
| System | Single-hop | Multi-hop | Temporal | Overall |
|---|---|---|---|---|
| Mem0 | 85% | 62% | 58% | 66.9% |
| OpenAI | 70% | 45% | 40% | 53.1% |
| Letta (fs) | 88% | 65% | 62% | 74.0% |
Latency Comparison¶
| System | Add Memory | Search | Get All |
|---|---|---|---|
| Mem0 | 0.3s | 0.5s | 0.2s |
| OpenAI | 5-10s | 10-15s | 3-5s |
| LangMem | 0.5s | 1.0s | 0.5s |
Memory Capacity¶
| System | Max Memories | Storage |
|---|---|---|
| Mem0 | Unlimited | Vector DB |
| Letta | Unlimited | Archival |
| OpenAI | 128K tokens | Managed |
Cost per 1K Operations¶
| System | Cost |
|---|---|
| Mem0 (managed) | ~$0.50 |
| Letta (self-host) | Vector DB costs |
| OpenAI | ~$2-5 (API calls) |
Gotchas¶
RAG is not memory
RAG retrieves knowledge from documents (static facts). Memory stores personalized context: user preferences, interaction history, results of past actions. RAG answers "what is X"; memory answers "this user prefers Y". A full-fledged agent needs both.
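The split can be made concrete with two hypothetical lookups: a RAG corpus that answers identically for everyone versus a per-user memory store (both stores and all names are illustrative):

```python
# Hypothetical stores illustrating the split: RAG serves static document
# knowledge to everyone; memory serves personalized context per user.
rag_corpus = {
    "dark mode": "Dark mode is a UI scheme with light text on a dark background.",
}
user_memory = {  # written during past sessions, keyed by user
    "user123": {"theme": "dark", "language": "ru"},
}

def rag_answer(query: str) -> str:
    """Answers 'what is X' -- identical for every user."""
    return rag_corpus.get(query, "not found")

def personalized_answer(user_id: str, key: str) -> str:
    """Answers 'what does THIS user prefer' -- differs per user."""
    return user_memory.get(user_id, {}).get(key, "unknown")
```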
Memory benchmarks are not directly comparable
Mem0's 66.9% on LOCOMO and Letta's 74% on the filesystem benchmark come from different benchmarks. LOCOMO tests single- and multi-hop recall in conversations; the filesystem benchmark tests file system operations. Compare systems only within a single benchmark. Mem0 leads on latency (1.4s vs 15s for OpenAI); Letta leads on architectural flexibility.
Long-term memory = privacy liability
An agent's memory stores the user's personal data (preferences, facts, history). Without user isolation, one user can see another's data. Without TTL and forget mechanisms, you violate GDPR and the right to be forgotten. Required: per-user memory stores, an explicit delete API, and data retention policies.
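A minimal sketch of those three requirements (per-user isolation, explicit delete, TTL-based retention); all names are illustrative:

```python
import time

class PrivateMemoryStore:
    """Sketch of the three privacy requirements: per-user isolation,
    an explicit delete API, and TTL-based retention. Names are illustrative."""

    def __init__(self, ttl_seconds: float = 90 * 24 * 3600):  # e.g. 90-day retention
        self._stores = {}          # user_id -> {key: (value, stored_at)}
        self.ttl = ttl_seconds

    def add(self, user_id: str, key: str, value: str) -> None:
        self._stores.setdefault(user_id, {})[key] = (value, time.time())

    def get(self, user_id: str, key: str):
        """Reads are scoped to one user; expired entries are purged on access."""
        entry = self._stores.get(user_id, {}).get(key)
        if entry is None:
            return None
        value, stored_at = entry
        if time.time() - stored_at > self.ttl:   # retention policy
            del self._stores[user_id][key]
            return None
        return value

    def forget_user(self, user_id: str) -> None:
        """Right to be forgotten: drop every memory for one user."""
        self._stores.pop(user_id, None)
```

Because every read and write takes a `user_id`, cross-user leakage requires an explicit bug rather than a default; this mirrors the per-user stores Mem0 exposes via its `user_id` parameter.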
Interview Q&A¶
Q: What types of memory does an AI agent need?
Red flag: "Chat history is enough"
Strong answer: "Four types: (1) Working memory: the current turn, bounded by the context window. (2) Episodic: the session's event history (what the agent did and the results). (3) Semantic: long-term facts (user preferences, knowledge base). (4) Procedural: skills and patterns (how to solve certain kinds of tasks). MemGPT models this with an OS analogy: working = CPU registers, core = L1 cache, archival = disk."
Q: Compare Mem0 and MemGPT/Letta.
Strong answer: "Mem0: production-first, API-driven, vector DB + graph store, 66.9% accuracy, 1.4s latency. Better for enterprise: fast integration, managed service. Letta (MemGPT): agent-OS philosophy, hierarchical memory (main/core/archival), the agent itself manages what to load into context. Better for long-running autonomous agents that must decide for themselves what to remember. Key insight: Mem0 = external memory service, Letta = memory-aware agent architecture."
Q: How do you evaluate the quality of an agent memory system?
Strong answer: "Three axes: (1) Accuracy: single-hop recall (does it remember a fact), multi-hop (does it connect facts), temporal (does it remember event order). (2) Latency: retrieval under 2s for interactive use, under 500ms for real-time. (3) Scalability: how accuracy degrades as the number of memories grows (100 vs 10K vs 1M). The LOCOMO benchmark covers all three. In practice, conflict resolution also matters: what happens if the user said 'I live in Moscow' and later 'I moved to St. Petersburg'."
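The conflict-resolution case at the end of that answer can be sketched as naive last-write-wins over timestamped facts. Real systems use richer reconciliation; this is only the simplest policy, with made-up data:

```python
from datetime import datetime

def resolve(memories):
    """Naive last-write-wins: among facts sharing a key, keep the newest.
    Real conflict resolution is richer; this is the simplest baseline."""
    latest = {}
    for m in memories:
        prev = latest.get(m["key"])
        if prev is None or m["at"] > prev["at"]:
            latest[m["key"]] = m
    return {k: v["value"] for k, v in latest.items()}

facts = [
    {"key": "city", "value": "Moscow", "at": datetime(2025, 1, 10)},
    {"key": "city", "value": "St. Petersburg", "at": datetime(2025, 6, 2)},  # "I moved"
]
# resolve(facts) keeps only the newer "city" fact
```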
Sources¶
- Mem0 — "AI Memory Benchmark: OpenAI vs LangMem vs MemGPT vs Mem0"
- Medium — "LLM as Operating Systems: Agent Memory"
- Medium — "AI Memory Wars: Why One System Crushed the Competition"
- Shaped.ai — "The 8 Best Tools for AI Agent Memory (2026 Guide)"
- Hugging Face — "Mem0: Building Production-Ready AI Agents"
- arXiv — "SimpleMem: Efficient Lifelong Memory for LLM Agents" (2601.02553)
- Letta Blog — "Filesystem Benchmark Results"
- GitHub — "Agent-Memory-Paper-List"