The MCP Protocol and Memory Systems¶
~6 minute read
URL: CodiLime, MarkTechPost Type: memory / mcp / agents Date: February 2026 Collected: Ralph Research PHASE 5
Prerequisites: MCP vs Function Calling, Agent Memory Systems
Why This Matters¶
MCP (Model Context Protocol) solves the problem of integrating N models with M tools: without a standard you need N×M connectors, with MCP only N+M. It is a JSON-RPC protocol that unifies access to resources, prompts, and tools. MCP matters especially for memory systems: the agent gets a single interface to a vector DB, a graph DB, and an event log through one protocol, instead of a separate hand-written adapter for each store.
Part 1: Model Context Protocol (MCP) Explained¶
What is MCP?¶
Definition: JSON-RPC-based open standard that enables AI applications to discover and invoke tools uniformly, regardless of provider.
Problem Solved: The N×M Integration Problem
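The arithmetic behind the claim is easy to check. With, say, 5 model providers and 20 tools (illustrative numbers, not from the source), point-to-point integration needs 100 bespoke connectors, while the MCP approach needs 25 adapters:

```python
# Connector counts: point-to-point vs MCP-style integration.
# The inputs (5 models, 20 tools) are illustrative.
def point_to_point(n_models: int, m_tools: int) -> int:
    return n_models * m_tools  # one bespoke connector per (model, tool) pair

def with_mcp(n_models: int, m_tools: int) -> int:
    return n_models + m_tools  # one MCP client per model + one MCP server per tool

print(point_to_point(5, 20))  # 100 connectors without a standard
print(with_mcp(5, 20))        # 25 adapters with MCP
```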
MCP Architecture¶
```mermaid
graph LR
    CLIENT["MCP Client<br/>(AI App / LLM)"] <-->|"JSON-RPC"| SERVER["MCP Server<br/>(Data / Tool)"]
    SERVER --> R["Resources<br/>(read-only)"]
    SERVER --> P["Prompts<br/>(templates)"]
    SERVER --> T["Tools<br/>(actions)"]
    style CLIENT fill:#e8eaf6,stroke:#3f51b5
    style SERVER fill:#e8f5e9,stroke:#4caf50
    style R fill:#fff3e0,stroke:#ef6c00
    style P fill:#fff3e0,stroke:#ef6c00
    style T fill:#fff3e0,stroke:#ef6c00
```
3 MCP Primitives¶
| Primitive | Purpose | Example |
|---|---|---|
| Resources | Read-only data access | File contents, DB records, API responses |
| Prompts | Pre-defined templates | "Summarize this document", "Analyze code" |
| Tools | Actions with side effects | Execute code, send email, update DB |
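The separation between the three primitives can be made concrete with a toy registry (plain Python, deliberately not the real MCP SDK; all names here are illustrative). The point is the contract: resource reads never mutate state, tools may:

```python
# Toy registry separating the three MCP primitives (illustrative sketch).
class ToyMCPServer:
    def __init__(self):
        self.resources = {}  # uri -> zero-argument reader (read-only)
        self.prompts = {}    # name -> template string
        self.tools = {}      # name -> callable that may have side effects

    def read_resource(self, uri):
        return self.resources[uri]()  # reads never mutate state

    def get_prompt(self, name, **kwargs):
        return self.prompts[name].format(**kwargs)

    def call_tool(self, name, **kwargs):
        return self.tools[name](**kwargs)  # may mutate external state

server = ToyMCPServer()
sent = []  # stands in for an email backend
server.resources["file://notes.txt"] = lambda: "meeting at 10am"
server.prompts["summarize"] = "Summarize this document: {text}"
server.tools["send_email"] = lambda to, body: sent.append((to, body)) or "sent"

print(server.read_resource("file://notes.txt"))               # meeting at 10am
print(server.get_prompt("summarize", text="Q3 report"))
print(server.call_tool("send_email", to="a@b.c", body="hi"))  # sent
```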
Session Lifecycle¶
1. Initialization
   - Client → Server: `initialize` request
   - Server → Client: capabilities + version info
   - Client → Server: `initialized` notification
2. Operation
   - List resources/prompts/tools
   - Read resources
   - Call tools
   - Get prompt templates
3. Shutdown
   - Either side can close the session
Authorization¶
Standard: OAuth 2.1
- Dynamic client registration
- Token-based authentication
- Scope-based permissions
8 MCP Implementation Patterns¶
| Pattern | Description | Use Case |
|---|---|---|
| Prompt Library | Curated prompt templates | Coding assistants, documentation |
| SaaS Wrapper | API → MCP server | Slack, GitHub, Jira integration |
| RAG Context | Vector DB → Resources | Knowledge base search |
| File System | Local files → Resources | Code analysis, document processing |
| Database Connector | SQL/NoSQL → Resources | Data querying, analytics |
| API Gateway | Multiple APIs → Unified MCP | Multi-service orchestration |
| Agent Tool | LLM actions → Tools | Autonomous agents |
| Runtime Environment | Sandboxed execution | Code execution, calculations |
Popular MCP Servers (2026)¶
| Server | Type | Capabilities |
|---|---|---|
| Filesystem | Resources | Local file access |
| PostgreSQL | Resources/Tools | DB queries |
| GitHub | Resources/Tools | Repos, issues, PRs |
| Slack | Tools | Messages, channels |
| Puppeteer | Tools | Web scraping |
| Memory | Resources | Persistent conversation memory |
| Brave Search | Tools | Web search |
Code Example: MCP Client¶
```python
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def use_mcp_server():
    server_params = StdioServerParameters(
        command="python",
        args=["-m", "my_mcp_server"],
    )
    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            # Initialize the session (capability negotiation)
            await session.initialize()

            # List tools
            tools = await session.list_tools()
            print(f"Available tools: {[t.name for t in tools.tools]}")

            # Call a tool
            result = await session.call_tool(
                "search_documents",
                arguments={"query": "machine learning"},
            )
            print(result.content)

asyncio.run(use_mcp_server())
```
MCP vs Alternatives¶
| Approach | Integration Cost | Standardization | Tool Discovery |
|---|---|---|---|
| MCP | N+M | Open standard | Automatic |
| LangChain Tools | N×M | Framework-specific | Manual |
| OpenAI Functions | N×M | Provider-specific | Manual |
| Custom API | N×M | None | Manual |
Part 2: LLM Memory Systems Comparison¶
Why Memory Matters for Agents¶
"Memory is not just storage—it's a systems problem with trade-offs across recall, consistency, latency, and cost."
Key Challenges:
1. Long conversations exceed context windows
2. Need to recall relevant past interactions
3. Must maintain consistency over time
4. Balance latency vs accuracy
6 Memory System Patterns (3 Families)¶
Family 1: Vector Memory¶
1. Plain Vector RAG
| Aspect | Value |
|---|---|
| Latency | ~50-100ms |
| Accuracy | Good for exact match |
| Weakness | Poor temporal reasoning, no relationships |
Best For: Simple retrieval, FAQs
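The retrieval loop behind plain vector RAG fits in a few lines. A self-contained sketch: the bag-of-words "embedding" below is a stand-in for a real embedding model, and the linear scan stands in for an ANN index:

```python
import math

# Minimal vector RAG sketch: "embed" text, rank stored chunks by cosine
# similarity, return the best match. Illustrative only.
def embed(text: str) -> dict:
    vec = {}
    for word in text.lower().split():
        vec[word] = vec.get(word, 0) + 1
    return vec

def cosine(a: dict, b: dict) -> float:
    dot = sum(a[w] * b.get(w, 0) for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

store = ["reset your password in settings",
         "billing happens on the first of the month",
         "contact support via email"]
index = [(embed(doc), doc) for doc in store]

query = embed("how do I reset my password")
best = max(index, key=lambda pair: cosine(query, pair[0]))[1]
print(best)  # reset your password in settings
```

The weaknesses in the table fall out of this structure: similarity scoring has no notion of time and no edges between chunks, which is exactly what the graph-based families add.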
2. Tiered Vector Memory (MemGPT / Letta)
```mermaid
graph TD
    WM["Working Memory<br/>Current context (fits in window)"] <-->|"Core memory manager<br/>moves items between tiers"| AS["Archive Store<br/>Long-term vector DB"]
    style WM fill:#fce4ec,stroke:#c62828
    style AS fill:#e8eaf6,stroke:#3f51b5
```
| Aspect | Value |
|---|---|
| Latency | Working: instant, Archive: 100-200ms |
| Accuracy | Better recall with tiering |
| Innovation | Self-managing memory hierarchy |
Best For: Long conversations, personal assistants
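A minimal sketch of the tiering idea (not the actual MemGPT/Letta implementation): a bounded working memory evicts its oldest items into an archive, and recall searches the archive. A real system would use embedding search over the archive rather than substring matching:

```python
from collections import deque

class TieredMemory:
    """Bounded working memory + unbounded archive (illustrative sketch)."""
    def __init__(self, working_capacity: int = 3):
        self.working = deque()  # fits in the context window
        self.archive = []       # stands in for a long-term vector DB
        self.capacity = working_capacity

    def add(self, item: str):
        self.working.append(item)
        while len(self.working) > self.capacity:
            # The memory manager moves the oldest item to the archive tier
            self.archive.append(self.working.popleft())

    def recall(self, keyword: str):
        # Archive lookup; real systems use semantic search here
        return [m for m in self.archive if keyword in m]

mem = TieredMemory(working_capacity=2)
for msg in ["user likes jazz", "user is in Berlin", "user asked about MCP"]:
    mem.add(msg)

print(list(mem.working))  # ['user is in Berlin', 'user asked about MCP']
print(mem.recall("jazz"))  # ['user likes jazz']
```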
Family 2: Graph Memory¶
3. Temporal Knowledge Graph (Zep/Graphiti)
```
Message 1 → Entity Extraction → Node Creation
Message 2 → Relation Detection → Edge Creation
        ↓
Knowledge Graph with Temporal Edges
        ↓
Query → Graph Traversal + Vector Search
```
| Aspect | Zep/Graphiti |
|---|---|
| DMR Accuracy | 94.8% vs 93.4% baseline |
| LongMemEval | 18.5% higher accuracy |
| Latency | 150-300ms |
| Innovation | Temporal edges track entity evolution |
Best For: Entity-focused tasks, relationship queries, user modeling
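The temporal-edge idea can be sketched in a few lines (illustrative, not Zep/Graphiti's actual data model): each fact carries `valid_from`/`invalid_at` timestamps, so a new assertion invalidates the old edge instead of overwriting it, and the graph can answer both "what is true now" and "what was true at time t":

```python
# Temporal edges (illustrative): facts are never overwritten, only invalidated.
class TemporalGraph:
    def __init__(self):
        self.edges = []  # (subject, relation, object, valid_from, invalid_at)

    def assert_fact(self, subj, rel, obj, t):
        # Close out any currently-valid edge for the same (subject, relation)
        for i, (s, r, o, vf, inv) in enumerate(self.edges):
            if s == subj and r == rel and inv is None:
                self.edges[i] = (s, r, o, vf, t)
        self.edges.append((subj, rel, obj, t, None))

    def current(self, subj, rel):
        for s, r, o, vf, inv in self.edges:
            if s == subj and r == rel and inv is None:
                return o

    def at(self, subj, rel, t):
        for s, r, o, vf, inv in self.edges:
            if s == subj and r == rel and vf <= t and (inv is None or t < inv):
                return o

g = TemporalGraph()
g.assert_fact("alice", "works_at", "Acme", t=1)
g.assert_fact("alice", "works_at", "Globex", t=5)
print(g.current("alice", "works_at"))  # Globex
print(g.at("alice", "works_at", 3))    # Acme
```

This is what "temporal edges track entity evolution" means in practice: the update does not destroy the history.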
4. Knowledge Graph RAG (GraphRAG)
```
Documents → Entity/Relation Extraction → Knowledge Graph
        ↓
Community Detection → Summaries
        ↓
Query → Graph Traversal + Summaries
```
| Aspect | Value |
|---|---|
| Strength | Multi-hop reasoning, global summaries |
| Latency | 500ms-2s (complex queries) |
| Weakness | Expensive graph construction |
Best For: Complex reasoning, research synthesis
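Multi-hop reasoning is exactly what the flat vector approach above cannot do. A sketch of a traversal over an extracted graph (entities and relations are made up for illustration):

```python
# Multi-hop traversal over an extracted entity graph (illustrative data).
graph = {
    ("paper_A", "cites"): ["paper_B"],
    ("paper_B", "authored_by"): ["dr_smith"],
    ("dr_smith", "affiliated_with"): ["mit"],
}

def hop(entities, relation):
    """Follow one relation from a set of entities."""
    out = []
    for e in entities:
        out.extend(graph.get((e, relation), []))
    return out

# "Which institution is behind the work that paper_A builds on?"
authors = hop(hop(["paper_A"], "cites"), "authored_by")
print(hop(authors, "affiliated_with"))  # ['mit']
```

No single chunk contains the answer, so similarity search alone cannot retrieve it; the chained hops are what justify GraphRAG's higher latency and construction cost.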
Family 3: Event Logs¶
5. Execution Logs/Checkpoints (ALAS, LangGraph)
```
Agent Action → Log Entry (input, output, state)
        ↓
Checkpoint Store
        ↓
Failure → Restore from Checkpoint → Retry
```
| Aspect | Value |
|---|---|
| Purpose | Ground truth, debugging, replay |
| Innovation | Deterministic replay |
| Weakness | Not semantic, just logs |
Best For: Debugging, failure recovery, audit trails
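A minimal sketch of the log-and-checkpoint pattern (illustrative, not the ALAS or LangGraph API): every action appends to an immutable log, checkpoints snapshot the state, and a failed step is retried from the last checkpoint while the log keeps the full history, including the failure:

```python
# Append-only execution log with checkpoint restore (illustrative sketch).
class AgentRun:
    def __init__(self):
        self.log = []  # ground truth: every step, never rolled back
        self.state = {"step": 0, "results": []}
        self.checkpoint = None

    def act(self, action, fn):
        entry = {"action": action, "input": dict(self.state)}
        result = fn(self.state)
        self.state["step"] += 1
        self.state["results"].append(result)
        entry["output"] = result
        self.log.append(entry)

    def save_checkpoint(self):
        self.checkpoint = {"step": self.state["step"],
                           "results": list(self.state["results"])}

    def restore(self):
        self.state = {"step": self.checkpoint["step"],
                      "results": list(self.checkpoint["results"])}

run = AgentRun()
run.act("fetch", lambda s: "data")
run.save_checkpoint()
run.act("transform", lambda s: "oops")   # imagine this step failed
run.restore()                            # roll back state to the checkpoint
run.act("transform", lambda s: "clean")  # retry
print(run.state["results"])  # ['data', 'clean']
print(len(run.log))          # 3: the log still records the failed attempt
```

This is the "not semantic, just logs" trade-off from the table: the log supports deterministic replay and audit, but nothing here understands what the entries mean.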
6. Episodic Long-Term Memory
| Aspect | Value |
|---|---|
| Purpose | Cross-task learning |
| Innovation | Generalizes across episodes |
| Weakness | Complex pattern matching |
Best For: Repeated task types, learning from experience
Part 3: Memory System Selection Guide¶
Decision Framework¶
```
Query Type?
├── Simple retrieval → Plain Vector RAG
├── Long conversations → Tiered Vector (MemGPT)
├── Entity relationships → Temporal KG (Zep)
├── Multi-hop reasoning → GraphRAG
├── Debugging/replay → Execution Logs
└── Pattern reuse → Episodic Memory
```
Comparison Table¶
| System | Latency | Accuracy | Complexity | Best Use Case |
|---|---|---|---|---|
| Plain Vector RAG | 50-100ms | Good | Low | Simple retrieval |
| Tiered Vector | 0-200ms | Better | Medium | Long conversations |
| Temporal KG (Zep) | 150-300ms | 94.8% | High | Entity-focused |
| GraphRAG | 500ms-2s | Best for multi-hop | Very High | Complex reasoning |
| Execution Logs | 0ms (write) | 100% | Low | Debugging, audit |
| Episodic Memory | Variable | Variable | High | Pattern reuse |
Zep vs Baseline Results¶
| Benchmark | Zep | Baseline | Improvement |
|---|---|---|---|
| DMR (Dialog Memory Recall) | 94.8% | 93.4% | +1.4 pp |
| LongMemEval | 68.3% | 57.6% | +18.5% (relative) |
| LOCOMO | 89.2% | 85.1% | +4.8% (relative) |
Part 4: MCP + Memory Integration¶
Pattern: MCP Memory Server¶
```python
# MCP server providing memory capabilities (low-level SDK sketch).
from mcp.server import Server
from mcp.types import Resource

server = Server("memory-server")

@server.list_resources()
async def list_memory_resources():
    return [
        Resource(uri="memory://conversations", name="Conversation History"),
        Resource(uri="memory://entities", name="Entity Knowledge Graph"),
        Resource(uri="memory://episodes", name="Episodic Memory"),
    ]

@server.read_resource()
async def read_memory(uri: str):
    if uri == "memory://conversations":
        return await get_conversation_history()
    elif uri == "memory://entities":
        return await get_entity_graph()
    # ...

@server.call_tool()
async def call_memory_tool(name: str, arguments: dict):
    # The low-level SDK routes all tool calls through one handler,
    # passing the tool name and its arguments
    if name == "search_memory":
        if arguments["memory_type"] == "vector":
            return await vector_search(arguments["query"])
        elif arguments["memory_type"] == "graph":
            return await graph_search(arguments["query"])
    # ...
```
Integration Architecture¶
```mermaid
graph TD
    AGENT["LLM / Agent"] -->|"MCP Protocol"| MCP["Memory MCP Server"]
    MCP --> VDB["Vector DB"]
    MCP --> GDB["Graph DB"]
    MCP --> EL["Event Log"]
    style AGENT fill:#f3e5f5,stroke:#9c27b0
    style MCP fill:#e8eaf6,stroke:#3f51b5
    style VDB fill:#e8f5e9,stroke:#4caf50
    style GDB fill:#e8f5e9,stroke:#4caf50
    style EL fill:#e8f5e9,stroke:#4caf50
```
Part 5: Interview-Relevant Numbers¶
MCP Statistics¶
| Metric | Value |
|---|---|
| Integration reduction | N×M → N+M |
| Protocol | JSON-RPC 2.0 |
| Authorization | OAuth 2.1 |
| Primitives | 3 (Resources, Prompts, Tools) |
Memory System Benchmarks¶
| Metric | Value |
|---|---|
| Zep DMR accuracy | 94.8% |
| Zep LongMemEval improvement | +18.5% |
| Plain Vector RAG latency | 50-100ms |
| GraphRAG latency | 500ms-2s |
| Temporal KG latency | 150-300ms |
Gotchas¶
MCP is not a replacement for an API gateway
MCP standardizes tool discovery and invocation for LLMs, but it does not replace an API gateway (rate limiting, auth, routing, load balancing). An MCP server is an adapter between the LLM and a specific service. Production needs MCP + an API gateway + observability.
Resources are read-only, Tools have side effects
A common mistake: using a Tool to read data, or a Resource to write it. Resources = safe reads (files, DB records). Tools = actions with side effects (sending email, writing to a DB). Mixing them breaks the security model -- read-only operations should not require the same permissions as writes.
OAuth 2.1 is mandatory for production MCP
Without authorization, an MCP server is an open door to data and actions. Every MCP server should: (1) require an OAuth 2.1 token, (2) check scope permissions for each Tool/Resource, (3) log every call. Otherwise any client can invoke any tool.
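The scope check can be sketched as a decorator (scope names and the token-as-dict shape are made up for illustration; a real deployment validates a signed OAuth 2.1 token instead of trusting a dict):

```python
import functools

audit_log = []  # every call is recorded, allowed or not

def require_scope(scope):
    """Reject a tool call unless the caller's token carries the scope."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(token, *args, **kwargs):
            audit_log.append((fn.__name__, token.get("sub"), scope))
            if scope not in token.get("scopes", []):
                raise PermissionError(f"missing scope: {scope}")
            return fn(token, *args, **kwargs)
        return wrapper
    return decorator

@require_scope("memory:write")
def add_memory(token, text):
    return f"stored: {text}"

token = {"sub": "client-1", "scopes": ["memory:read"]}
try:
    add_memory(token, "note")
except PermissionError as e:
    print(e)  # missing scope: memory:write
```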
Interview Q&A¶
Q: What problem does MCP solve?
Red flag: "MCP is a new REST API for AI"
Strong answer: "MCP solves the N×M integration problem. Without a standard: N models × M tools = N×M connectors. With MCP: N+M connections through a single JSON-RPC protocol. Three primitives: Resources (read-only data), Prompts (templates), Tools (actions with side effects). Analogy: USB-C for AI -- one standard instead of dozens of proprietary connectors. Authorization via OAuth 2.1."
Q: How would you implement an MCP server for a memory system?
Strong answer: "Three components: (1) Resources -- read-only access to memories (semantic search, temporal queries, user-specific filtering). (2) Tools -- write operations (add memory, update, delete, consolidate). (3) Backend -- a unified interface to a vector DB (semantic search), a graph DB (relationship queries), and an event log (temporal queries). Key points: per-user isolation via OAuth scopes, TTLs on memories for GDPR compliance, conflict resolution for contradictory memories."
Sources¶
- CodiLime — "Model Context Protocol (MCP) explained" (Feb 2, 2026)
- MarkTechPost — "Comparing Memory Systems for LLM Agents" (Nov 10, 2025)
- MCP Specification (Anthropic)
- Zep documentation
- MemGPT / Letta documentation (project renamed: github.com/cpacker/MemGPT -> github.com/letta-ai/letta)