AI Agents: Study Materials¶
~4 min read
Prerequisites: LLM Agents | AI Agent Frameworks
The AI agent market grew from $5B in 2024 to an estimated $47B in 2026 (McKinsey). Top agents resolve 71-84% of SWE-Bench issues; WebArena shows a ~62% success rate. Interviews cover ReAct, multi-agent orchestration, memory systems, and tool-use safety. Below are materials for 2 tasks, with code, architectures, and framework comparisons.
Updated: 2026-02-11
Task Overview¶
| ID | Task | Difficulty | Key Topics |
|---|---|---|---|
| agents_001 | ReAct + Multi-Agent | Hard | Reasoning, tool use, orchestration |
| agents_002 | Framework Comparison | Medium | LangChain, LangGraph, CrewAI, AutoGen |
1. ReAct Pattern¶
Best Sources¶
Papers:
- ReAct: Synergizing Reasoning and Acting in Language Models (Yao et al., 2022), arXiv:2210.03629
Articles:
- AI Agents and Autonomous Systems 2025 — comprehensive guide
- The Realistic Guide to AI Agents in 2026 — Decoding AI
- AI Agents Mastery Guide 2026 — Level Up
ReAct Pattern Overview¶

```mermaid
graph TD
    T[Thought<br/>Analyze current state] --> A[Action<br/>Tool call / decision]
    A --> O[Observation<br/>Action result]
    O -->|Goal not reached| T
    O -->|Goal reached| F[Finish<br/>Return result]
    style T fill:#e8eaf6,stroke:#3f51b5
    style A fill:#fff3e0,stroke:#ef6c00
    style O fill:#e8f5e9,stroke:#4caf50
    style F fill:#f3e5f5,stroke:#9c27b0
```
Code: ReAct Agent¶
```python
from typing import Callable


class ReActAgent:
    """Simple ReAct agent implementation."""

    def __init__(self, llm, tools: dict[str, Callable]):
        self.llm = llm
        self.tools = tools
        self.history = []

    def run(self, question: str, max_iterations: int = 10):
        self.history = [{"role": "user", "content": question}]
        for _ in range(max_iterations):
            # Generate thought and action
            response = self.llm.generate(self._build_prompt())
            # Parse response
            thought, action, action_input = self._parse_response(response)
            if action == "Finish":
                return action_input
            # Execute tool
            if action in self.tools:
                observation = self.tools[action](action_input)
            else:
                observation = f"Unknown tool: {action}"
            # Add the step and its observation to history
            self.history.append({
                "role": "assistant",
                "content": f"Thought: {thought}\nAction: {action}\nAction Input: {action_input}"
            })
            self.history.append({
                "role": "user",
                "content": f"Observation: {observation}"
            })
        return "Max iterations reached"

    def _build_prompt(self):
        return f"""Answer the question using the ReAct pattern.
Available tools: {list(self.tools.keys())}
Format:
Thought: [reasoning]
Action: [tool_name]
Action Input: [input]
When done:
Thought: I have the answer
Action: Finish
Action Input: [final answer]
History: {self.history}
"""

    def _parse_response(self, response):
        # Parse Thought, Action, Action Input lines from the LLM response
        thought = action = action_input = ""
        for line in response.strip().split('\n'):
            if line.startswith("Thought:"):
                thought = line.removeprefix("Thought:").strip()
            elif line.startswith("Action:"):
                action = line.removeprefix("Action:").strip()
            elif line.startswith("Action Input:"):
                action_input = line.removeprefix("Action Input:").strip()
        return thought, action, action_input
```
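The loop above stands or falls on the text contract between the prompt and the parser. A standalone sketch of that parsing step (the `parse` helper and the sample response are illustrative, not part of the agent above):

```python
# Parse the Thought / Action / Action Input lines the ReAct prompt asks for.
# In a real run this text comes from the LLM; here it is hard-coded.
def parse(response: str):
    fields = {"Thought:": "", "Action:": "", "Action Input:": ""}
    for line in response.splitlines():
        for key in fields:
            if line.startswith(key):
                fields[key] = line.removeprefix(key).strip()
    return fields["Thought:"], fields["Action:"], fields["Action Input:"]

print(parse("Thought: done\nAction: Finish\nAction Input: 42"))
# → ('done', 'Finish', '42')
```

Note that `"Action Input:"` does not match the `"Action:"` prefix, so the two fields stay distinct; fragile text parsing like this is exactly why production stacks prefer structured (JSON/function-call) outputs.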
2. Multi-Agent Orchestration¶
Best Sources¶
Articles:
- How to Build Multi-Agent Systems: Complete 2026 Guide — Dev.to (Jan 2026)
- Agent Orchestration 2026: LangGraph, CrewAI & AutoGen — Iterathon
- Design Patterns for Agentic AI — AppsTek (Dec 2025)
Misconception: an LLM agent is just an LLM with a prompt
An agent has 3 components: Tools (external instruments), Memory (context across sessions), and Planning. Without tools it is a chatbot; without memory every session starts from scratch. Interviewers expect a clear separation of these components.
Misconception: ReAct always beats plain prompt engineering
ReAct adds latency (every step is an LLM call) and cost (5-10x tokens). For simple Q&A a plain prompt is faster and cheaper. ReAct pays off when the task needs external tools (search, calculation, APIs) or multi-step reasoning.
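The token multiplier comes from re-sending the growing history on every step. A back-of-the-envelope sketch (all numbers are illustrative assumptions, not measurements):

```python
# Rough token cost of a ReAct loop vs a single-shot prompt.
PROMPT_TOKENS = 500   # tokens in the base prompt
REACT_STEPS = 5       # thought/action/observation iterations
STEP_OVERHEAD = 200   # history + observation tokens added per step

single_shot = PROMPT_TOKENS
# Each step re-sends the base prompt plus the history accumulated so far
react_total = sum(PROMPT_TOKENS + i * STEP_OVERHEAD for i in range(REACT_STEPS))
print(react_total / single_shot)  # → 9.0
```

Because the history compounds, cost grows roughly quadratically with step count, which is why capping `max_iterations` and summarizing history matter in production.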
Misconception: LangChain is the best framework for production agents
LangChain works well for prototypes, but for production agents LangGraph (state machines) gives more control. CrewAI targets role-based teams. On SWE-Bench the leaders run custom frameworks, not LangChain.
Orchestration Patterns¶

```mermaid
graph LR
    subgraph "Sequential"
        A1[Agent A] --> B1[Agent B] --> C1[Agent C]
    end
    subgraph "Hierarchical"
        M[Manager] --> W1[Worker 1]
        M --> W2[Worker 2]
        M --> W3[Worker 3]
    end
    style A1 fill:#e8eaf6,stroke:#3f51b5
    style B1 fill:#e8eaf6,stroke:#3f51b5
    style C1 fill:#e8eaf6,stroke:#3f51b5
    style M fill:#fff3e0,stroke:#ef6c00
    style W1 fill:#e8f5e9,stroke:#4caf50
    style W2 fill:#e8f5e9,stroke:#4caf50
    style W3 fill:#e8f5e9,stroke:#4caf50
```
Code: Multi-Agent System¶
```python
from dataclasses import dataclass
from typing import Callable
import asyncio


@dataclass
class Agent:
    name: str
    role: str
    process: Callable


class MultiAgentOrchestrator:
    """Orchestrate multiple agents."""

    def __init__(self, agents: list[Agent]):
        self.agents = {a.name: a for a in agents}
        self.results = {}

    async def run_sequential(self, task: str):
        """Run agents one after another, piping each result to the next."""
        result = task
        for agent in self.agents.values():
            result = await agent.process(result)
            self.results[agent.name] = result
        return result

    async def run_parallel(self, task: str):
        """Run all agents on the same task simultaneously."""
        tasks = [agent.process(task) for agent in self.agents.values()]
        results = await asyncio.gather(*tasks)
        return dict(zip(self.agents.keys(), results))

    async def run_hierarchical(self, task: str, manager: str, workers: list[str]):
        """Manager delegates to workers."""
        # Manager analyzes the task and returns a list of subtasks
        subtasks = await self.agents[manager].process(task)
        # Workers execute their subtasks in parallel
        worker_tasks = [
            self.agents[w].process(subtasks[i])
            for i, w in enumerate(workers)
        ]
        worker_results = await asyncio.gather(*worker_tasks)
        # Manager combines worker results into the final answer
        return await self.agents[manager].process(worker_results)
```
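Sequential vs parallel orchestration can also be shown without the class, using plain asyncio. A self-contained sketch where `researcher` and `writer` are stand-ins for real LLM-backed agents:

```python
import asyncio

# Minimal orchestration sketch; the "agents" are illustrative stubs.
async def researcher(text: str) -> str:
    return text + " -> researched"

async def writer(text: str) -> str:
    return text + " -> written"

async def sequential(task, agents):
    result = task
    for agent in agents:          # each agent consumes the previous result
        result = await agent(result)
    return result

async def parallel(task, agents):
    # All agents see the same input; results are collected together
    return await asyncio.gather(*(a(task) for a in agents))

print(asyncio.run(sequential("topic", [researcher, writer])))
# → topic -> researched -> written
print(asyncio.run(parallel("topic", [researcher, writer])))
# → ['topic -> researched', 'topic -> written']
```

Sequential pipelines trade latency for dependency ordering; parallel fan-out only works when subtasks are independent.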
3. Framework Comparison¶
LangChain vs LangGraph vs CrewAI¶
| Feature | LangChain | LangGraph | CrewAI |
|---|---|---|---|
| Primary Use | Simple chains | State machines | Role-based teams |
| Complexity | Low | Medium | Medium |
| Control Flow | Linear | Graph-based | Hierarchical |
| Best For | Prototyping | Production agents | Multi-agent teams |
LangGraph Example¶
from langgraph.graph import StateGraph, END
from typing import TypedDict
class AgentState(TypedDict):
messages: list
next_action: str
def create_agent_graph():
graph = StateGraph(AgentState)
# Add nodes
graph.add_node("reason", reason_node)
graph.add_node("act", act_node)
graph.add_node("observe", observe_node)
# Add edges
graph.add_edge("reason", "act")
graph.add_edge("act", "observe")
graph.add_conditional_edges(
"observe",
should_continue,
{True: "reason", False: END}
)
graph.set_entry_point("reason")
return graph.compile()
4. Memory Systems (2026 Gap Filler)¶
Sources¶
- AI Agent Memory Systems: Architecture and Innovations — SparkCo (Oct 2025)
- Build smarter AI agents: Manage short-term and long-term memory with Redis — Redis (Apr 2025)
- Memory Engineering for AI Agents — Medium
Types of AI Agent Memory¶
1. Short-term Memory (STM)
- Conversation history
- Current task context
- Working memory (within a session)
2. Long-term Memory (LTM)
- Vector database storage
- Persistent user preferences
- Past experiences (episodic)
3. Episodic Memory
- Past task completions
- Success/failure patterns
- User feedback history
Code: Memory System¶
```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Memory:
    """Agent memory state."""
    short_term: list[dict]  # Conversation history
    long_term: dict         # Vector DB reference
    episodic: list[dict]    # Past experiences


class MemoryManager:
    """Manage agent memory across sessions."""

    def __init__(self, vector_db, max_short_term: int = 10):
        self.vector_db = vector_db
        self.max_short_term = max_short_term

    def add_to_short_term(self, memory: Memory, message: dict):
        """Add a message; summarize the overflow when the window fills up."""
        memory.short_term.append(message)
        if len(memory.short_term) > self.max_short_term:
            # Summarize messages that fall out of the window
            summary = self._summarize(memory.short_term[:-self.max_short_term])
            memory.short_term = [summary] + memory.short_term[-self.max_short_term:]

    def _summarize(self, messages: list[dict]) -> dict:
        # Placeholder: in practice, ask the LLM to compress old messages
        text = " ".join(m.get("content", "") for m in messages)
        return {"role": "system", "content": f"Summary: {text[:200]}"}

    def store_long_term(self, memory: Memory, text: str, metadata: dict):
        """Store in the vector database for later retrieval."""
        embedding = self.vector_db.embed(text)
        self.vector_db.insert(embedding, metadata)

    def retrieve_relevant(self, memory: Memory, query: str, k: int = 5):
        """Retrieve the k most relevant memories from long-term storage."""
        embedding = self.vector_db.embed(query)
        return self.vector_db.search(embedding, k)

    def record_episode(self, memory: Memory, task: str, outcome: str, success: bool):
        """Record an episodic memory entry."""
        memory.episodic.append({
            "task": task,
            "outcome": outcome,
            "success": success,
            "timestamp": datetime.now().isoformat()
        })
```
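To make long-term retrieval concrete without a real embedding model, here is a toy vector store that scores by word overlap instead of cosine similarity. All class and field names are illustrative:

```python
# Toy long-term store: word-overlap "embeddings" stand in for real vectors.
class ToyVectorDB:
    def __init__(self):
        self.items = []  # list of (token_set, metadata) pairs

    def embed(self, text: str) -> set:
        return set(text.lower().split())

    def insert(self, embedding: set, metadata: dict):
        self.items.append((embedding, metadata))

    def search(self, embedding: set, k: int):
        # Rank stored items by token overlap with the query
        scored = sorted(self.items, key=lambda it: -len(it[0] & embedding))
        return [meta for _, meta in scored[:k]]

db = ToyVectorDB()
db.insert(db.embed("user prefers dark mode"), {"fact": "dark mode"})
db.insert(db.embed("user lives in Berlin"), {"fact": "Berlin"})
print(db.search(db.embed("what theme does the user prefer dark"), k=1))
# → [{'fact': 'dark mode'}]
```

A production system would swap `ToyVectorDB` for a real store (Redis, pgvector, Pinecone) and a learned embedding model, but the retrieve-by-similarity flow is the same.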
5. Agent Evaluation (2026 Gap Filler)¶
Sources¶
- Best AI Agent Evaluation Benchmarks: 2025 Complete Guide — O-Mega (Oct 2025)
- Guide to AI Agent Performance Metrics — Newline
- State of AI Agents 2026 — Lovelytics (Feb 2026)
Key Benchmarks 2025-2026¶
| Benchmark | Domain | What It Tests | Best Score |
|---|---|---|---|
| WebArena | Web browsing | Multi-step web tasks | ~62% (IBM CUGA) |
| OSWorld | Desktop OS | File operations, apps | ~72% (human baseline) |
| Mind2Web | Live websites | Real-world web tasks | ~23% (GPT-4) |
| SWE-Bench | Coding | Bug fixes | 71-84% |
| BrowseComp | Web navigation | Information retrieval | 60%+ |
Key Metrics¶
Task Success Metrics:
- Task completion rate
- Step accuracy
- Time to completion
- Cost per task
Quality Metrics:
- Correctness of final answer
- Trajectory efficiency (optimal path?)
- Error rate per step
- Recovery rate from errors
Cost Metrics:
- Token usage
- API calls
- Latency
- $/task
Safety Metrics:
- Policy violations
- Harmful actions
- Data leakage incidents
Code: Agent Evaluator¶
```python
from dataclasses import dataclass
from typing import Callable
import time


@dataclass
class EvalResult:
    success: bool
    steps_taken: int
    tokens_used: int
    latency_ms: float
    cost_usd: float
    errors: list[str]


class AgentEvaluator:
    """Evaluate agent performance."""

    def __init__(self, cost_per_1k_tokens: float = 0.01):
        self.cost_per_1k_tokens = cost_per_1k_tokens

    def evaluate(
        self,
        agent,
        task: str,
        ground_truth: Callable,
        max_steps: int = 50
    ) -> EvalResult:
        start_time = time.time()
        errors = []
        # Run agent, recording failures instead of crashing the benchmark
        try:
            result = agent.run(task, max_iterations=max_steps)
        except Exception as e:
            errors.append(str(e))
            result = None
        # Collect metrics
        latency_ms = (time.time() - start_time) * 1000
        tokens_used = getattr(agent, 'total_tokens', 0)
        cost_usd = (tokens_used / 1000) * self.cost_per_1k_tokens
        # Check success against the task validator
        success = result is not None and ground_truth(result)
        return EvalResult(
            success=success,
            steps_taken=len(agent.history) if hasattr(agent, 'history') else 0,
            tokens_used=tokens_used,
            latency_ms=latency_ms,
            cost_usd=cost_usd,
            errors=errors
        )

    def benchmark(self, agent, tasks: list[dict]) -> dict:
        """Run a benchmark suite and aggregate the results."""
        results = [
            self.evaluate(agent, task['prompt'], task['validator'])
            for task in tasks
        ]
        return {
            "success_rate": sum(r.success for r in results) / len(results),
            "avg_steps": sum(r.steps_taken for r in results) / len(results),
            "avg_latency_ms": sum(r.latency_ms for r in results) / len(results),
            "total_cost_usd": sum(r.cost_usd for r in results)
        }
```
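The core evaluation loop, stripped to its essentials, fits in a few lines. A self-contained sketch with a stub agent; the validator and cost numbers are illustrative assumptions:

```python
import time

def stub_agent(task: str) -> str:
    return task.upper()  # stand-in for a real agent run

def evaluate(agent, task, validator, cost_per_1k=0.01, tokens=500):
    # Measure wall-clock latency around a single agent run
    start = time.time()
    result = agent(task)
    latency_ms = (time.time() - start) * 1000
    return {
        "success": validator(result),
        "latency_ms": latency_ms,
        "cost_usd": tokens / 1000 * cost_per_1k,  # token-based cost estimate
    }

report = evaluate(stub_agent, "fix the bug", lambda r: r == "FIX THE BUG")
print(report["success"], report["cost_usd"])  # → True 0.005
```

The same pattern scales to a benchmark by looping over tasks and averaging success, latency, and cost, as the `AgentEvaluator.benchmark` method above does.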
6. Tool Use Safety (2026 Gap Filler)¶
Sources¶
- Security for Production AI Agents in 2026 — Iain (Feb 2026)
- OWASP Top 10 for LLM Applications — OWASP
- Building Secure AI Agents — Anthropic
Defence-in-Depth Architecture¶
```
Layer 1: Input Validation
├── Schema validation
├── Content moderation
├── Rate limiting
└── Authentication
Layer 2: Deterministic Guardrails
├── Tool allowlists
├── Parameter constraints
├── SQL sanitization
└── Path traversal prevention
Layer 3: Model-Level Security
├── Constitutional AI
├── Fine-tuned rejection
├── Hidden chain-of-thought
└── Structured outputs
Layer 4: Human-in-the-Loop
├── Approval workflows
├── Audit logging
├── Rollback capabilities
└── Escalation paths
Layer 5: LLM-as-Judge
├── Output validation
├── Harm detection
├── Policy compliance
└── Quality scoring
```
Key Threats (OWASP LLM Top 10)¶
| Threat | Description | Mitigation |
|---|---|---|
| LLM01 | Prompt Injection | Input/output filtering, separation |
| LLM02 | Insecure Output | Output validation, encoding |
| LLM03 | Training Data Poisoning | Data provenance, validation |
| LLM04 | Model DoS | Rate limiting, resource caps |
| LLM05 | Supply Chain | Dependency scanning |
| LLM06 | Sensitive Info Disclosure | PII detection, redaction |
| LLM07 | Insecure Plugin | Tool allowlists, sandboxing |
| LLM08 | Excessive Agency | Permission scoping |
| LLM09 | Overreliance | Confidence thresholds |
| LLM10 | Model Theft | Access controls, monitoring |
Code: Security Layer¶
```python
from dataclasses import dataclass
import re


@dataclass
class ToolCall:
    name: str
    params: dict


class AgentSecurityLayer:
    """Defence-in-depth security for AI agents."""

    def __init__(self, allowed_tools: set[str], sensitive_patterns: list[str]):
        self.allowed_tools = allowed_tools
        self.sensitive_patterns = [re.compile(p) for p in sensitive_patterns]

    def validate_input(self, user_input: str) -> tuple[bool, str]:
        """Layer 1: input validation."""
        # Check for injection patterns
        for pattern in self.sensitive_patterns:
            if pattern.search(user_input):
                return False, "Injection pattern detected"
        # Length check
        if len(user_input) > 10000:
            return False, "Input too long"
        return True, user_input

    def validate_tool_call(self, call: ToolCall) -> tuple[bool, str]:
        """Layer 2: deterministic guardrails."""
        # Tool allowlist
        if call.name not in self.allowed_tools:
            return False, f"Tool '{call.name}' not in allowlist"
        # Parameter validation
        if call.name == "execute_sql":
            # Block dangerous SQL statements
            dangerous = ["DROP", "DELETE", "TRUNCATE", "ALTER"]
            sql = call.params.get("query", "").upper()
            for d in dangerous:
                if d in sql:
                    return False, f"Blocked dangerous SQL: {d}"
        if call.name == "read_file":
            # Path traversal prevention
            path = call.params.get("path", "")
            if ".." in path or path.startswith("/etc"):
                return False, "Path traversal blocked"
        return True, "OK"

    def validate_output(self, output: str) -> tuple[bool, str]:
        """Layer 5: LLM-as-Judge (simplified to regex checks)."""
        # Check for PII leakage
        pii_patterns = [
            r'\b\d{16}\b',             # Credit card
            r'\b\d{3}-\d{2}-\d{4}\b',  # SSN
            r'\b[A-Z]{2}\d{9}\b',      # Passport
        ]
        for pattern in pii_patterns:
            if re.search(pattern, output):
                return False, "PII detected in output"
        return True, output

    def with_human_approval(self, call: ToolCall, risk_level: str = "medium") -> bool:
        """Layer 4: Human-in-the-Loop."""
        HIGH_RISK_TOOLS = {"execute_sql", "send_email", "delete_file"}
        if call.name in HIGH_RISK_TOOLS or risk_level == "high":
            # In production: send to an approval queue and
            # return True only after human approval
            return False  # Block until approved
        return True
```
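The Layer-2 guardrail can be illustrated standalone. This sketch uses a word-boundary regex instead of a bare substring check, so keywords inside other words (e.g. "UNDELETED") do not trigger false positives; tool names and patterns are illustrative:

```python
import re

# Standalone Layer-2 guardrail: tool allowlist plus SQL keyword block.
ALLOWED = {"search", "execute_sql"}
DANGEROUS_SQL = re.compile(r"\b(DROP|DELETE|TRUNCATE|ALTER)\b", re.IGNORECASE)

def check(tool: str, params: dict) -> tuple[bool, str]:
    if tool not in ALLOWED:
        return False, f"Tool '{tool}' not in allowlist"
    if tool == "execute_sql" and DANGEROUS_SQL.search(params.get("query", "")):
        return False, "Blocked dangerous SQL"
    return True, "OK"

print(check("execute_sql", {"query": "DROP TABLE users"}))
# → (False, 'Blocked dangerous SQL')
print(check("search", {"query": "weather"}))
# → (True, 'OK')
```

Keyword blocklists are a coarse tool; a parameterized-query interface or a read-only database role is the more robust fix, with the regex as a second line of defence.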
Observability¶
# OpenTelemetry for agent tracing
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
tracer = trace.get_tracer(__name__)
async def traced_agent_run(agent, task: str):
with tracer.start_as_current_span("agent.run") as span:
span.set_attribute("task", task)
span.set_attribute("agent.name", agent.name)
try:
result = await agent.run(task)
span.set_attribute("result.success", True)
return result
except Exception as e:
span.set_attribute("result.success", False)
span.record_exception(e)
raise
Video Resources¶
- LangChain Academy — free courses
- DeepLearning.AI — AI Agents courses
- Andrew Ng — Agentic workflows