AI Agents: Study Materials¶
~4 min read
Prerequisites: LLM Agents | AI Agent Frameworks
The AI agent market grew from $5B in 2024 to an estimated $47B in 2026 (McKinsey). Top agents resolve 71-84% of SWE-Bench issues; WebArena shows a ~62% success rate. Interviews cover ReAct, multi-agent orchestration, memory systems, and tool-use safety. Below are materials for 2 tasks, with code, architectures, and framework comparisons.
Updated: 2026-02-11
Task Overview¶
| ID | Task | Difficulty | Key Topics |
|---|---|---|---|
| agents_001 | ReAct + Multi-Agent | Hard | Reasoning, tool use, orchestration |
| agents_002 | Framework Comparison | Medium | LangChain, LangGraph, CrewAI, AutoGen |
1. ReAct Pattern¶
Best Sources¶
Papers:
- ReAct: Synergizing Reasoning and Acting in Language Models (Yao et al., 2022), arXiv:2210.03629
Articles:
- AI Agents and Autonomous Systems 2025 — comprehensive guide
- The Realistic Guide to AI Agents in 2026 — Decoding AI
- AI Agents Mastery Guide 2026 — Level Up
ReAct Pattern Overview¶

```mermaid
graph TD
    T[Thought<br/>Analyze current state] --> A[Action<br/>Tool call / decision]
    A --> O[Observation<br/>Action result]
    O -->|Goal not reached| T
    O -->|Goal reached| F[Finish<br/>Return result]
    style T fill:#e8eaf6,stroke:#3f51b5
    style A fill:#fff3e0,stroke:#ef6c00
    style O fill:#e8f5e9,stroke:#4caf50
    style F fill:#f3e5f5,stroke:#9c27b0
```
Code: ReAct Agent¶
```python
from typing import Callable


class ReActAgent:
    """Simple ReAct agent implementation."""

    def __init__(self, llm, tools: dict[str, Callable]):
        self.llm = llm
        self.tools = tools
        self.history = []

    def run(self, question: str, max_iterations: int = 10):
        self.history = [{"role": "user", "content": question}]
        for _ in range(max_iterations):
            # Generate thought and action
            response = self.llm.generate(self._build_prompt())
            # Parse response
            thought, action, action_input = self._parse_response(response)
            if action == "Finish":
                return action_input
            # Execute tool
            if action in self.tools:
                observation = self.tools[action](action_input)
            else:
                observation = f"Unknown tool: {action}"
            # Add the step and its observation to history
            self.history.append({
                "role": "assistant",
                "content": f"Thought: {thought}\nAction: {action}\nAction Input: {action_input}"
            })
            self.history.append({
                "role": "user",
                "content": f"Observation: {observation}"
            })
        return "Max iterations reached"

    def _build_prompt(self):
        return f"""Answer the question using the ReAct pattern.
Available tools: {list(self.tools.keys())}
Format:
Thought: [reasoning]
Action: [tool_name]
Action Input: [input]
When done:
Thought: I have the answer
Action: Finish
Action Input: [final answer]
History: {self.history}
"""

    def _parse_response(self, response):
        # Parse Thought, Action, Action Input lines from the LLM response
        thought = action = action_input = ""
        for line in response.strip().split('\n'):
            if line.startswith("Thought:"):
                thought = line.removeprefix("Thought:").strip()
            elif line.startswith("Action:"):
                action = line.removeprefix("Action:").strip()
            elif line.startswith("Action Input:"):
                action_input = line.removeprefix("Action Input:").strip()
        return thought, action, action_input
```
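The loop above stands or falls on the text contract between the prompt and the parser. A standalone sketch of that parsing step (the `parse` helper and the sample response are illustrative, not part of the agent above):

```python
# Parse the Thought / Action / Action Input lines the ReAct prompt asks for.
# In a real run this text comes from the LLM; here it is hard-coded.
def parse(response: str):
    fields = {"Thought:": "", "Action:": "", "Action Input:": ""}
    for line in response.splitlines():
        for key in fields:
            if line.startswith(key):
                fields[key] = line.removeprefix(key).strip()
    return fields["Thought:"], fields["Action:"], fields["Action Input:"]

print(parse("Thought: done\nAction: Finish\nAction Input: 42"))
# → ('done', 'Finish', '42')
```

Note that `"Action Input:"` does not match the `"Action:"` prefix, so the two fields stay distinct; fragile text parsing like this is exactly why production stacks prefer structured (JSON/function-call) outputs.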
2. Multi-Agent Orchestration¶
Best Sources¶
Articles:
- How to Build Multi-Agent Systems: Complete 2026 Guide — Dev.to (Jan 2026)
- Agent Orchestration 2026: LangGraph, CrewAI & AutoGen — Iterathon
- Design Patterns for Agentic AI — AppsTek (Dec 2025)
Misconception: an LLM agent is just an LLM with a prompt
An agent has 3 components: Tools (external instruments), Memory (context across sessions), and Planning. Without tools it is a chatbot; without memory every session starts from scratch. Interviewers expect a clear separation of these components.
Misconception: ReAct always beats plain prompt engineering
ReAct adds latency (every step is an LLM call) and cost (5-10x tokens). For simple Q&A a plain prompt is faster and cheaper. ReAct pays off when the task needs external tools (search, calculation, APIs) or multi-step reasoning.
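The token multiplier comes from re-sending the growing history on every step. A back-of-the-envelope sketch (all numbers are illustrative assumptions, not measurements):

```python
# Rough token cost of a ReAct loop vs a single-shot prompt.
PROMPT_TOKENS = 500   # tokens in the base prompt
REACT_STEPS = 5       # thought/action/observation iterations
STEP_OVERHEAD = 200   # history + observation tokens added per step

single_shot = PROMPT_TOKENS
# Each step re-sends the base prompt plus the history accumulated so far
react_total = sum(PROMPT_TOKENS + i * STEP_OVERHEAD for i in range(REACT_STEPS))
print(react_total / single_shot)  # → 9.0
```

Because the history compounds, cost grows roughly quadratically with step count, which is why capping `max_iterations` and summarizing history matter in production.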
Misconception: LangChain is the best framework for production agents
LangChain works well for prototypes, but for production agents LangGraph (state machines) gives more control. CrewAI targets role-based teams. On SWE-Bench the leaders run custom frameworks, not LangChain.
Orchestration Patterns¶

```mermaid
graph LR
    subgraph "Sequential"
        A1[Agent A] --> B1[Agent B] --> C1[Agent C]
    end
    subgraph "Hierarchical"
        M[Manager] --> W1[Worker 1]
        M --> W2[Worker 2]
        M --> W3[Worker 3]
    end
    style A1 fill:#e8eaf6,stroke:#3f51b5
    style B1 fill:#e8eaf6,stroke:#3f51b5
    style C1 fill:#e8eaf6,stroke:#3f51b5
    style M fill:#fff3e0,stroke:#ef6c00
    style W1 fill:#e8f5e9,stroke:#4caf50
    style W2 fill:#e8f5e9,stroke:#4caf50
    style W3 fill:#e8f5e9,stroke:#4caf50
```
Code: Multi-Agent System¶
```python
from dataclasses import dataclass
from typing import Callable
import asyncio


@dataclass
class Agent:
    name: str
    role: str
    process: Callable


class MultiAgentOrchestrator:
    """Orchestrate multiple agents."""

    def __init__(self, agents: list[Agent]):
        self.agents = {a.name: a for a in agents}
        self.results = {}

    async def run_sequential(self, task: str):
        """Run agents one after another, piping each result to the next."""
        result = task
        for agent in self.agents.values():
            result = await agent.process(result)
            self.results[agent.name] = result
        return result

    async def run_parallel(self, task: str):
        """Run all agents on the same task simultaneously."""
        tasks = [agent.process(task) for agent in self.agents.values()]
        results = await asyncio.gather(*tasks)
        return dict(zip(self.agents.keys(), results))

    async def run_hierarchical(self, task: str, manager: str, workers: list[str]):
        """Manager delegates to workers."""
        # Manager analyzes the task and returns a list of subtasks
        subtasks = await self.agents[manager].process(task)
        # Workers execute their subtasks in parallel
        worker_tasks = [
            self.agents[w].process(subtasks[i])
            for i, w in enumerate(workers)
        ]
        worker_results = await asyncio.gather(*worker_tasks)
        # Manager combines worker results into the final answer
        return await self.agents[manager].process(worker_results)
```
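Sequential vs parallel orchestration can also be shown without the class, using plain asyncio. A self-contained sketch where `researcher` and `writer` are stand-ins for real LLM-backed agents:

```python
import asyncio

# Minimal orchestration sketch; the "agents" are illustrative stubs.
async def researcher(text: str) -> str:
    return text + " -> researched"

async def writer(text: str) -> str:
    return text + " -> written"

async def sequential(task, agents):
    result = task
    for agent in agents:          # each agent consumes the previous result
        result = await agent(result)
    return result

async def parallel(task, agents):
    # All agents see the same input; results are collected together
    return await asyncio.gather(*(a(task) for a in agents))

print(asyncio.run(sequential("topic", [researcher, writer])))
# → topic -> researched -> written
print(asyncio.run(parallel("topic", [researcher, writer])))
# → ['topic -> researched', 'topic -> written']
```

Sequential pipelines trade latency for dependency ordering; parallel fan-out only works when subtasks are independent.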
3. Framework Comparison¶
LangChain vs LangGraph vs CrewAI¶
| Feature | LangChain | LangGraph | CrewAI |
|---|---|---|---|
| Primary Use | Simple chains | State machines | Role-based teams |
| Complexity | Low | Medium | Medium |
| Control Flow | Linear | Graph-based | Hierarchical |
| Best For | Prototyping | Production agents | Multi-agent teams |
LangGraph Example¶
from langgraph.graph import StateGraph, END
from typing import TypedDict
class AgentState(TypedDict):
messages: list
next_action: str
def create_agent_graph():
graph = StateGraph(AgentState)
# Add nodes
graph.add_node("reason", reason_node)
graph.add_node("act", act_node)
graph.add_node("observe", observe_node)
# Add edges
graph.add_edge("reason", "act")
graph.add_edge("act", "observe")
graph.add_conditional_edges(
"observe",
should_continue,
{True: "reason", False: END}
)
graph.set_entry_point("reason")
return graph.compile()
4. Memory Systems (2026 Gap Filler)¶
Sources¶
- AI Agent Memory Systems: Architecture and Innovations — SparkCo (Oct 2025)
- Build smarter AI agents: Manage short-term and long-term memory with Redis — Redis (Apr 2025)
- Memory Engineering for AI Agents — Medium
Types of AI Agent Memory¶
1. Short-term Memory (STM)
- Conversation history
- Current task context
- Working memory (within a session)
2. Long-term Memory (LTM)
- Vector database storage
- Persistent user preferences
- Past experiences (episodic)
3. Episodic Memory
- Past task completions
- Success/failure patterns
- User feedback history
Code: Memory System¶
```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Memory:
    """Agent memory state."""
    short_term: list[dict]  # Conversation history
    long_term: dict         # Vector DB reference
    episodic: list[dict]    # Past experiences


class MemoryManager:
    """Manage agent memory across sessions."""

    def __init__(self, vector_db, max_short_term: int = 10):
        self.vector_db = vector_db
        self.max_short_term = max_short_term

    def add_to_short_term(self, memory: Memory, message: dict):
        """Add a message; summarize the overflow when the window fills up."""
        memory.short_term.append(message)
        if len(memory.short_term) > self.max_short_term:
            # Summarize messages that fall out of the window
            summary = self._summarize(memory.short_term[:-self.max_short_term])
            memory.short_term = [summary] + memory.short_term[-self.max_short_term:]

    def _summarize(self, messages: list[dict]) -> dict:
        # Placeholder: in practice, ask the LLM to compress old messages
        text = " ".join(m.get("content", "") for m in messages)
        return {"role": "system", "content": f"Summary: {text[:200]}"}

    def store_long_term(self, memory: Memory, text: str, metadata: dict):
        """Store in the vector database for later retrieval."""
        embedding = self.vector_db.embed(text)
        self.vector_db.insert(embedding, metadata)

    def retrieve_relevant(self, memory: Memory, query: str, k: int = 5):
        """Retrieve the k most relevant memories from long-term storage."""
        embedding = self.vector_db.embed(query)
        return self.vector_db.search(embedding, k)

    def record_episode(self, memory: Memory, task: str, outcome: str, success: bool):
        """Record an episodic memory entry."""
        memory.episodic.append({
            "task": task,
            "outcome": outcome,
            "success": success,
            "timestamp": datetime.now().isoformat()
        })
```
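To make long-term retrieval concrete without a real embedding model, here is a toy vector store that scores by word overlap instead of cosine similarity. All class and field names are illustrative:

```python
# Toy long-term store: word-overlap "embeddings" stand in for real vectors.
class ToyVectorDB:
    def __init__(self):
        self.items = []  # list of (token_set, metadata) pairs

    def embed(self, text: str) -> set:
        return set(text.lower().split())

    def insert(self, embedding: set, metadata: dict):
        self.items.append((embedding, metadata))

    def search(self, embedding: set, k: int):
        # Rank stored items by token overlap with the query
        scored = sorted(self.items, key=lambda it: -len(it[0] & embedding))
        return [meta for _, meta in scored[:k]]

db = ToyVectorDB()
db.insert(db.embed("user prefers dark mode"), {"fact": "dark mode"})
db.insert(db.embed("user lives in Berlin"), {"fact": "Berlin"})
print(db.search(db.embed("what theme does the user prefer dark"), k=1))
# → [{'fact': 'dark mode'}]
```

A production system would swap `ToyVectorDB` for a real store (Redis, pgvector, Pinecone) and a learned embedding model, but the retrieve-by-similarity flow is the same.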
5. Agent Evaluation (2026 Gap Filler)¶
Sources¶
- Best AI Agent Evaluation Benchmarks: 2025 Complete Guide — O-Mega (Oct 2025)
- Guide to AI Agent Performance Metrics — Newline
- State of AI Agents 2026 — Lovelytics (Feb 2026)
Key Benchmarks 2025-2026¶
| Benchmark | Domain | What It Tests | Best Score |
|---|---|---|---|
| WebArena | Web browsing | Multi-step web tasks | ~62% (IBM CUGA) |
| OSWorld | Desktop OS | File operations, apps | ~72% (human baseline) |
| Mind2Web | Live websites | Real-world web tasks | ~23% (GPT-4) |
| SWE-Bench | Coding | Bug fixes | 71-84% |
| BrowseComp | Web navigation | Information retrieval | 60%+ |
Key Metrics¶
Task Success Metrics:
- Task completion rate
- Step accuracy
- Time to completion
- Cost per task
Quality Metrics:
- Correctness of final answer
- Trajectory efficiency (optimal path?)
- Error rate per step
- Recovery rate from errors
Cost Metrics:
- Token usage
- API calls
- Latency
- $/task
Safety Metrics:
- Policy violations
- Harmful actions
- Data leakage incidents
Code: Agent Evaluator¶
```python
from dataclasses import dataclass
from typing import Callable
import time


@dataclass
class EvalResult:
    success: bool
    steps_taken: int
    tokens_used: int
    latency_ms: float
    cost_usd: float
    errors: list[str]


class AgentEvaluator:
    """Evaluate agent performance."""

    def __init__(self, cost_per_1k_tokens: float = 0.01):
        self.cost_per_1k_tokens = cost_per_1k_tokens

    def evaluate(
        self,
        agent,
        task: str,
        ground_truth: Callable,
        max_steps: int = 50
    ) -> EvalResult:
        start_time = time.time()
        errors = []
        # Run agent, recording failures instead of crashing the benchmark
        try:
            result = agent.run(task, max_iterations=max_steps)
        except Exception as e:
            errors.append(str(e))
            result = None
        # Collect metrics
        latency_ms = (time.time() - start_time) * 1000
        tokens_used = getattr(agent, 'total_tokens', 0)
        cost_usd = (tokens_used / 1000) * self.cost_per_1k_tokens
        # Check success against the task validator
        success = result is not None and ground_truth(result)
        return EvalResult(
            success=success,
            steps_taken=len(agent.history) if hasattr(agent, 'history') else 0,
            tokens_used=tokens_used,
            latency_ms=latency_ms,
            cost_usd=cost_usd,
            errors=errors
        )

    def benchmark(self, agent, tasks: list[dict]) -> dict:
        """Run a benchmark suite and aggregate the results."""
        results = [
            self.evaluate(agent, task['prompt'], task['validator'])
            for task in tasks
        ]
        return {
            "success_rate": sum(r.success for r in results) / len(results),
            "avg_steps": sum(r.steps_taken for r in results) / len(results),
            "avg_latency_ms": sum(r.latency_ms for r in results) / len(results),
            "total_cost_usd": sum(r.cost_usd for r in results)
        }
```
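The core evaluation loop, stripped to its essentials, fits in a few lines. A self-contained sketch with a stub agent; the validator and cost numbers are illustrative assumptions:

```python
import time

def stub_agent(task: str) -> str:
    return task.upper()  # stand-in for a real agent run

def evaluate(agent, task, validator, cost_per_1k=0.01, tokens=500):
    # Measure wall-clock latency around a single agent run
    start = time.time()
    result = agent(task)
    latency_ms = (time.time() - start) * 1000
    return {
        "success": validator(result),
        "latency_ms": latency_ms,
        "cost_usd": tokens / 1000 * cost_per_1k,  # token-based cost estimate
    }

report = evaluate(stub_agent, "fix the bug", lambda r: r == "FIX THE BUG")
print(report["success"], report["cost_usd"])  # → True 0.005
```

The same pattern scales to a benchmark by looping over tasks and averaging success, latency, and cost, as the `AgentEvaluator.benchmark` method above does.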
6. Tool Use Safety (2026 Gap Filler)¶
Sources¶
- Security for Production AI Agents in 2026 — Iain (Feb 2026)
- OWASP Top 10 for LLM Applications — OWASP
- Building Secure AI Agents — Anthropic
Defence-in-Depth Architecture¶
```
Layer 1: Input Validation
├── Schema validation
├── Content moderation
├── Rate limiting
└── Authentication
Layer 2: Deterministic Guardrails
├── Tool allowlists
├── Parameter constraints
├── SQL sanitization
└── Path traversal prevention
Layer 3: Model-Level Security
├── Constitutional AI
├── Fine-tuned rejection
├── Hidden chain-of-thought
└── Structured outputs
Layer 4: Human-in-the-Loop
├── Approval workflows
├── Audit logging
├── Rollback capabilities
└── Escalation paths
Layer 5: LLM-as-Judge
├── Output validation
├── Harm detection
├── Policy compliance
└── Quality scoring
```
Key Threats (OWASP LLM Top 10)¶
| Threat | Description | Mitigation |
|---|---|---|
| LLM01 | Prompt Injection | Input/output filtering, separation |
| LLM02 | Insecure Output | Output validation, encoding |
| LLM03 | Training Data Poisoning | Data provenance, validation |
| LLM04 | Model DoS | Rate limiting, resource caps |
| LLM05 | Supply Chain | Dependency scanning |
| LLM06 | Sensitive Info Disclosure | PII detection, redaction |
| LLM07 | Insecure Plugin | Tool allowlists, sandboxing |
| LLM08 | Excessive Agency | Permission scoping |
| LLM09 | Overreliance | Confidence thresholds |
| LLM10 | Model Theft | Access controls, monitoring |
Code: Security Layer¶
```python
from dataclasses import dataclass
import re


@dataclass
class ToolCall:
    name: str
    params: dict


class AgentSecurityLayer:
    """Defence-in-depth security for AI agents."""

    def __init__(self, allowed_tools: set[str], sensitive_patterns: list[str]):
        self.allowed_tools = allowed_tools
        self.sensitive_patterns = [re.compile(p) for p in sensitive_patterns]

    def validate_input(self, user_input: str) -> tuple[bool, str]:
        """Layer 1: input validation."""
        # Check for injection patterns
        for pattern in self.sensitive_patterns:
            if pattern.search(user_input):
                return False, "Injection pattern detected"
        # Length check
        if len(user_input) > 10000:
            return False, "Input too long"
        return True, user_input

    def validate_tool_call(self, call: ToolCall) -> tuple[bool, str]:
        """Layer 2: deterministic guardrails."""
        # Tool allowlist
        if call.name not in self.allowed_tools:
            return False, f"Tool '{call.name}' not in allowlist"
        # Parameter validation
        if call.name == "execute_sql":
            # Block dangerous SQL statements
            dangerous = ["DROP", "DELETE", "TRUNCATE", "ALTER"]
            sql = call.params.get("query", "").upper()
            for d in dangerous:
                if d in sql:
                    return False, f"Blocked dangerous SQL: {d}"
        if call.name == "read_file":
            # Path traversal prevention
            path = call.params.get("path", "")
            if ".." in path or path.startswith("/etc"):
                return False, "Path traversal blocked"
        return True, "OK"

    def validate_output(self, output: str) -> tuple[bool, str]:
        """Layer 5: LLM-as-Judge (simplified to regex checks)."""
        # Check for PII leakage
        pii_patterns = [
            r'\b\d{16}\b',             # Credit card
            r'\b\d{3}-\d{2}-\d{4}\b',  # SSN
            r'\b[A-Z]{2}\d{9}\b',      # Passport
        ]
        for pattern in pii_patterns:
            if re.search(pattern, output):
                return False, "PII detected in output"
        return True, output

    def with_human_approval(self, call: ToolCall, risk_level: str = "medium") -> bool:
        """Layer 4: Human-in-the-Loop."""
        HIGH_RISK_TOOLS = {"execute_sql", "send_email", "delete_file"}
        if call.name in HIGH_RISK_TOOLS or risk_level == "high":
            # In production: send to an approval queue and
            # return True only after human approval
            return False  # Block until approved
        return True
```
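The Layer-2 guardrail can be illustrated standalone. This sketch uses a word-boundary regex instead of a bare substring check, so keywords inside other words (e.g. "UNDELETED") do not trigger false positives; tool names and patterns are illustrative:

```python
import re

# Standalone Layer-2 guardrail: tool allowlist plus SQL keyword block.
ALLOWED = {"search", "execute_sql"}
DANGEROUS_SQL = re.compile(r"\b(DROP|DELETE|TRUNCATE|ALTER)\b", re.IGNORECASE)

def check(tool: str, params: dict) -> tuple[bool, str]:
    if tool not in ALLOWED:
        return False, f"Tool '{tool}' not in allowlist"
    if tool == "execute_sql" and DANGEROUS_SQL.search(params.get("query", "")):
        return False, "Blocked dangerous SQL"
    return True, "OK"

print(check("execute_sql", {"query": "DROP TABLE users"}))
# → (False, 'Blocked dangerous SQL')
print(check("search", {"query": "weather"}))
# → (True, 'OK')
```

Keyword blocklists are a coarse tool; a parameterized-query interface or a read-only database role is the more robust fix, with the regex as a second line of defence.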
Observability¶
# OpenTelemetry for agent tracing
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
tracer = trace.get_tracer(__name__)
async def traced_agent_run(agent, task: str):
with tracer.start_as_current_span("agent.run") as span:
span.set_attribute("task", task)
span.set_attribute("agent.name", agent.name)
try:
result = await agent.run(task)
span.set_attribute("result.success", True)
return result
except Exception as e:
span.set_attribute("result.success", False)
span.record_exception(e)
raise
Video Resources¶
- LangChain Academy — free courses
- DeepLearning.AI — AI Agents courses
- Andrew Ng — Agentic workflows