
AI Agents: Study Materials

~4 minute read

Prerequisites: LLM Agents | AI Agent Frameworks

The AI agent market grew from $5B in 2024 to an estimated $47B in 2026 (McKinsey). SWE-Bench shows 71-84% of issues resolved; WebArena shows a 62% success rate. Interviews cover ReAct, multi-agent orchestration, memory systems, and tool-use safety. Below are materials for 2 tasks, with code, architectures, and framework comparisons.

Updated: 2026-02-11


Task Overview

| ID | Task | Difficulty | Key Topics |
|---|---|---|---|
| agents_001 | ReAct + Multi-Agent | Hard | Reasoning, tool use, orchestration |
| agents_002 | Framework Comparison | Medium | LangChain, LangGraph, CrewAI, AutoGen |

1. ReAct Pattern

Best Sources

Papers:
- ReAct: Synergizing Reasoning and Acting in Language Models (Yao et al., 2022), arXiv:2210.03629

Articles:
- AI Agents and Autonomous Systems 2025 — comprehensive guide
- The Realistic Guide to AI Agents in 2026 — Decoding AI
- AI Agents Mastery Guide 2026 — Level Up

ReAct Pattern Overview

graph TD
    T[Thought<br/>Analyze current state] --> A[Action<br/>Tool call / decision]
    A --> O[Observation<br/>Result of the action]
    O -->|Goal not reached| T
    O -->|Goal reached| F[Finish<br/>Return the result]

    style T fill:#e8eaf6,stroke:#3f51b5
    style A fill:#fff3e0,stroke:#ef6c00
    style O fill:#e8f5e9,stroke:#4caf50
    style F fill:#f3e5f5,stroke:#9c27b0

Code: ReAct Agent

from typing import Callable

class ReActAgent:
    """Simple ReAct agent implementation"""

    def __init__(self, llm, tools: dict[str, Callable]):
        self.llm = llm
        self.tools = tools
        self.history = []

    def run(self, question: str, max_iterations: int = 10):
        self.history = [{"role": "user", "content": question}]

        for _ in range(max_iterations):
            # Generate thought and action
            response = self.llm.generate(self._build_prompt())

            # Parse response
            thought, action, action_input = self._parse_response(response)

            if action == "Finish":
                return action_input

            # Execute tool
            if action in self.tools:
                observation = self.tools[action](action_input)
            else:
                observation = f"Unknown tool: {action}"

            # Add to history
            self.history.append({
                "role": "assistant",
                "content": f"Thought: {thought}\nAction: {action}\nAction Input: {action_input}"
            })
            self.history.append({
                "role": "user",
                "content": f"Observation: {observation}"
            })

        return "Max iterations reached"

    def _build_prompt(self):
        return f"""Answer the question using the ReAct pattern.

Available tools: {list(self.tools.keys())}

Format:
Thought: [reasoning]
Action: [tool_name]
Action Input: [input]

When done:
Thought: I have the answer
Action: Finish
Action Input: [final answer]

History: {self.history}
"""

    def _parse_response(self, response):
        # Parse Thought, Action, Action Input from response
        lines = response.strip().split('\n')
        thought = action = action_input = ""
        for line in lines:
            if line.startswith("Thought:"):
                thought = line.replace("Thought:", "").strip()
            elif line.startswith("Action:"):
                action = line.replace("Action:", "").strip()
            elif line.startswith("Action Input:"):
                action_input = line.replace("Action Input:", "").strip()
        return thought, action, action_input

2. Multi-Agent Orchestration

Best Sources

Articles:
- How to Build Multi-Agent Systems: Complete 2026 Guide — Dev.to (Jan 2026)
- Agent Orchestration 2026: LangGraph, CrewAI & AutoGen — Iterathon
- Design Patterns for Agentic AI — AppsTek (Dec 2025)

Misconception: an LLM agent is just an LLM with a prompt

An agent has 3 components: Tools (external instruments), Memory (context across sessions), and Planning (deciding what to do next). Without tools it is a chatbot; without memory, every session starts from scratch. Interviewers expect a clear separation of these components.
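A minimal sketch of that three-part anatomy (all names and the one-line planner are illustrative, not from any framework):

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class MinimalAgent:
    """Illustrative agent anatomy: tools + memory + planning."""
    tools: dict[str, Callable]                        # external capabilities
    memory: list[dict] = field(default_factory=list)  # context kept across turns

    def plan(self, task: str) -> str:
        # Planning: a trivial rule here; a real agent would ask an LLM
        return "search" if "?" in task else "calculator"

    def step(self, task: str) -> str:
        tool = self.plan(task)
        result = self.tools[tool](task)
        self.memory.append({"task": task, "tool": tool, "result": result})
        return result

agent = MinimalAgent(tools={
    "search": lambda q: f"results for {q!r}",
    "calculator": lambda expr: "42",
})
print(agent.step("What is an agent?"))  # routes to the search tool
print(len(agent.memory))                # 1: the turn was remembered
```

Strip any one field and the failure mode from the misconception appears: no tools means the agent can only talk, no memory means `plan` sees every turn cold.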

Misconception: ReAct always beats plain prompt engineering

ReAct adds latency (every step is an LLM call) and cost (5-10x the tokens). For simple Q&A, a plain prompt is faster and cheaper. ReAct is warranted when the task needs external tools (search, computation, APIs) or multi-step reasoning.
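The 5-10x figure is easy to sanity-check: each iteration resends the growing history, so input tokens grow linearly with the step count. A back-of-envelope sketch with assumed token counts and a hypothetical price:

```python
# Illustrative numbers only: real prompts, prices, and token counts vary widely.
PROMPT_TOKENS = 500        # system prompt + question
STEP_OUTPUT_TOKENS = 150   # thought + action generated per iteration
OBSERVATION_TOKENS = 100   # tool result appended to history
PRICE_PER_1K = 0.01        # hypothetical $/1K tokens

def react_cost(iterations: int) -> float:
    """Each iteration resends the full history, so input tokens grow linearly."""
    total = 0
    history = PROMPT_TOKENS
    for _ in range(iterations):
        total += history + STEP_OUTPUT_TOKENS              # input + generated output
        history += STEP_OUTPUT_TOKENS + OBSERVATION_TOKENS  # history keeps growing
    return total * PRICE_PER_1K / 1000

single_shot = (PROMPT_TOKENS + STEP_OUTPUT_TOKENS) * PRICE_PER_1K / 1000
print(f"1 prompt: ${single_shot:.4f}, 5 ReAct steps: ${react_cost(5):.4f}")
```

With these assumptions, five ReAct steps cost roughly 9x a single prompt, squarely in the 5-10x band.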

Misconception: LangChain is the best framework for production agents

LangChain works well for prototypes, but for production agents LangGraph (state machines) offers more control. CrewAI targets role-based teams. On SWE-Bench the leading agents run on custom frameworks, not LangChain.

Orchestration Patterns

graph LR
    subgraph "Sequential"
        A1[Agent A] --> B1[Agent B] --> C1[Agent C]
    end

    subgraph "Hierarchical"
        M[Manager] --> W1[Worker 1]
        M --> W2[Worker 2]
        M --> W3[Worker 3]
    end

    style A1 fill:#e8eaf6,stroke:#3f51b5
    style B1 fill:#e8eaf6,stroke:#3f51b5
    style C1 fill:#e8eaf6,stroke:#3f51b5
    style M fill:#fff3e0,stroke:#ef6c00
    style W1 fill:#e8f5e9,stroke:#4caf50
    style W2 fill:#e8f5e9,stroke:#4caf50
    style W3 fill:#e8f5e9,stroke:#4caf50

Code: Multi-Agent System

from dataclasses import dataclass
from typing import Callable
import asyncio

@dataclass
class Agent:
    name: str
    role: str
    process: Callable

class MultiAgentOrchestrator:
    """Orchestrate multiple agents"""

    def __init__(self, agents: list[Agent]):
        self.agents = {a.name: a for a in agents}
        self.results = {}

    async def run_sequential(self, task: str):
        """Run agents one after another"""
        result = task
        for agent in self.agents.values():
            result = await agent.process(result)
            self.results[agent.name] = result
        return result

    async def run_parallel(self, task: str):
        """Run agents simultaneously"""
        tasks = [agent.process(task) for agent in self.agents.values()]
        results = await asyncio.gather(*tasks)
        return dict(zip(self.agents.keys(), results))

    async def run_hierarchical(self, task: str, manager: str, workers: list[str]):
        """Manager delegates to workers"""
        # Manager analyzes the task; assumed to return a list of subtasks, one per worker
        subtasks = await self.agents[manager].process(task)

        # Workers execute in parallel
        worker_tasks = [
            self.agents[w].process(subtasks[i])
            for i, w in enumerate(workers)
        ]
        worker_results = await asyncio.gather(*worker_tasks)

        # Manager combines results
        return await self.agents[manager].process(worker_results)
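The difference between run_sequential and run_parallel boils down to awaiting in a loop versus asyncio.gather. A self-contained timing sketch with toy agents (asyncio.sleep stands in for an LLM call):

```python
import asyncio
import time

async def toy_agent(name: str, task: str, delay: float = 0.05) -> str:
    await asyncio.sleep(delay)  # stands in for an LLM or tool call
    return f"{name}:{task}"

async def main():
    # Sequential: each agent consumes the previous agent's result
    t0 = time.perf_counter()
    out = await toy_agent("A", "task")
    out = await toy_agent("B", out)
    seq = time.perf_counter() - t0

    # Parallel: independent agents get the same task concurrently
    t0 = time.perf_counter()
    results = await asyncio.gather(toy_agent("A", "task"), toy_agent("B", "task"))
    par = time.perf_counter() - t0
    return out, list(results), seq, par

out, results, seq, par = asyncio.run(main())
print(out)        # B:A:task
print(results)    # ['A:task', 'B:task']
print(par < seq)  # parallel completes in ~one delay instead of two
```

The same trade-off as in the orchestrator: sequential when outputs feed each other, parallel when subtasks are independent.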

3. Framework Comparison

LangChain vs LangGraph vs CrewAI

| Feature | LangChain | LangGraph | CrewAI |
|---|---|---|---|
| Primary Use | Simple chains | State machines | Role-based teams |
| Complexity | Low | Medium | Medium |
| Control Flow | Linear | Graph-based | Hierarchical |
| Best For | Prototyping | Production agents | Multi-agent teams |

LangGraph Example

from langgraph.graph import StateGraph, END
from typing import TypedDict

class AgentState(TypedDict):
    messages: list
    next_action: str

# Stub nodes so the example is self-contained; real nodes would call the LLM / tools
def reason_node(state: AgentState) -> AgentState: return state
def act_node(state: AgentState) -> AgentState: return state
def observe_node(state: AgentState) -> AgentState: return state
def should_continue(state: AgentState) -> bool: return state["next_action"] != "finish"

def create_agent_graph():
    graph = StateGraph(AgentState)

    # Add nodes
    graph.add_node("reason", reason_node)
    graph.add_node("act", act_node)
    graph.add_node("observe", observe_node)

    # Add edges
    graph.add_edge("reason", "act")
    graph.add_edge("act", "observe")
    graph.add_conditional_edges(
        "observe",
        should_continue,
        {True: "reason", False: END}
    )

    graph.set_entry_point("reason")
    return graph.compile()
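The same reason → act → observe control flow can be sketched without the library as a plain-Python state machine, which makes explicit what the compiled graph executes (node bodies here are toy placeholders):

```python
from typing import Callable

# Nodes mutate state; edges pick the next node from the current state.
def reason_node(state: dict) -> dict:
    state["messages"].append("thought")
    return state

def act_node(state: dict) -> dict:
    state["messages"].append("action")
    return state

def observe_node(state: dict) -> dict:
    state["steps"] = state.get("steps", 0) + 1
    return state

def should_continue(state: dict) -> bool:
    return state["steps"] < 2  # stop after two reason->act->observe cycles

EDGES: dict[str, Callable[[dict], str]] = {
    "reason": lambda s: "act",
    "act": lambda s: "observe",
    "observe": lambda s: "reason" if should_continue(s) else "END",
}
NODES = {"reason": reason_node, "act": act_node, "observe": observe_node}

def run_graph(state: dict, entry: str = "reason") -> dict:
    node = entry
    while node != "END":
        state = NODES[node](state)
        node = EDGES[node](state)
    return state

final = run_graph({"messages": [], "steps": 0})
print(final["steps"])     # 2 cycles executed
print(final["messages"])  # ['thought', 'action', 'thought', 'action']
```

The conditional edge out of "observe" is exactly what add_conditional_edges encodes in the LangGraph version.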

4. Memory Systems (2026 Gap Filler)


Types of Memory in AI Agents

1. Short-term Memory (STM)
   - Conversation history
   - Current task context
   - Working memory (within a session)

2. Long-term Memory (LTM)
   - Vector database storage
   - Persistent user preferences
   - Past experiences (episodic)

3. Episodic Memory
   - Past task completions
   - Success/failure patterns
   - User feedback history

Code: Memory System

from dataclasses import dataclass
from datetime import datetime

@dataclass
class Memory:
    """Agent memory system"""
    short_term: list[dict]  # Conversation history
    long_term: dict         # Vector DB reference
    episodic: list[dict]    # Past experiences

class MemoryManager:
    """Manage agent memory across sessions"""

    def __init__(self, vector_db, max_short_term: int = 10):
        self.vector_db = vector_db
        self.max_short_term = max_short_term

    def add_to_short_term(self, memory: Memory, message: dict):
        """Add message to short-term memory"""
        memory.short_term.append(message)
        if len(memory.short_term) > self.max_short_term:
            # Summarize old messages
            summary = self._summarize(memory.short_term[:-self.max_short_term])
            memory.short_term = [summary] + memory.short_term[-self.max_short_term:]

    def store_long_term(self, memory: Memory, text: str, metadata: dict):
        """Store in vector database for retrieval"""
        embedding = self.vector_db.embed(text)
        self.vector_db.insert(embedding, metadata)

    def retrieve_relevant(self, memory: Memory, query: str, k: int = 5):
        """Retrieve relevant memories from long-term storage"""
        embedding = self.vector_db.embed(query)
        return self.vector_db.search(embedding, k)

    def _summarize(self, messages: list[dict]) -> dict:
        """Placeholder summarizer; a real implementation would ask the LLM."""
        return {"role": "system", "content": f"[summary of {len(messages)} messages]"}

    def record_episode(self, memory: Memory, task: str, outcome: str, success: bool):
        """Record episodic memory"""
        memory.episodic.append({
            "task": task,
            "outcome": outcome,
            "success": success,
            "timestamp": datetime.now().isoformat()
        })
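The vector_db dependency above is left abstract. A toy in-memory stand-in (bag-of-words counts plus cosine similarity, purely illustrative; real systems use learned embeddings) shows the store/retrieve cycle:

```python
import math
from collections import Counter

class ToyVectorDB:
    """Minimal stand-in for a vector store: bag-of-words vectors + cosine search."""
    def __init__(self):
        self.items: list[tuple[Counter, dict]] = []

    def embed(self, text: str) -> Counter:
        # Toy "embedding": word-count vector (a real DB would use a model)
        return Counter(text.lower().split())

    def insert(self, embedding: Counter, metadata: dict) -> None:
        self.items.append((embedding, metadata))

    def search(self, query: Counter, k: int) -> list[dict]:
        def cosine(a: Counter, b: Counter) -> float:
            dot = sum(a[w] * b[w] for w in a)
            norm = (math.sqrt(sum(v * v for v in a.values()))
                    * math.sqrt(sum(v * v for v in b.values())))
            return dot / norm if norm else 0.0
        ranked = sorted(self.items, key=lambda it: cosine(query, it[0]), reverse=True)
        return [meta for _, meta in ranked[:k]]

db = ToyVectorDB()
db.insert(db.embed("user prefers dark mode"), {"text": "user prefers dark mode"})
db.insert(db.embed("deploy failed on friday"), {"text": "deploy failed on friday"})
print(db.search(db.embed("what ui mode does the user prefer"), k=1))
# -> [{'text': 'user prefers dark mode'}]
```

Any object with embed/insert/search in this shape can be passed to MemoryManager as its vector_db.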

5. Agent Evaluation (2026 Gap Filler)


Key Benchmarks 2025-2026

| Benchmark | Domain | What It Tests | Best Score |
|---|---|---|---|
| WebArena | Web browsing | Multi-step web tasks | ~62% (IBM CUGA) |
| OSWorld | Desktop OS | File operations, apps | ~72% human |
| Mind2Web | Live websites | Real-world web tasks | ~23% (GPT-4) |
| SWE-Bench | Coding | Bug fixes | 71-84% |
| BrowseComp | Web navigation | Information retrieval | 60%+ |

Key Metrics

Task Success Metrics:
- Task completion rate
- Step accuracy
- Time to completion
- Cost per task

Quality Metrics:
- Correctness of final answer
- Trajectory efficiency (optimal path?)
- Error rate per step
- Recovery rate from errors

Cost Metrics:
- Token usage
- API calls
- Latency
- $/task

Safety Metrics:
- Policy violations
- Harmful actions
- Data leakage incidents

Code: Agent Evaluator

from dataclasses import dataclass
from typing import Callable
import time

@dataclass
class EvalResult:
    success: bool
    steps_taken: int
    tokens_used: int
    latency_ms: float
    cost_usd: float
    errors: list[str]

class AgentEvaluator:
    """Evaluate agent performance"""

    def __init__(self, cost_per_1k_tokens: float = 0.01):
        self.cost_per_1k_tokens = cost_per_1k_tokens

    def evaluate(
        self,
        agent,
        task: str,
        ground_truth: Callable,
        max_steps: int = 50
    ) -> EvalResult:
        start_time = time.time()
        errors = []

        # Run agent
        result = agent.run(task, max_iterations=max_steps)

        # Collect metrics
        latency_ms = (time.time() - start_time) * 1000
        tokens_used = getattr(agent, 'total_tokens', 0)
        cost_usd = (tokens_used / 1000) * self.cost_per_1k_tokens

        # Check success
        success = ground_truth(result)

        return EvalResult(
            success=success,
            steps_taken=len(agent.history) if hasattr(agent, 'history') else 0,
            tokens_used=tokens_used,
            latency_ms=latency_ms,
            cost_usd=cost_usd,
            errors=errors
        )

    def benchmark(self, agent, tasks: list[dict]) -> dict:
        """Run benchmark suite"""
        results = []
        for task in tasks:
            result = self.evaluate(agent, task['prompt'], task['validator'])
            results.append(result)

        return {
            "success_rate": sum(r.success for r in results) / len(results),
            "avg_steps": sum(r.steps_taken for r in results) / len(results),
            "avg_latency_ms": sum(r.latency_ms for r in results) / len(results),
            "total_cost_usd": sum(r.cost_usd for r in results)
        }

6. Tool Use Safety (2026 Gap Filler)


Defence-in-Depth Architecture

Layer 1: Input Validation
├── Schema validation
├── Content moderation
├── Rate limiting
└── Authentication

Layer 2: Deterministic Guardrails
├── Tool allowlists
├── Parameter constraints
├── SQL sanitization
└── Path traversal prevention

Layer 3: Model-Level Security
├── Constitutional AI
├── Fine-tuned rejection
├── Hidden chain-of-thought
└── Structured outputs

Layer 4: Human-in-the-Loop
├── Approval workflows
├── Audit logging
├── Rollback capabilities
└── Escalation paths

Layer 5: LLM-as-Judge
├── Output validation
├── Harm detection
├── Policy compliance
└── Quality scoring

Key Threats (OWASP LLM Top 10)

| Threat | Description | Mitigation |
|---|---|---|
| LLM01 | Prompt Injection | Input/output filtering, separation |
| LLM02 | Insecure Output | Output validation, encoding |
| LLM03 | Training Data Poisoning | Data provenance, validation |
| LLM04 | Model DoS | Rate limiting, resource caps |
| LLM05 | Supply Chain | Dependency scanning |
| LLM06 | Sensitive Info Disclosure | PII detection, redaction |
| LLM07 | Insecure Plugin | Tool allowlists, sandboxing |
| LLM08 | Excessive Agency | Permission scoping |
| LLM09 | Overreliance | Confidence thresholds |
| LLM10 | Model Theft | Access controls, monitoring |

Code: Security Layer

from dataclasses import dataclass
import re

@dataclass
class ToolCall:
    name: str
    params: dict

class AgentSecurityLayer:
    """Defence-in-depth security for AI agents"""

    def __init__(self, allowed_tools: set[str], sensitive_patterns: list[str]):
        self.allowed_tools = allowed_tools
        self.sensitive_patterns = [re.compile(p) for p in sensitive_patterns]

    def validate_input(self, user_input: str) -> tuple[bool, str]:
        """Layer 1: Input validation"""
        # Check for injection patterns
        for pattern in self.sensitive_patterns:
            if pattern.search(user_input):
                return False, "Injection pattern detected"

        # Length check
        if len(user_input) > 10000:
            return False, "Input too long"

        return True, user_input

    def validate_tool_call(self, call: ToolCall) -> tuple[bool, str]:
        """Layer 2: Deterministic guardrails"""
        # Tool allowlist
        if call.name not in self.allowed_tools:
            return False, f"Tool '{call.name}' not in allowlist"

        # Parameter validation
        if call.name == "execute_sql":
            # Block dangerous SQL
            dangerous = ["DROP", "DELETE", "TRUNCATE", "ALTER"]
            sql = call.params.get("query", "").upper()
            for d in dangerous:
                if d in sql:
                    return False, f"Blocked dangerous SQL: {d}"

        if call.name == "read_file":
            # Path traversal prevention
            path = call.params.get("path", "")
            if ".." in path or path.startswith("/etc"):
                return False, "Path traversal blocked"

        return True, "OK"

    def validate_output(self, output: str) -> tuple[bool, str]:
        """Layer 5: LLM-as-Judge (simplified)"""
        # Check for PII leakage
        pii_patterns = [
            r'\b\d{16}\b',  # Credit card
            r'\b\d{3}-\d{2}-\d{4}\b',  # SSN
            r'\b[A-Z]{2}\d{9}\b',  # Passport
        ]
        for pattern in pii_patterns:
            if re.search(pattern, output):
                return False, "PII detected in output"

        return True, output

    def with_human_approval(
        self,
        call: ToolCall,
        risk_level: str = "medium"
    ) -> bool:
        """Layer 4: Human-in-the-Loop"""
        HIGH_RISK_TOOLS = {"execute_sql", "send_email", "delete_file"}

        if call.name in HIGH_RISK_TOOLS or risk_level == "high":
            # In production: send to approval queue
            # Return True only after human approval
            return False  # Block until approved

        return True
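The read_file guard above is substring-based; checking for ".." misses absolute paths and normalization tricks. A stricter sketch resolves the path against an allowed root first (the root directory here is an assumption):

```python
from pathlib import Path

def is_path_allowed(path: str, allowed_root: str = "/srv/agent_files") -> bool:
    """Resolve the requested path and require it to stay under the allowed root."""
    root = Path(allowed_root).resolve()
    target = (root / path).resolve()    # joins, then collapses any ../ segments
    return target.is_relative_to(root)  # Python 3.9+

print(is_path_allowed("notes/todo.txt"))    # True: stays inside the root
print(is_path_allowed("../../etc/passwd"))  # False: escapes the root
```

Because `root / path` yields the absolute path unchanged when `path` is absolute, this also rejects inputs like "/etc/shadow" without a separate prefix check.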

Observability

# OpenTelemetry for agent tracing
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider

trace.set_tracer_provider(TracerProvider())  # register a provider (default is a no-op tracer)
tracer = trace.get_tracer(__name__)

async def traced_agent_run(agent, task: str):
    with tracer.start_as_current_span("agent.run") as span:
        span.set_attribute("task", task)
        span.set_attribute("agent.name", agent.name)

        try:
            result = await agent.run(task)
            span.set_attribute("result.success", True)
            return result
        except Exception as e:
            span.set_attribute("result.success", False)
            span.record_exception(e)
            raise

Video Resources

  1. LangChain Academy — free courses
  2. DeepLearning.AI — AI Agents courses
  3. Andrew Ng — Agentic workflows