
AI Agent Workflows

~12 minute read

Agent workflow loop, agent stack (8 layers), 3 levels of agentic behavior, 9 workflow patterns (ReAct, Plan-Execute, LATS, ReWOO, ToT, Reflection...), multi-agent orchestration (Manager-Worker, Router, Debate, Parallel), RAG vs Agentic RAG, multimodal RAG, tool calling reliability, frameworks (LangGraph, AutoGen, CrewAI), evaluation, production deployment (2025-2026)


Prerequisites: LLM Agents, Function Calling & Tool Use

Why This Matters

Agent workflows mark the shift from one-off LLM calls to full-fledged process engines. According to Gartner, inquiries about multi-agent systems grew 1,445% from Q1 2024 to Q2 2025. By 2026, 40% of enterprise applications will use task-specific agents (vs. <5% in 2025). A typical customer support agent cuts response time by 60%, and ChatDev-style multi-agent systems improve code accuracy by 67%. Meanwhile, a single multi-agent research session costs $0.10-$1.00 -- orders of magnitude cheaper than a human analyst.

Key Concepts

Agent Workflow: Definition

An agent workflow is not "a chatbot." It's a workflow engine where LLMs decide the next step, use tools safely, and produce auditable outcomes.

2025-2026 shift: from cool demos to operational software. By 2026, 40% of enterprise applications will feature task-specific agents (vs. <5% in 2025).

Three Levels of Agentic Behavior

| Level | Type | Capability | Example |
| --- | --- | --- | --- |
| Level 1 | AI Workflow | Output decisions (generate based on prompts) | Q&A, summarization |
| Level 2 | Router Workflow | Task decisions (choose tools/paths) | Triage, routing |
| Level 3 | Autonomous Agent | Process decisions (create new tasks/tools) | Self-improving systems |

Agent Stack (8 Layers)

graph TD
    L1["Interface Layer<br/>Chat, Email, Voice, Ticketing, Slack/Teams"]
    L2["Orchestration Layer<br/>State Machine / Graph / Workflow Engine"]
    L3["Reasoning + Planning<br/>Task decomposition, routing, role selection"]
    L4["Tools<br/>Function calling, API actions, browser, code execution"]
    L5["Knowledge<br/>RAG (Vector DB + documents), SQL, enterprise search"]
    L6["Memory<br/>Short-term state + Long-term profiles/preferences"]
    L7["Safety + Governance<br/>Policies, permissions, audit logs, redaction, approvals"]
    L8["Observability + Evaluation<br/>Traces, metrics, eval harnesses, regression tests"]

    L1 --> L2 --> L3 --> L4 --> L5 --> L6 --> L7 --> L8

    style L1 fill:#e8eaf6,stroke:#3f51b5
    style L2 fill:#e8f5e9,stroke:#4caf50
    style L3 fill:#fff3e0,stroke:#ef6c00
    style L4 fill:#f3e5f5,stroke:#9c27b0
    style L5 fill:#e8eaf6,stroke:#3f51b5
    style L6 fill:#e8f5e9,stroke:#4caf50
    style L7 fill:#fce4ec,stroke:#c62828
    style L8 fill:#fff3e0,stroke:#ef6c00

Agent Workflow Loop (7 Steps)

Goal -> Router -> Plan -> Retrieve -> Act (tools) -> Verify -> Output -> Log -> Human Escalation (if needed)
  1. Interprets goal -- understand user intent
  2. Plans -- break work into steps
  3. Retrieves context -- RAG, search, docs
  4. Uses tools -- APIs, databases, SaaS actions
  5. Writes outputs -- structured + natural language
  6. Verifies -- self-checks, tests, policy checks
  7. Escalates -- to human when confidence/risk thresholds fail
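The seven steps above can be sketched as a single loop. This is a minimal illustration with stubbed components: `run_workflow`, the intent check, and the fixed confidence score are hypothetical stand-ins, not a framework API.

```python
from dataclasses import dataclass, field

@dataclass
class LoopResult:
    output: str
    escalated: bool
    log: list = field(default_factory=list)

def run_workflow(goal: str, confidence_threshold: float = 0.8) -> LoopResult:
    log = []
    # 1. Interpret the goal (stub intent classifier)
    intent = "lookup" if "?" in goal else "action"
    log.append(("interpret", intent))
    # 2. Plan: break work into steps
    plan = ["retrieve", "act", "verify"]
    log.append(("plan", plan))
    # 3-4. Retrieve context and call tools (both stubbed)
    context = f"docs relevant to: {goal}"
    result = f"answer({goal})"
    log.append(("act", result))
    # 5-6. Write the output and verify it (stub confidence score)
    confidence = 0.9 if context else 0.0
    log.append(("verify", confidence))
    # 7. Escalate to a human when the confidence threshold fails
    if confidence < confidence_threshold:
        return LoopResult(output="", escalated=True, log=log)
    return LoopResult(output=result, escalated=False, log=log)
```

Every step appends to the log, so the run is auditable end to end, matching the "auditable outcomes" framing above.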

9 Agentic Workflow Patterns

1. ReAct (Reasoning + Acting)

Thought: I need to find X
Action: Search["X"]
Observation: Result
Thought: Now I know Y
Action: ...
| Best for | Watch out |
| --- | --- |
| Fast-moving tasks: triage, routing, support macros | Can loop endlessly without stop conditions |
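A minimal sketch of the Thought/Action/Observation cycle, with the hard step budget the table warns about. `search` is a toy tool and `react_loop` an illustrative name; no real LLM is involved.

```python
def search(query: str) -> str:
    # Toy tool standing in for a real search API
    kb = {"capital of France": "Paris"}
    return kb.get(query, "no result")

def react_loop(question: str, max_steps: int = 5) -> str:
    observations = []
    for step in range(max_steps):
        # Thought: decide the next move from what we know so far
        if observations and observations[-1] != "no result":
            return observations[-1]  # enough evidence: answer
        # Action + Observation: call a tool, record the result
        observations.append(search(question))
    # Stop condition instead of an endless loop
    return "escalate: step budget exhausted"
```

Without the `max_steps` budget, a question the tool cannot answer would cycle forever; with it, the loop degrades into a clean escalation.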

2. Plan-and-Execute

1. Planner: Break goal into steps
2. Executor: Run each step sequentially
3. Verifier: Check outputs
| Best for | Watch out |
| --- | --- |
| Report generation, research summaries, data enrichment | Plans can be rigid when conditions change |
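The three roles can be sketched as plain functions, assuming stubs where an LLM and real tools would sit (`plan`, `execute`, and `verify` are hypothetical names).

```python
def plan(goal: str) -> list:
    # Planner: produce the full step list up front (stub for an LLM call)
    return [f"gather data for {goal}", f"draft {goal}", f"format {goal}"]

def execute(steps: list) -> list:
    # Executor: run each step sequentially (stub tool calls)
    return [f"done: {s}" for s in steps]

def verify(outputs: list) -> bool:
    # Verifier: check that every step produced a result
    return all(o.startswith("done:") for o in outputs)
```

The rigidity noted above comes from `plan` running once: if conditions change mid-execution, nothing revises the remaining steps without an explicit replanning hook.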

3. Planner-Critic-Executor

Planner -> Critic -> Executor -> Output
             ^
        Quality check before execution
| Best for | Watch out |
| --- | --- |
| Contract drafting, financial reporting | Increased latency for critic step |

4. Reflection Loop

Generate -> Critique -> Refine -> Critique -> ... -> Final Output
| Best for | Watch out |
| --- | --- |
| Writing, summarization, design recommendations | Extra reflection adds cost and latency |
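A sketch of the Generate -> Critique -> Refine cycle with a round budget, since each round adds cost. The critic here is a simple length check standing in for LLM self-critique; all function names are illustrative.

```python
from typing import Optional

def generate(prompt: str) -> str:
    # Generator stub: a real system would call an LLM here
    return f"Summary of {prompt}"

def critique(draft: str, max_len: int = 40) -> Optional[str]:
    # Critic stub: flag drafts violating a constraint, None if it passes
    return "too long" if len(draft) > max_len else None

def refine(draft: str, max_len: int = 40) -> str:
    # Refiner stub: address the critique
    return draft[:max_len]

def reflection_loop(prompt: str, rounds: int = 3) -> str:
    draft = generate(prompt)
    for _ in range(rounds):      # each round costs extra model calls
        issue = critique(draft)
        if issue is None:        # critique passed: stop early
            break
        draft = refine(draft)
    return draft
```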

5. Tree of Thoughts

                +-- Branch A --+
                |              |
Root -> Thought +-- Branch B --+-- Converge -> Best Answer
                |              |
                +-- Branch C --+
| Best for | Watch out |
| --- | --- |
| Creative/logical problem solving | Branching multiplies costs quickly |

6. LATS (Language Agent Tree Search)

MCTS-style search: select promising path -> expand with tool calls -> simulate outcomes -> backpropagate rewards.

| Best for | Watch out |
| --- | --- |
| Scenarios with real-time tool feedback | Success depends on strong scoring signals |

7. ReWOO (Reasoning Without Observation)

Separates reasoning from observation: the planner lays out all tool calls and data dependencies up front, workers execute them, and a solver combines the results -- no LLM call after every observation.

| Best for | Watch out |
| --- | --- |
| Cases requiring explicit tool/data planning | More setup effort |

8. Router-Specialist Multi-Agent

                +-- Finance Agent
                |
Input -> Router +-- IT Agent
                |
                +-- HR Agent
| Best for | Watch out |
| --- | --- |
| Single entry point routing to domain experts | Incorrect routing causes cascading errors |

9. Debate/Consensus Multi-Agent

Agent A proposes -> Agent B critiques -> Judge decides
| Best for | Watch out |
| --- | --- |
| High-stakes: legal review, risk scoring | More time and compute |

Multi-Agent Orchestration Patterns

Pattern A: Manager-Worker (Most Common)

graph TD
    M["Manager<br/>Interprets goal, decomposes, assigns"]
    W1["Worker Agent 1"]
    W2["Worker Agent 2"]
    W3["Worker Agent 3"]
    V["Verifier<br/>Checks outputs, policy, formatting"]

    M --> W1 & W2 & W3
    W1 & W2 & W3 --> V

    style M fill:#e8eaf6,stroke:#3f51b5
    style W1 fill:#e8f5e9,stroke:#4caf50
    style W2 fill:#e8f5e9,stroke:#4caf50
    style W3 fill:#e8f5e9,stroke:#4caf50
    style V fill:#fff3e0,stroke:#ef6c00

Best for: Enterprise operations, analytics, customer support escalation.

Pattern B: Router + Specialists (Fast + Scalable)

Router classifies intent, selects one specialist. Avoids "committee chat" overhead.

Best for: High-volume tasks (triage, categorization, FAQ with actions).
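The router can be sketched with keyword matching standing in for an LLM intent classifier. Because incorrect routing cascades, the sketch fails closed by escalating unmatched queries; all names here are hypothetical stubs.

```python
# Specialist stubs: each would be its own agent with its own tool set
SPECIALISTS = {
    "finance": lambda q: f"finance agent handles: {q}",
    "it":      lambda q: f"IT agent handles: {q}",
    "hr":      lambda q: f"HR agent handles: {q}",
}

# Toy intent signals; a real router would use an LLM or classifier
KEYWORDS = {"invoice": "finance", "refund": "finance",
            "laptop": "it", "vpn": "it",
            "vacation": "hr", "payroll": "hr"}

def route(query: str) -> str:
    for word, domain in KEYWORDS.items():
        if word in query.lower():
            # Exactly one specialist is selected: no committee chat
            return SPECIALISTS[domain](query)
    # Fail closed: misroutes cascade, so unknown intent escalates
    return "escalate: no specialist matched"
```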

Pattern C: Debate + Judge (Use Sparingly)

Two agents propose solutions; judge selects.

Best for: Legal review drafts, risk scoring, strategy docs.

Pattern D: Parallel Research + Synthesis (High Leverage)

graph TD
    C["Coordinator"]
    G1["Gather Source 1"]
    G2["Gather Source 2"]
    G3["Gather Source 3"]
    S["Synthesizer<br/>Combines with citations"]

    C --> G1 & G2 & G3
    G1 & G2 & G3 --> S

    style C fill:#e8eaf6,stroke:#3f51b5
    style G1 fill:#e8f5e9,stroke:#4caf50
    style G2 fill:#e8f5e9,stroke:#4caf50
    style G3 fill:#e8f5e9,stroke:#4caf50
    style S fill:#fff3e0,stroke:#ef6c00

Best for: Market research, competitive intel, policy updates.

Production insight: Multi-agent systems fail less when you treat them like microservices: contracts, schemas, timeouts, and ownership.


RAG vs Agent Workflows vs Agentic RAG

| Approach | Best For | Strengths | Failure Mode | Fix |
| --- | --- | --- | --- | --- |
| Classic RAG | Q&A, policy lookup | Fast, cheap, explainable | Hallucinated synthesis | Better chunking, hybrid search |
| Single-Agent | Ticket triage, CRM updates | Simple orchestration | Tool misuse, looping | Guardrails, timeouts |
| Multi-Agent | Procurement, incident response | Specialization + parallel | Coordination overhead | Shared state, role contracts |
| Agentic RAG | Enterprise workflows | Higher task completion | Retrieval drift + action risk | Retrieval constraints + approvals |

2026 trend: "Agentic RAG" -- default enterprise pattern: retrieval grounds decisions, tools execute changes, policy gates manage risk.

Agentic RAG Architecture

Query -> Router -> Planner -> Retriever (RAG) -> Executor (Tools) -> Verifier -> Output
                                                        |
                                              Policy Gates & Approvals
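The pipeline above can be sketched as follows; the retriever, the `policy_gate` check, and the destructive-verb list are illustrative stubs, not a real policy engine.

```python
# Verbs that mark irreversible actions (illustrative allow/deny signal)
DESTRUCTIVE = {"delete", "refund", "terminate"}

def retrieve(query: str) -> str:
    # Retriever stub: a real system queries a vector DB with citations
    return f"[doc snippet for: {query}]"

def policy_gate(action: str, approved: bool) -> bool:
    # Irreversible actions require an explicit human approval flag
    if any(verb in action for verb in DESTRUCTIVE):
        return approved
    return True

def agentic_rag(query: str, action: str, approved: bool = False) -> str:
    context = retrieve(query)       # planner decided retrieval is needed
    if not policy_gate(action, approved):
        return "blocked: awaiting human approval"
    # Executor runs the tool action, grounded in retrieved context
    return f"executed {action} grounded in {context}"
```

The point of the gate is ordering: retrieval grounds the decision, but execution of a risky action is held until approval, exactly the "policy gates manage risk" pattern described above.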

Multimodal RAG

Three Approaches

| Approach | Description | Pros | Cons |
| --- | --- | --- | --- |
| Text-translation | Convert images to captions, audio to transcripts | Easy integration | Information bottleneck |
| Text retrieval + multimodal generation | Retrieve via text, generate with original media | Better expressiveness | Retrieval still text-dependent |
| Multimodal retrieval | Cross-modal embeddings in shared vector space | Maximum grounding | Computationally expensive |

Pipeline

Step 1: Multimodal Knowledge Preparation
  Images -> Vision encoder -> Embeddings
  Audio -> Audio encoder -> Embeddings
  Text -> Text encoder -> Embeddings
  All stored in Vector DB

Step 2: Processing and Retrieval
  Query -> Encode (any modality) -> Vector search -> Top-K multimodal chunks

Step 3: Multimodal Context Building
  Early fusion: Convert all to text
  Late fusion: Keep modalities separate
  Multimodal LLM generates response
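Step 2 (cross-modal retrieval) reduces to nearest-neighbor search over a shared embedding space. The sketch below uses hand-written vectors as stand-in embeddings; a real system would use CLIP-style encoders and an ANN index instead of brute-force cosine similarity.

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Pretend embeddings for items of different modalities in one shared space
index = {
    "cat_photo.jpg":   [0.9, 0.1, 0.0],
    "dog_audio.wav":   [0.1, 0.9, 0.0],
    "cat_article.txt": [0.8, 0.2, 0.1],
}

def top_k(query_vec, k=2):
    # Rank every indexed item by similarity to the query embedding,
    # regardless of its original modality
    scored = sorted(index, key=lambda item: cosine(query_vec, index[item]),
                    reverse=True)
    return scored[:k]
```

A text query embedded near the "cat" region retrieves both the image and the article, which is the "maximum grounding" benefit of the shared-space approach.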

Use Cases

| Use Case | Benefit |
| --- | --- |
| Medical imaging | Retrieve similar cases with diagnosis |
| Product search | Search by image, get product details |
| Document analysis | Understand charts, tables, diagrams |
| Video QA | Answer questions about video content |

Tool Calling Reliability

What "Good" Looks Like

  1. Tools accept strict schemas (JSON Schema / typed models)
  2. Agents produce structured outputs for every action
  3. Every tool call is logged with inputs/outputs
  4. High-risk tools require human approval (HITL)
  5. Tools run in least-privilege mode (scoped tokens)

Tool Contract Example

from pydantic import BaseModel, Field
from typing import Literal, Optional

class CreateJiraTicket(BaseModel):
    project_key: str = Field(..., description="Jira project key")
    summary: str = Field(..., max_length=120)
    description: str
    severity: Literal["low", "medium", "high", "critical"]
    requester_email: str
    approval_required: bool = True
    related_asset_id: Optional[str] = None

def create_jira_ticket(payload: CreateJiraTicket) -> dict:
    # 1) Policy check (PII redaction, allowed project)
    # 2) Call Jira API
    # 3) Return structured receipt
    return {"ticket_id": "ITOPS-1842", "status": "created"}

Guardrails

| Guardrail | Purpose |
| --- | --- |
| Allowlists | Limit tools, destinations, domains |
| Step budgets | Max turns, max tool calls |
| Deterministic formatting | Schemas + validators |
| Sandboxing | Code execution, browser actions |
| Confirmation prompts | Destructive actions (delete, refund, terminate) |
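Step budgets from the table can be enforced with a small wrapper around each tool. This is an illustrative sketch: `ToolBudget` and `BudgetExceeded` are hypothetical names, not part of any framework.

```python
class BudgetExceeded(Exception):
    """Raised when an agent spends its tool-call budget."""

class ToolBudget:
    def __init__(self, max_calls: int):
        self.max_calls = max_calls
        self.calls = 0

    def guard(self, tool):
        # Wrap a tool so every invocation counts against the budget
        def wrapped(*args, **kwargs):
            if self.calls >= self.max_calls:
                raise BudgetExceeded(f"tool budget of {self.max_calls} spent")
            self.calls += 1
            return tool(*args, **kwargs)
        return wrapped
```

Sharing one `ToolBudget` across all of an agent's tools caps total calls per run, turning a potential infinite loop into a catchable exception the orchestrator can convert into an escalation.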

Production Workflow (9 Steps)

| Step | Action |
| --- | --- |
| 1 | Define the job -- one-sentence goal + success criteria |
| 2 | Map the workflow -- states, transitions, failure paths, escalation |
| 3 | Choose tools -- APIs first; browser automation last |
| 4 | Add knowledge -- RAG with citations + freshness rules |
| 5 | Design memory -- store only what's needed; set retention/redaction |
| 6 | Add policy gates -- permissions, approvals, audit logging, PII |
| 7 | Implement evals -- offline test set + adversarial cases |
| 8 | Ship with observability -- traces, tool metrics, cost, latency |
| 9 | Iterate -- tighten prompts, schemas, retrieval based on data |

Practical Examples

IT Incident Triage:

Ingest alert -> Classify -> Retrieve runbook (RAG) -> Propose actions -> Execute safe actions -> Open ticket -> Notify on-call
HITL gate: any action affecting production traffic.

Finance Ops (Invoice Exceptions):

Read invoice -> Validate vendor + PO -> Check anomalies -> Request missing info -> Update ERP -> Audit trail

Sales Ops (Account Research):

Pull CRM context -> Research company news -> Draft personalized email -> Suggest next action -> Log activity
Keep "send email" behind explicit approval.


Best Practices

Human-in-the-Loop

| Practice | Implementation |
| --- | --- |
| Approvals | Required for irreversible actions |
| Diffs | Show what will change, not just explanations |
| Citations | "Why this action" + sources + tool summary |
| Escalation | One-click "escalate to human" at every stage |

Governance

| Requirement | Implementation |
| --- | --- |
| Least privilege | Tokens per agent + per tool |
| Central policy engine | Who can do what, where, when |
| Audit logs | Prompts, retrieval sources, tool I/O |
| PII handling | Detection + redaction before storage |
| Data residency | Region pinning, private deployments |

Cost Optimization

| Strategy | Impact |
| --- | --- |
| Smaller models | Routing, extraction, classification |
| Caching | Retrieval results with TTLs |
| Tool budgets | Limit calls, early stopping |
| Structured extraction | Cheaper than long-form generation |
| Workflow-level cost tracking | Cost per resolved workflow, not per token |
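Caching retrieval results with TTLs can be sketched with a small dictionary-backed cache. `TTLCache` is an illustrative name; a production system would add size bounds and eviction metrics.

```python
import time

class TTLCache:
    def __init__(self, ttl: float):
        self.ttl = ttl          # seconds an entry stays valid
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, stamp = entry
        if time.monotonic() - stamp > self.ttl:
            del self._store[key]   # expired: force a fresh retrieval
            return None
        return value

    def put(self, key, value):
        self._store[key] = (value, time.monotonic())
```

Keying on the normalized query lets repeated lookups within the TTL skip the vector-search round trip entirely, while expiry keeps the agent from reusing stale context indefinitely.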

Anti-Patterns to Avoid

  1. One Giant Prompt -- hides errors, hard to debug. Design modular patterns.
  2. No Evals, Only Vibes -- intuition doesn't scale. Build dashboards early.
  3. Tool Chaos -- untracked tools, unpinned model versions. Version everything.

Framework Comparison (2026)

| Framework | Type | Best For | Key Features |
| --- | --- | --- | --- |
| LangGraph | Graph-based | Stateful workflows, complex branching | Explicit multi-agent coordination, stateful, cycles |
| AutoGen | Conversational | Research, coding copilots | HITL, flexible agents, conversation management |
| CrewAI | Production teams | Business applications | Role-based agents, clean architecture |
| LangChain | Ecosystem | Maximum flexibility | Massive component library, extensive tooling |
| OpenAI Swarm | Lightweight | Prototyping | Routine-based, minimal overhead |

Selection Guide

| Need | Recommended |
| --- | --- |
| Complex state management | LangGraph |
| Research/coding copilots | AutoGen |
| Business workflows | CrewAI |
| Maximum flexibility | LangChain |
| Rapid prototyping | OpenAI Swarm |
| Hard orchestration | Temporal, Step Functions |

Evaluation and Monitoring

Metrics (Minimum Set)

| Metric | Description |
| --- | --- |
| Task success rate | Did it complete correctly? |
| Tool success rate | API errors, invalid schemas, retries |
| Escalation rate | How often humans intervene |
| Time-to-resolution | Latency end-to-end |
| Cost per task | Model + tools + human time |
| Grounding quality | Citation accuracy, retrieval hit rate |
| Safety metrics | Policy violations, blocked actions |

Observability Essentials

  1. Traces -- every step, tool call, retrieved doc ID
  2. Replay -- reproduce incidents with same state
  3. Regression tests -- weekly against fixed suite
  4. Canaries -- roll out new prompts/models to 1-5% first

For Interviews

Q: "Что такое agent workflow?"

Structured loop: LLM interprets goal -> plans (task decomposition) -> uses tools (API, DB) -> retrieves context (RAG) -> writes outputs -> verifies (self-checks + policy) -> escalates to human if confidence is low. Agent Stack: 8 layers (Interface, Orchestration, Reasoning, Tools, Knowledge, Memory, Safety+Governance, Observability). 2026: 40% of enterprise apps use agents.

Q: "Какие есть паттерны agent workflows?"

9 core patterns: (1) ReAct -- reasoning + action in small steps. (2) Plan-and-Execute -- strategic planning separated from execution. (3) Planner-Critic-Executor -- review before execution. (4) Reflection Loop -- self-critique + refine. (5) Tree of Thoughts -- graph-based, multiple reasoning branches. (6) LATS -- MCTS-style search with tool feedback. (7) ReWOO -- explicit tool/data planning before execution. (8) Router-Specialist -- route to domain experts. (9) Debate/Consensus -- multiple agents propose, a judge decides.

Q: "Multi-agent orchestration patterns?"

(1) Manager-Worker (most common): manager decomposes, workers execute, verifier checks. (2) Router + Specialists: fast, avoids committee overhead. (3) Debate + Judge: high-stakes, use sparingly. (4) Parallel Research + Synthesis: multiple agents gather sources, one synthesizes with citations. Treat like microservices: contracts, schemas, timeouts, ownership.

Q: "RAG vs Agentic RAG?"

Classic RAG: Q&A over docs, fast, cheap. Agent Workflows: actions + decisions + tool use. Agentic RAG: retrieval grounds decisions + tools execute changes + policy gates manage risk. Default enterprise pattern in 2026.

Q: "Multimodal RAG?"

3 approaches: (1) Text-translation: convert images/audio to text. (2) Text retrieval + multimodal generation. (3) Full multimodal retrieval: cross-modal embeddings in a shared vector space. Pipeline: prepare embeddings for all modalities -> vector search -> multimodal LLM generates response.

Key Numbers

| Fact | Value |
| --- | --- |
| Enterprise agent adoption 2026 | 40% of applications |
| Multi-agent inquiry growth | 1,445% (Q1 2024 -> Q2 2025) |
| Enterprise AI workloads 2026 | 80%+ |
| Cross-validation accuracy improvement | 40% |
| ChatDev code accuracy improvement | 67% |
| Customer support response time reduction | 60% |
| Task success rate target | >90% |
| Escalation rate target | <10% |
| Tool success rate target | >95% |
| Cost per simple classification | $0.001-$0.01 |
| Cost per RAG + tool calling | $0.01-$0.10 |
| Cost per multi-agent research | $0.10-$1.00 |
| Retrieval latency target | <100ms |
| Re-ranking latency target | <50ms |
| Generation latency | 500-2000ms |

Misconception: Multi-agent is always better than single-agent

Benchmarks show that a single agent with the right tool set solves 70-80% of enterprise tasks. Multi-agent setups add 1.5-2.0x token overhead (AutoGen) and coordination latency. Use multi-agent only when the task requires parallel processing or distinct domain expertise; everything else is solved by Router + tools.

Misconception: ReAct is a universal pattern for all agents

ReAct works well for fast-moving tasks (triage, routing), but on tasks requiring strategic planning (report generation, data enrichment) the Plan-and-Execute pattern shows a 25-40% higher task success rate. ReAct without explicit stop conditions can loop endlessly -- always set a step budget (typically 5-15 steps).

Misconception: Agentic RAG = ordinary RAG + tools

Agentic RAG is fundamentally different: the agent decides when and what to retrieve, can perform multi-hop retrieval, and verifies the retrieved context. Ordinary RAG suffers retrieval drift on complex queries (accuracy drops 30-40% on multi-hop questions). Agentic RAG adds policy gates and approval steps, which is critical for enterprise use -- without them, an agent can take irreversible actions based on hallucinated context.


Interview Questions

Q: How do you choose among the 9 agent workflow patterns for a specific task?

❌ Red flag: "We use ReAct for everything -- it's the most popular pattern."

✅ Strong answer: "The choice depends on the nature of the task. For fast-moving tasks (triage, routing, support macros) use ReAct, because reasoning and action alternate in small steps. For tasks requiring strategic planning (report generation, data enrichment) use Plan-and-Execute, where the planner builds a plan and the executor runs it step by step. For high-stakes work (legal review, risk scoring) use Debate/Consensus, where several agents propose solutions and a judge picks one. Tree of Thoughts suits problems with branching (5-20x more expensive, but +30-150% on creative/logical solving). In production these are usually combined: a Router at the entry point plus a specialized pattern per domain."

Q: How does Agentic RAG differ from ordinary RAG, and when should you use it?

❌ Red flag: "Agentic RAG is just RAG with tools bolted on."

✅ Strong answer: "Classic RAG: query -> retrieve -> generate. Fast and cheap, but prone to hallucinated synthesis on complex queries. Agentic RAG: the agent decides when to retrieve, performs multi-hop retrieval, verifies context, and can take actions (API calls, DB updates). The key difference is policy gates: before an irreversible action the agent passes an approval step. This is the enterprise default in 2026. Failure mode: retrieval drift + action risk, fixed via retrieval constraints (max 3 hops) + human approval on destructive actions. Cost: $0.01-0.10 for RAG + tool calling vs $0.001 for simple classification."

Q: How do you design a multi-agent system for production?

❌ Red flag: "Just spin up several agents and hand them tasks."

✅ Strong answer: "Treat multi-agent systems like microservices: contracts, schemas, timeouts, ownership. Four patterns: Manager-Worker (most common) -- the manager decomposes the task, workers execute, a verifier checks. Router + Specialists -- for high-volume work (triage, FAQ). Debate + Judge -- use sparingly, high-stakes only. Parallel Research + Synthesis -- high leverage for research tasks. In practice: limit crew size to 2-5 agents (CrewAI benchmark), give each agent an explicit role and tool set, share state through a checkpointer (LangGraph), set a step budget per agent (5-15), and target an escalation rate <10% and a task success rate >90%."

Q: Which metrics are mandatory for a production agent workflow?

❌ Red flag: "We only track latency and error counts."

✅ Strong answer: "Minimum set: task success rate (target >90%), tool success rate (>95%), escalation rate (<10%), time-to-resolution, cost per task (model + tools + human time), grounding quality (citation accuracy, retrieval hit rate), safety metrics (policy violations, blocked actions). On top of that: traces of every step and tool call, replay capability to reproduce incidents, regression tests (weekly against a fixed suite), canary rollouts (1-5% of traffic for new prompts/models). The key metric is cost per resolved workflow, not cost per token."


Sources

  1. AiMatch Pro -- "AI Agent Workflows in 2025: The 2026 Playbook"
  2. Vellum -- "The 2026 Guide to AI Agent Workflows"
  3. Beam AI -- "The 9 Best Agentic Workflow Patterns in 2026"
  4. Collabnix -- "Multi-Agent and Multi-LLM Architecture: Complete Guide for 2025"
  5. IBM -- "What is Multimodal RAG?"
  6. arXiv:2504.08748 -- "A Survey of Multimodal Retrieval-Augmented Generation"
  7. arXiv:2510.09244 -- "Fundamentals of Building Autonomous LLM Agents"
  8. Meta AI -- "Retrieval-Augmented Multimodal Language Modeling"
  9. OpenAI/Anthropic -- Tool use documentation
  10. Gartner -- Multi-agent system inquiry growth statistics