
AI Agent Workflows

~12 minute read

Agent workflow loop, agent stack (8 layers), 3 levels of agentic behavior, 9 workflow patterns (ReAct, Plan-Execute, LATS, ReWOO, ToT, Reflection...), multi-agent orchestration (Manager-Worker, Router, Debate, Parallel), RAG vs Agentic RAG, multimodal RAG, tool calling reliability, frameworks (LangGraph, AutoGen, CrewAI), evaluation, production deployment (2025-2026)


Prerequisites: LLM Agents, Function Calling & Tool Use

Why This Matters

Agent workflows mark the shift from one-off LLM calls to full-fledged process engines. According to Gartner, inquiries about multi-agent systems grew 1,445% from Q1 2024 to Q2 2025. By 2026, 40% of enterprise applications will use task-specific agents (vs. <5% in 2025). A typical customer support agent cuts response time by 60%, and ChatDev-style multi-agent systems improve code accuracy by 67%. Meanwhile, a single multi-agent research session costs $0.10-$1.00 -- orders of magnitude cheaper than a human analyst.

Key Concepts

Agent Workflow: Definition

An agent workflow is not "a chatbot." It's a workflow engine where LLMs decide the next step, use tools safely, and produce auditable outcomes.

2025-2026 shift: from cool demos to operational software. By 2026, 40% of enterprise applications will feature task-specific agents (vs. <5% in 2025).

Three Levels of Agentic Behavior

| Level | Type | Capability | Example |
| --- | --- | --- | --- |
| Level 1 | AI Workflow | Output decisions (generate based on prompts) | Q&A, summarization |
| Level 2 | Router Workflow | Task decisions (choose tools/paths) | Triage, routing |
| Level 3 | Autonomous Agent | Process decisions (create new tasks/tools) | Self-improving systems |

Agent Stack (8 Layers)

graph TD
    L1["Interface Layer<br/>Chat, Email, Voice, Ticketing, Slack/Teams"]
    L2["Orchestration Layer<br/>State Machine / Graph / Workflow Engine"]
    L3["Reasoning + Planning<br/>Task decomposition, routing, role selection"]
    L4["Tools<br/>Function calling, API actions, browser, code execution"]
    L5["Knowledge<br/>RAG (Vector DB + documents), SQL, enterprise search"]
    L6["Memory<br/>Short-term state + Long-term profiles/preferences"]
    L7["Safety + Governance<br/>Policies, permissions, audit logs, redaction, approvals"]
    L8["Observability + Evaluation<br/>Traces, metrics, eval harnesses, regression tests"]

    L1 --> L2 --> L3 --> L4 --> L5 --> L6 --> L7 --> L8

    style L1 fill:#e8eaf6,stroke:#3f51b5
    style L2 fill:#e8f5e9,stroke:#4caf50
    style L3 fill:#fff3e0,stroke:#ef6c00
    style L4 fill:#f3e5f5,stroke:#9c27b0
    style L5 fill:#e8eaf6,stroke:#3f51b5
    style L6 fill:#e8f5e9,stroke:#4caf50
    style L7 fill:#fce4ec,stroke:#c62828
    style L8 fill:#fff3e0,stroke:#ef6c00

Agent Workflow Loop (7 Steps)

Goal -> Router -> Plan -> Retrieve -> Act (tools) -> Verify -> Output -> Log -> Human Escalation (if needed)
  1. Interprets goal -- understand user intent
  2. Plans -- break work into steps
  3. Retrieves context -- RAG, search, docs
  4. Uses tools -- APIs, databases, SaaS actions
  5. Writes outputs -- structured + natural language
  6. Verifies -- self-checks, tests, policy checks
  7. Escalates -- to human when confidence/risk thresholds fail
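The seven steps above can be sketched as a single loop. This is a minimal illustration with stubbed components: `run_workflow`, the intent check, and the fixed confidence score are hypothetical stand-ins, not a framework API.

```python
from dataclasses import dataclass, field

@dataclass
class LoopResult:
    output: str
    escalated: bool
    log: list = field(default_factory=list)

def run_workflow(goal: str, confidence_threshold: float = 0.8) -> LoopResult:
    log = []
    # 1. Interpret the goal (stub intent classifier)
    intent = "lookup" if "?" in goal else "action"
    log.append(("interpret", intent))
    # 2. Plan: break work into steps
    plan = ["retrieve", "act", "verify"]
    log.append(("plan", plan))
    # 3-4. Retrieve context and call tools (both stubbed)
    context = f"docs relevant to: {goal}"
    result = f"answer({goal})"
    log.append(("act", result))
    # 5-6. Write the output and verify it (stub confidence score)
    confidence = 0.9 if context else 0.0
    log.append(("verify", confidence))
    # 7. Escalate to a human when the confidence threshold fails
    if confidence < confidence_threshold:
        return LoopResult(output="", escalated=True, log=log)
    return LoopResult(output=result, escalated=False, log=log)
```

Every step appends to the log, so the run is auditable end to end, matching the "auditable outcomes" framing above.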

9 Agentic Workflow Patterns

1. ReAct (Reasoning + Acting)

Thought: I need to find X
Action: Search["X"]
Observation: Result
Thought: Now I know Y
Action: ...
| Best for | Watch out |
| --- | --- |
| Fast-moving tasks: triage, routing, support macros | Can loop endlessly without stop conditions |
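A minimal sketch of the Thought/Action/Observation cycle, with the hard step budget the table warns about. `search` is a toy tool and `react_loop` an illustrative name; no real LLM is involved.

```python
def search(query: str) -> str:
    # Toy tool standing in for a real search API
    kb = {"capital of France": "Paris"}
    return kb.get(query, "no result")

def react_loop(question: str, max_steps: int = 5) -> str:
    observations = []
    for step in range(max_steps):
        # Thought: decide the next move from what we know so far
        if observations and observations[-1] != "no result":
            return observations[-1]  # enough evidence: answer
        # Action + Observation: call a tool, record the result
        observations.append(search(question))
    # Stop condition instead of an endless loop
    return "escalate: step budget exhausted"
```

Without the `max_steps` budget, a question the tool cannot answer would cycle forever; with it, the loop degrades into a clean escalation.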

2. Plan-and-Execute

1. Planner: Break goal into steps
2. Executor: Run each step sequentially
3. Verifier: Check outputs
| Best for | Watch out |
| --- | --- |
| Report generation, research summaries, data enrichment | Plans can be rigid when conditions change |
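The three roles can be sketched as plain functions, assuming stubs where an LLM and real tools would sit (`plan`, `execute`, and `verify` are hypothetical names).

```python
def plan(goal: str) -> list:
    # Planner: produce the full step list up front (stub for an LLM call)
    return [f"gather data for {goal}", f"draft {goal}", f"format {goal}"]

def execute(steps: list) -> list:
    # Executor: run each step sequentially (stub tool calls)
    return [f"done: {s}" for s in steps]

def verify(outputs: list) -> bool:
    # Verifier: check that every step produced a result
    return all(o.startswith("done:") for o in outputs)
```

The rigidity noted above comes from `plan` running once: if conditions change mid-execution, nothing revises the remaining steps without an explicit replanning hook.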

3. Planner-Critic-Executor

Planner -> Critic -> Executor -> Output
             ^
        Quality check before execution
| Best for | Watch out |
| --- | --- |
| Contract drafting, financial reporting | Increased latency for critic step |

4. Reflection Loop

Generate -> Critique -> Refine -> Critique -> ... -> Final Output
| Best for | Watch out |
| --- | --- |
| Writing, summarization, design recommendations | Extra reflection adds cost and latency |
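A sketch of the Generate -> Critique -> Refine cycle with a round budget, since each round adds cost. The critic here is a simple length check standing in for LLM self-critique; all function names are illustrative.

```python
from typing import Optional

def generate(prompt: str) -> str:
    # Generator stub: a real system would call an LLM here
    return f"Summary of {prompt}"

def critique(draft: str, max_len: int = 40) -> Optional[str]:
    # Critic stub: flag drafts violating a constraint, None if it passes
    return "too long" if len(draft) > max_len else None

def refine(draft: str, max_len: int = 40) -> str:
    # Refiner stub: address the critique
    return draft[:max_len]

def reflection_loop(prompt: str, rounds: int = 3) -> str:
    draft = generate(prompt)
    for _ in range(rounds):      # each round costs extra model calls
        issue = critique(draft)
        if issue is None:        # critique passed: stop early
            break
        draft = refine(draft)
    return draft
```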

5. Tree of Thoughts

                +-- Branch A --+
                |              |
Root -> Thought +-- Branch B --+-- Converge -> Best Answer
                |              |
                +-- Branch C --+
| Best for | Watch out |
| --- | --- |
| Creative/logical problem solving | Branching multiplies costs quickly |

6. LATS (Language Agent Tree Search)

MCTS-style search: select promising path -> expand with tool calls -> simulate outcomes -> backpropagate rewards.

| Best for | Watch out |
| --- | --- |
| Scenarios with real-time tool feedback | Success depends on strong scoring signals |

7. ReWOO (Reasoning Without Observation)

Separates reasoning from observation: the planner lays out all tool calls and data dependencies up front, workers execute them, and a solver combines the results -- no LLM call after every observation.

| Best for | Watch out |
| --- | --- |
| Cases requiring explicit tool/data planning | More setup effort |

8. Router-Specialist Multi-Agent

                +-- Finance Agent
                |
Input -> Router +-- IT Agent
                |
                +-- HR Agent
| Best for | Watch out |
| --- | --- |
| Single entry point routing to domain experts | Incorrect routing causes cascading errors |

9. Debate/Consensus Multi-Agent

Agent A proposes -> Agent B critiques -> Judge decides
| Best for | Watch out |
| --- | --- |
| High-stakes: legal review, risk scoring | More time and compute |

Multi-Agent Orchestration Patterns

Pattern A: Manager-Worker (Most Common)

graph TD
    M["Manager<br/>Interprets goal, decomposes, assigns"]
    W1["Worker Agent 1"]
    W2["Worker Agent 2"]
    W3["Worker Agent 3"]
    V["Verifier<br/>Checks outputs, policy, formatting"]

    M --> W1 & W2 & W3
    W1 & W2 & W3 --> V

    style M fill:#e8eaf6,stroke:#3f51b5
    style W1 fill:#e8f5e9,stroke:#4caf50
    style W2 fill:#e8f5e9,stroke:#4caf50
    style W3 fill:#e8f5e9,stroke:#4caf50
    style V fill:#fff3e0,stroke:#ef6c00

Best for: Enterprise operations, analytics, customer support escalation.

Pattern B: Router + Specialists (Fast + Scalable)

Router classifies intent, selects one specialist. Avoids "committee chat" overhead.

Best for: High-volume tasks (triage, categorization, FAQ with actions).
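The router can be sketched with keyword matching standing in for an LLM intent classifier. Because incorrect routing cascades, the sketch fails closed by escalating unmatched queries; all names here are hypothetical stubs.

```python
# Specialist stubs: each would be its own agent with its own tool set
SPECIALISTS = {
    "finance": lambda q: f"finance agent handles: {q}",
    "it":      lambda q: f"IT agent handles: {q}",
    "hr":      lambda q: f"HR agent handles: {q}",
}

# Toy intent signals; a real router would use an LLM or classifier
KEYWORDS = {"invoice": "finance", "refund": "finance",
            "laptop": "it", "vpn": "it",
            "vacation": "hr", "payroll": "hr"}

def route(query: str) -> str:
    for word, domain in KEYWORDS.items():
        if word in query.lower():
            # Exactly one specialist is selected: no committee chat
            return SPECIALISTS[domain](query)
    # Fail closed: misroutes cascade, so unknown intent escalates
    return "escalate: no specialist matched"
```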

Pattern C: Debate + Judge (Use Sparingly)

Two agents propose solutions; judge selects.

Best for: Legal review drafts, risk scoring, strategy docs.

Pattern D: Parallel Research + Synthesis (High Leverage)

graph TD
    C["Coordinator"]
    G1["Gather Source 1"]
    G2["Gather Source 2"]
    G3["Gather Source 3"]
    S["Synthesizer<br/>Combines with citations"]

    C --> G1 & G2 & G3
    G1 & G2 & G3 --> S

    style C fill:#e8eaf6,stroke:#3f51b5
    style G1 fill:#e8f5e9,stroke:#4caf50
    style G2 fill:#e8f5e9,stroke:#4caf50
    style G3 fill:#e8f5e9,stroke:#4caf50
    style S fill:#fff3e0,stroke:#ef6c00

Best for: Market research, competitive intel, policy updates.

Production insight: Multi-agent systems fail less when you treat them like microservices: contracts, schemas, timeouts, and ownership.


RAG vs Agent Workflows vs Agentic RAG

| Approach | Best For | Strengths | Failure Mode | Fix |
| --- | --- | --- | --- | --- |
| Classic RAG | Q&A, policy lookup | Fast, cheap, explainable | Hallucinated synthesis | Better chunking, hybrid search |
| Single-Agent | Ticket triage, CRM updates | Simple orchestration | Tool misuse, looping | Guardrails, timeouts |
| Multi-Agent | Procurement, incident response | Specialization + parallel | Coordination overhead | Shared state, role contracts |
| Agentic RAG | Enterprise workflows | Higher task completion | Retrieval drift + action risk | Retrieval constraints + approvals |

2026 trend: "Agentic RAG" -- default enterprise pattern: retrieval grounds decisions, tools execute changes, policy gates manage risk.

Agentic RAG Architecture

Query -> Router -> Planner -> Retriever (RAG) -> Executor (Tools) -> Verifier -> Output
                                                        |
                                              Policy Gates & Approvals
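The pipeline above can be sketched as follows; the retriever, the `policy_gate` check, and the destructive-verb list are illustrative stubs, not a real policy engine.

```python
# Verbs that mark irreversible actions (illustrative allow/deny signal)
DESTRUCTIVE = {"delete", "refund", "terminate"}

def retrieve(query: str) -> str:
    # Retriever stub: a real system queries a vector DB with citations
    return f"[doc snippet for: {query}]"

def policy_gate(action: str, approved: bool) -> bool:
    # Irreversible actions require an explicit human approval flag
    if any(verb in action for verb in DESTRUCTIVE):
        return approved
    return True

def agentic_rag(query: str, action: str, approved: bool = False) -> str:
    context = retrieve(query)       # planner decided retrieval is needed
    if not policy_gate(action, approved):
        return "blocked: awaiting human approval"
    # Executor runs the tool action, grounded in retrieved context
    return f"executed {action} grounded in {context}"
```

The point of the gate is ordering: retrieval grounds the decision, but execution of a risky action is held until approval, exactly the "policy gates manage risk" pattern described above.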

Multimodal RAG

Three Approaches

| Approach | Description | Pros | Cons |
| --- | --- | --- | --- |
| Text-translation | Convert images to captions, audio to transcripts | Easy integration | Information bottleneck |
| Text retrieval + multimodal generation | Retrieve via text, generate with original media | Better expressiveness | Retrieval still text-dependent |
| Multimodal retrieval | Cross-modal embeddings in shared vector space | Maximum grounding | Computationally expensive |

Pipeline

Step 1: Multimodal Knowledge Preparation
  Images -> Vision encoder -> Embeddings
  Audio -> Audio encoder -> Embeddings
  Text -> Text encoder -> Embeddings
  All stored in Vector DB

Step 2: Processing and Retrieval
  Query -> Encode (any modality) -> Vector search -> Top-K multimodal chunks

Step 3: Multimodal Context Building
  Early fusion: Convert all to text
  Late fusion: Keep modalities separate
  Multimodal LLM generates response
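Step 2 (cross-modal retrieval) reduces to nearest-neighbor search over a shared embedding space. The sketch below uses hand-written vectors as stand-in embeddings; a real system would use CLIP-style encoders and an ANN index instead of brute-force cosine similarity.

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Pretend embeddings for items of different modalities in one shared space
index = {
    "cat_photo.jpg":   [0.9, 0.1, 0.0],
    "dog_audio.wav":   [0.1, 0.9, 0.0],
    "cat_article.txt": [0.8, 0.2, 0.1],
}

def top_k(query_vec, k=2):
    # Rank every indexed item by similarity to the query embedding,
    # regardless of its original modality
    scored = sorted(index, key=lambda item: cosine(query_vec, index[item]),
                    reverse=True)
    return scored[:k]
```

A text query embedded near the "cat" region retrieves both the image and the article, which is the "maximum grounding" benefit of the shared-space approach.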

Use Cases

| Use Case | Benefit |
| --- | --- |
| Medical imaging | Retrieve similar cases with diagnosis |
| Product search | Search by image, get product details |
| Document analysis | Understand charts, tables, diagrams |
| Video QA | Answer questions about video content |

Tool Calling Reliability

What "Good" Looks Like

  1. Tools accept strict schemas (JSON Schema / typed models)
  2. Agents produce structured outputs for every action
  3. Every tool call is logged with inputs/outputs
  4. High-risk tools require human approval (HITL)
  5. Tools run in least-privilege mode (scoped tokens)

Tool Contract Example

from pydantic import BaseModel, Field
from typing import Literal, Optional

class CreateJiraTicket(BaseModel):
    project_key: str = Field(..., description="Jira project key")
    summary: str = Field(..., max_length=120)
    description: str
    severity: Literal["low", "medium", "high", "critical"]
    requester_email: str
    approval_required: bool = True
    related_asset_id: Optional[str] = None

def create_jira_ticket(payload: CreateJiraTicket) -> dict:
    # 1) Policy check (PII redaction, allowed project)
    # 2) Call Jira API
    # 3) Return structured receipt
    return {"ticket_id": "ITOPS-1842", "status": "created"}

Guardrails

| Guardrail | Purpose |
| --- | --- |
| Allowlists | Limit tools, destinations, domains |
| Step budgets | Max turns, max tool calls |
| Deterministic formatting | Schemas + validators |
| Sandboxing | Code execution, browser actions |
| Confirmation prompts | Destructive actions (delete, refund, terminate) |
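Step budgets from the table can be enforced with a small wrapper around each tool. This is an illustrative sketch: `ToolBudget` and `BudgetExceeded` are hypothetical names, not part of any framework.

```python
class BudgetExceeded(Exception):
    """Raised when an agent spends its tool-call budget."""

class ToolBudget:
    def __init__(self, max_calls: int):
        self.max_calls = max_calls
        self.calls = 0

    def guard(self, tool):
        # Wrap a tool so every invocation counts against the budget
        def wrapped(*args, **kwargs):
            if self.calls >= self.max_calls:
                raise BudgetExceeded(f"tool budget of {self.max_calls} spent")
            self.calls += 1
            return tool(*args, **kwargs)
        return wrapped
```

Sharing one `ToolBudget` across all of an agent's tools caps total calls per run, turning a potential infinite loop into a catchable exception the orchestrator can convert into an escalation.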

Production Workflow (9 Steps)

| Step | Action |
| --- | --- |
| 1 | Define the job -- one-sentence goal + success criteria |
| 2 | Map the workflow -- states, transitions, failure paths, escalation |
| 3 | Choose tools -- APIs first; browser automation last |
| 4 | Add knowledge -- RAG with citations + freshness rules |
| 5 | Design memory -- store only what's needed; set retention/redaction |
| 6 | Add policy gates -- permissions, approvals, audit logging, PII |
| 7 | Implement evals -- offline test set + adversarial cases |
| 8 | Ship with observability -- traces, tool metrics, cost, latency |
| 9 | Iterate -- tighten prompts, schemas, retrieval based on data |

Practical Examples

IT Incident Triage:

Ingest alert -> Classify -> Retrieve runbook (RAG) -> Propose actions -> Execute safe actions -> Open ticket -> Notify on-call
HITL gate: any action affecting production traffic.

Finance Ops (Invoice Exceptions):

Read invoice -> Validate vendor + PO -> Check anomalies -> Request missing info -> Update ERP -> Audit trail

Sales Ops (Account Research):

Pull CRM context -> Research company news -> Draft personalized email -> Suggest next action -> Log activity
Keep "send email" behind explicit approval.


Best Practices

Human-in-the-Loop

| Practice | Implementation |
| --- | --- |
| Approvals | Required for irreversible actions |
| Diffs | Show what will change, not just explanations |
| Citations | "Why this action" + sources + tool summary |
| Escalation | One-click "escalate to human" at every stage |

Governance

| Requirement | Implementation |
| --- | --- |
| Least privilege | Tokens per agent + per tool |
| Central policy engine | Who can do what, where, when |
| Audit logs | Prompts, retrieval sources, tool I/O |
| PII handling | Detection + redaction before storage |
| Data residency | Region pinning, private deployments |

Cost Optimization

| Strategy | Impact |
| --- | --- |
| Smaller models | Routing, extraction, classification |
| Caching | Retrieval results with TTLs |
| Tool budgets | Limit calls, early stopping |
| Structured extraction | Cheaper than long-form generation |
| Workflow-level cost tracking | Cost per resolved workflow, not per token |
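Caching retrieval results with TTLs can be sketched with a small dictionary-backed cache. `TTLCache` is an illustrative name; a production system would add size bounds and eviction metrics.

```python
import time

class TTLCache:
    def __init__(self, ttl: float):
        self.ttl = ttl          # seconds an entry stays valid
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, stamp = entry
        if time.monotonic() - stamp > self.ttl:
            del self._store[key]   # expired: force a fresh retrieval
            return None
        return value

    def put(self, key, value):
        self._store[key] = (value, time.monotonic())
```

Keying on the normalized query lets repeated lookups within the TTL skip the vector-search round trip entirely, while expiry keeps the agent from reusing stale context indefinitely.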

Anti-Patterns to Avoid

  1. One Giant Prompt -- hides errors, hard to debug. Design modular patterns.
  2. No Evals, Only Vibes -- intuition doesn't scale. Build dashboards early.
  3. Tool Chaos -- untracked tools, unpinned model versions. Version everything.

Framework Comparison (2026)

| Framework | Type | Best For | Key Features |
| --- | --- | --- | --- |
| LangGraph | Graph-based | Stateful workflows, complex branching | Explicit multi-agent coordination, stateful, cycles |
| AutoGen | Conversational | Research, coding copilots | HITL, flexible agents, conversation management |
| CrewAI | Production teams | Business applications | Role-based agents, clean architecture |
| LangChain | Ecosystem | Maximum flexibility | Massive component library, extensive tooling |
| OpenAI Swarm | Lightweight | Prototyping | Routine-based, minimal overhead |

Selection Guide

| Need | Recommended |
| --- | --- |
| Complex state management | LangGraph |
| Research/coding copilots | AutoGen |
| Business workflows | CrewAI |
| Maximum flexibility | LangChain |
| Rapid prototyping | OpenAI Swarm |
| Hard orchestration | Temporal, Step Functions |

Evaluation and Monitoring

Metrics (Minimum Set)

| Metric | Description |
| --- | --- |
| Task success rate | Did it complete correctly? |
| Tool success rate | API errors, invalid schemas, retries |
| Escalation rate | How often humans intervene |
| Time-to-resolution | Latency end-to-end |
| Cost per task | Model + tools + human time |
| Grounding quality | Citation accuracy, retrieval hit rate |
| Safety metrics | Policy violations, blocked actions |

Observability Essentials

  1. Traces -- every step, tool call, retrieved doc ID
  2. Replay -- reproduce incidents with same state
  3. Regression tests -- weekly against fixed suite
  4. Canaries -- roll out new prompts/models to 1-5% first

For Interviews

Q: "Что такое agent workflow?"

Structured loop: LLM interprets goal -> plans (task decomposition) -> uses tools (API, DB) -> retrieves context (RAG) -> writes outputs -> verifies (self-checks + policy) -> escalates to human if confidence is low. Agent Stack: 8 layers (Interface, Orchestration, Reasoning, Tools, Knowledge, Memory, Safety+Governance, Observability). 2026: 40% of enterprise apps use agents.

Q: "Какие есть паттерны agent workflows?"

9 core patterns: (1) ReAct -- reasoning + action in small steps. (2) Plan-and-Execute -- strategic planning separated from execution. (3) Planner-Critic-Executor -- review before execution. (4) Reflection Loop -- self-critique + refine. (5) Tree of Thoughts -- graph-based, multiple reasoning branches. (6) LATS -- MCTS-style search with tool feedback. (7) ReWOO -- explicit tool/data planning before execution. (8) Router-Specialist -- route to domain experts. (9) Debate/Consensus -- multiple agents propose, a judge decides.

Q: "Multi-agent orchestration patterns?"

(1) Manager-Worker (most common): manager decomposes, workers execute, verifier checks. (2) Router + Specialists: fast, avoids committee overhead. (3) Debate + Judge: high-stakes, use sparingly. (4) Parallel Research + Synthesis: multiple agents gather sources, one synthesizes with citations. Treat like microservices: contracts, schemas, timeouts, ownership.

Q: "RAG vs Agentic RAG?"

Classic RAG: Q&A over docs, fast, cheap. Agent Workflows: actions + decisions + tool use. Agentic RAG: retrieval grounds decisions + tools execute changes + policy gates manage risk. Default enterprise pattern in 2026.

Q: "Multimodal RAG?"

3 approaches: (1) Text-translation: convert images/audio to text. (2) Text retrieval + multimodal generation. (3) Full multimodal retrieval: cross-modal embeddings in a shared vector space. Pipeline: prepare embeddings for all modalities -> vector search -> multimodal LLM generates response.

Key Numbers

| Fact | Value |
| --- | --- |
| Enterprise agent adoption 2026 | 40% of applications |
| Multi-agent inquiry growth | 1,445% (Q1 2024 -> Q2 2025) |
| Enterprise AI workloads 2026 | 80%+ |
| Cross-validation accuracy improvement | 40% |
| ChatDev code accuracy improvement | 67% |
| Customer support response time reduction | 60% |
| Task success rate target | >90% |
| Escalation rate target | <10% |
| Tool success rate target | >95% |
| Cost per simple classification | $0.001-$0.01 |
| Cost per RAG + tool calling | $0.01-$0.10 |
| Cost per multi-agent research | $0.10-$1.00 |
| Retrieval latency target | <100ms |
| Re-ranking latency target | <50ms |
| Generation latency | 500-2000ms |

Misconception: Multi-agent is always better than single-agent

Benchmarks show that a single agent with the right tool set solves 70-80% of enterprise tasks. Multi-agent setups add 1.5-2.0x token overhead (AutoGen) and coordination latency. Use multi-agent only when the task requires parallel processing or distinct domain expertise; everything else is solved by Router + tools.

Misconception: ReAct is a universal pattern for all agents

ReAct works well for fast-moving tasks (triage, routing), but on tasks requiring strategic planning (report generation, data enrichment) the Plan-and-Execute pattern shows a 25-40% higher task success rate. ReAct without explicit stop conditions can loop endlessly -- always set a step budget (typically 5-15 steps).

Misconception: Agentic RAG = ordinary RAG + tools

Agentic RAG is fundamentally different: the agent decides when and what to retrieve, can perform multi-hop retrieval, and verifies the retrieved context. Ordinary RAG suffers retrieval drift on complex queries (accuracy drops 30-40% on multi-hop questions). Agentic RAG adds policy gates and approval steps, which is critical for enterprise use -- without them, an agent can take irreversible actions based on hallucinated context.


Interview Questions

Q: How do you choose among the 9 agent workflow patterns for a specific task?

❌ Red flag: "We use ReAct for everything -- it's the most popular pattern."

✅ Strong answer: "The choice depends on the nature of the task. For fast-moving tasks (triage, routing, support macros) use ReAct, because reasoning and action alternate in small steps. For tasks requiring strategic planning (report generation, data enrichment) use Plan-and-Execute, where the planner builds a plan and the executor runs it step by step. For high-stakes work (legal review, risk scoring) use Debate/Consensus, where several agents propose solutions and a judge picks one. Tree of Thoughts suits problems with branching (5-20x more expensive, but +30-150% on creative/logical solving). In production these are usually combined: a Router at the entry point plus a specialized pattern per domain."

Q: How does Agentic RAG differ from ordinary RAG, and when should you use it?

❌ Red flag: "Agentic RAG is just RAG with tools bolted on."

✅ Strong answer: "Classic RAG: query -> retrieve -> generate. Fast and cheap, but prone to hallucinated synthesis on complex queries. Agentic RAG: the agent decides when to retrieve, performs multi-hop retrieval, verifies context, and can take actions (API calls, DB updates). The key difference is policy gates: before an irreversible action the agent passes an approval step. This is the enterprise default in 2026. Failure mode: retrieval drift + action risk, fixed via retrieval constraints (max 3 hops) + human approval on destructive actions. Cost: $0.01-0.10 for RAG + tool calling vs $0.001 for simple classification."

Q: How do you design a multi-agent system for production?

❌ Red flag: "Just spin up several agents and hand them tasks."

✅ Strong answer: "Treat multi-agent systems like microservices: contracts, schemas, timeouts, ownership. Four patterns: Manager-Worker (most common) -- the manager decomposes the task, workers execute, a verifier checks. Router + Specialists -- for high-volume work (triage, FAQ). Debate + Judge -- use sparingly, high-stakes only. Parallel Research + Synthesis -- high leverage for research tasks. In practice: limit crew size to 2-5 agents (CrewAI benchmark), give each agent an explicit role and tool set, share state through a checkpointer (LangGraph), set a step budget per agent (5-15), and target an escalation rate <10% and a task success rate >90%."

Q: Which metrics are mandatory for a production agent workflow?

❌ Red flag: "We only track latency and error counts."

✅ Strong answer: "Minimum set: task success rate (target >90%), tool success rate (>95%), escalation rate (<10%), time-to-resolution, cost per task (model + tools + human time), grounding quality (citation accuracy, retrieval hit rate), safety metrics (policy violations, blocked actions). On top of that: traces of every step and tool call, replay capability to reproduce incidents, regression tests (weekly against a fixed suite), canary rollouts (1-5% of traffic for new prompts/models). The key metric is cost per resolved workflow, not cost per token."


Sources

  1. AiMatch Pro -- "AI Agent Workflows in 2025: The 2026 Playbook"
  2. Vellum -- "The 2026 Guide to AI Agent Workflows"
  3. Beam AI -- "The 9 Best Agentic Workflow Patterns in 2026"
  4. Collabnix -- "Multi-Agent and Multi-LLM Architecture: Complete Guide for 2025"
  5. IBM -- "What is Multimodal RAG?"
  6. arXiv:2504.08748 -- "A Survey of Multimodal Retrieval-Augmented Generation"
  7. arXiv:2510.09244 -- "Fundamentals of Building Autonomous LLM Agents"
  8. Meta AI -- "Retrieval-Augmented Multimodal Language Modeling"
  9. OpenAI/Anthropic -- Tool use documentation
  10. Gartner -- Multi-agent system inquiry growth statistics