AI Agent Workflows¶
~12 min read
Agent workflow loop, agent stack (7 layers), 3 levels agentic behavior, 9 workflow patterns (ReAct, Plan-Execute, LATS, ReWOO, ToT, Reflection...), multi-agent orchestration (Manager-Worker, Router, Debate, Parallel), RAG vs Agentic RAG, multimodal RAG, tool calling reliability, frameworks (LangGraph, AutoGen, CrewAI), evaluation, production deployment (2025-2026)
Prerequisites: LLM Agents, Function Calling & Tool Use
Why This Matters¶
Agent workflows are the shift from one-off LLM calls to full-fledged process engines. According to Gartner, inquiries about multi-agent systems grew 1,445% from Q1 2024 to Q2 2025. By 2026, 40% of enterprise applications are expected to use task-specific agents (vs <5% in 2025). A typical customer support agent cuts response time by 60%, and ChatDev-style multi-agent systems improve code accuracy by 67%. Meanwhile, a single multi-agent research session costs $0.10-$1.00 -- orders of magnitude cheaper than a human analyst.
Key Concepts¶
Agent Workflow -- Definition¶
An agent workflow is not "a chatbot." It's a workflow engine where LLMs decide the next step, use tools safely, and produce auditable outcomes.
The 2025-2026 shift: from cool demos to operational software. By 2026, 40% of enterprise applications will feature task-specific agents (vs <5% in 2025).
Three Levels of Agentic Behavior¶
| Level | Type | Capability | Example |
|---|---|---|---|
| Level 1 | AI Workflow | Output decisions (generate based on prompts) | Q&A, summarization |
| Level 2 | Router Workflow | Task decisions (choose tools/paths) | Triage, routing |
| Level 3 | Autonomous Agent | Process decisions (create new tasks/tools) | Self-improving systems |
Agent Stack (8 Layers)¶
```mermaid
graph TD
    L1["Interface Layer<br/>Chat, Email, Voice, Ticketing, Slack/Teams"]
    L2["Orchestration Layer<br/>State Machine / Graph / Workflow Engine"]
    L3["Reasoning + Planning<br/>Task decomposition, routing, role selection"]
    L4["Tools<br/>Function calling, API actions, browser, code execution"]
    L5["Knowledge<br/>RAG (Vector DB + documents), SQL, enterprise search"]
    L6["Memory<br/>Short-term state + Long-term profiles/preferences"]
    L7["Safety + Governance<br/>Policies, permissions, audit logs, redaction, approvals"]
    L8["Observability + Evaluation<br/>Traces, metrics, eval harnesses, regression tests"]
    L1 --> L2 --> L3 --> L4 --> L5 --> L6 --> L7 --> L8
    style L1 fill:#e8eaf6,stroke:#3f51b5
    style L2 fill:#e8f5e9,stroke:#4caf50
    style L3 fill:#fff3e0,stroke:#ef6c00
    style L4 fill:#f3e5f5,stroke:#9c27b0
    style L5 fill:#e8eaf6,stroke:#3f51b5
    style L6 fill:#e8f5e9,stroke:#4caf50
    style L7 fill:#fce4ec,stroke:#c62828
    style L8 fill:#fff3e0,stroke:#ef6c00
```
Agent Workflow Loop (7 Steps)¶
Goal -> Router -> Plan -> Retrieve -> Act (tools) -> Verify -> Output -> Log -> Human Escalation (if needed)
- Interprets goal -- understand user intent
- Plans -- break work into steps
- Uses tools -- APIs, databases, SaaS actions
- Retrieves context -- RAG, search, docs
- Writes outputs -- structured + natural language
- Verifies -- self-checks, tests, policy checks
- Escalates -- to human when confidence/risk thresholds fail
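The loop above can be sketched as a bounded control loop with a step budget and confidence-based escalation. This is a minimal illustration: all helpers (`plan_next_action`, `retrieve_context`, `execute_tool`, `verify`) are hard-coded stubs standing in for LLM, RAG, and tool calls, not a real framework API.

```python
from dataclasses import dataclass, field

@dataclass
class AgentState:
    goal: str
    steps: list = field(default_factory=list)
    confidence: float = 1.0
    escalated: bool = False

# --- Illustrative stubs standing in for LLM, RAG, and tool calls ---
def plan_next_action(state: AgentState) -> str:
    return "lookup" if not state.steps else "done"

def retrieve_context(state: AgentState, action: str) -> str:
    return f"docs for {action}"

def execute_tool(action: str, context: str) -> dict:
    return {"action": action, "context": context}

def verify(state: AgentState, result: dict) -> float:
    return 0.9  # pretend self-check and policy check passed

def run_workflow(state: AgentState, max_steps: int = 10,
                 min_confidence: float = 0.7) -> AgentState:
    # Bounded loop: plan -> retrieve -> act -> verify; stop on "done",
    # the step budget, or a failed confidence threshold.
    for _ in range(max_steps):
        action = plan_next_action(state)
        if action == "done":
            break
        context = retrieve_context(state, action)
        result = execute_tool(action, context)
        state.confidence = verify(state, result)
        state.steps.append((action, result))
        if state.confidence < min_confidence:
            state.escalated = True  # hand off to a human instead of guessing
            break
    return state
```

The step budget and the escalation flag are the two pieces most often missing from demo agents; both appear again in the Guardrails section below.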
9 Agentic Workflow Patterns¶
1. ReAct (Reasoning + Acting)¶
| Best for | Watch out |
|---|---|
| Fast-moving tasks: triage, routing, support macros | Can loop endlessly without stop conditions |
2. Plan-and-Execute¶
1. Planner: Break goal into steps
2. Executor: Run each step sequentially
3. Verifier: Check outputs
| Best for | Watch out |
|---|---|
| Report generation, research summaries, data enrichment | Plans can be rigid when conditions change |
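The Plan-and-Execute split can be sketched as three small functions; `plan`, `execute`, and `verify` are illustrative stubs (a real system would back the first two with LLM calls and the last with tests or policy checks).

```python
# Sketch of Plan-and-Execute: the planner produces all steps up front,
# the executor runs them sequentially, the verifier checks each output.
def plan(goal: str) -> list[str]:
    return [f"gather data for {goal}", f"draft report on {goal}", "format output"]

def execute(step: str) -> str:
    return f"result of: {step}"

def verify(output: str) -> bool:
    return output.startswith("result of:")

def plan_and_execute(goal: str) -> list[str]:
    outputs = []
    for step in plan(goal):      # plan once, then execute in order
        out = execute(step)
        if not verify(out):      # a failed check would trigger replanning
            raise RuntimeError(f"step failed: {step}")
        outputs.append(out)
    return outputs
```

The rigidity noted above comes from planning only once; production variants replan when a verification step fails.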
3. Planner-Critic-Executor¶
| Best for | Watch out |
|---|---|
| Contract drafting, financial reporting | Increased latency for critic step |
4. Reflection Loop¶
| Best for | Watch out |
|---|---|
| Writing, summarization, design recommendations | Extra reflection adds cost and latency |
5. Tree of Thoughts¶
```
                +-- Branch A --+
                |              |
Root -> Thought +-- Branch B --+-- Converge -> Best Answer
                |              |
                +-- Branch C --+
```
| Best for | Watch out |
|---|---|
| Creative/logical problem solving | Branching multiplies costs quickly |
6. LATS (Language Agent Tree Search)¶
MCTS-style search: select promising path -> expand with tool calls -> simulate outcomes -> backpropagate rewards.
| Best for | Watch out |
|---|---|
| Scenarios with real-time tool feedback | Success depends on strong scoring signals |
7. ReWOO (Reasoning Without Observation)¶
Externalizes reasoning by explicitly referencing tools and data in the plan before execution.
| Best for | Watch out |
|---|---|
| Cases requiring explicit tool/data planning | More setup effort |
8. Router-Specialist Multi-Agent¶
| Best for | Watch out |
|---|---|
| Single entry point routing to domain experts | Incorrect routing causes cascading errors |
9. Debate/Consensus Multi-Agent¶
| Best for | Watch out |
|---|---|
| High-stakes: legal review, risk scoring | More time and compute |
Multi-Agent Orchestration Patterns¶
Pattern A: Manager-Worker (Most Common)¶
```mermaid
graph TD
    M["Manager<br/>Interprets goal, decomposes, assigns"]
    W1["Worker Agent 1"]
    W2["Worker Agent 2"]
    W3["Worker Agent 3"]
    V["Verifier<br/>Checks outputs, policy, formatting"]
    M --> W1 & W2 & W3
    W1 & W2 & W3 --> V
    style M fill:#e8eaf6,stroke:#3f51b5
    style W1 fill:#e8f5e9,stroke:#4caf50
    style W2 fill:#e8f5e9,stroke:#4caf50
    style W3 fill:#e8f5e9,stroke:#4caf50
    style V fill:#fff3e0,stroke:#ef6c00
```
Best for: Enterprise operations, analytics, customer support escalation.
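Manager-Worker can be sketched with plain threads; `decompose`, `worker`, and `verify` are illustrative stubs (in a real system each would wrap an agent call, and the pool would be an orchestration framework).

```python
# Manager-Worker sketch: the manager decomposes the goal, workers run in
# parallel, a verifier checks the combined output before anything ships.
from concurrent.futures import ThreadPoolExecutor

def decompose(goal: str) -> list[str]:
    return [f"{goal}: subtask {i}" for i in range(3)]

def worker(subtask: str) -> str:
    return f"done({subtask})"

def verify(results: list[str]) -> bool:
    return all(r.startswith("done(") for r in results)

def manager_worker(goal: str) -> list[str]:
    subtasks = decompose(goal)
    with ThreadPoolExecutor(max_workers=3) as pool:
        results = list(pool.map(worker, subtasks))  # preserves subtask order
    if not verify(results):
        raise RuntimeError("verifier rejected worker outputs")
    return results
```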
Pattern B: Router + Specialists (Fast + Scalable)¶
Router classifies intent, selects one specialist. Avoids "committee chat" overhead.
Best for: High-volume tasks (triage, categorization, FAQ with actions).
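The Router + Specialists dispatch can be sketched as a classifier plus a handler table; the keyword classifier below stands in for an LLM routing call, and the handlers and intents are made up for illustration.

```python
# Router sketch: classify intent, dispatch to exactly one specialist,
# fall back to a human queue on unknown intents (to avoid cascading
# errors from wrong routing).
def classify_intent(query: str) -> str:
    q = query.lower()
    if "refund" in q:
        return "billing"
    if "password" in q or "login" in q:
        return "account"
    return "unknown"

SPECIALISTS = {
    "billing": lambda q: f"billing agent handles: {q}",
    "account": lambda q: f"account agent handles: {q}",
}

def route(query: str) -> str:
    handler = SPECIALISTS.get(classify_intent(query))
    if handler is None:
        return f"escalated to human: {query}"
    return handler(query)
```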
Pattern C: Debate + Judge (Use Sparingly)¶
Two agents propose solutions; judge selects.
Best for: Legal review drafts, risk scoring, strategy docs.
Pattern D: Parallel Research + Synthesis (High Leverage)¶
```mermaid
graph TD
    C["Coordinator"]
    G1["Gather Source 1"]
    G2["Gather Source 2"]
    G3["Gather Source 3"]
    S["Synthesizer<br/>Combines with citations"]
    C --> G1 & G2 & G3
    G1 & G2 & G3 --> S
    style C fill:#e8eaf6,stroke:#3f51b5
    style G1 fill:#e8f5e9,stroke:#4caf50
    style G2 fill:#e8f5e9,stroke:#4caf50
    style G3 fill:#e8f5e9,stroke:#4caf50
    style S fill:#fff3e0,stroke:#ef6c00
```
Best for: Market research, competitive intel, policy updates.
Production insight: Multi-agent systems fail less when you treat them like microservices: contracts, schemas, timeouts, and ownership.
RAG vs Agent Workflows vs Agentic RAG¶
| Approach | Best For | Strengths | Failure Mode | Fix |
|---|---|---|---|---|
| Classic RAG | Q&A, policy lookup | Fast, cheap, explainable | Hallucinated synthesis | Better chunking, hybrid search |
| Single-Agent | Ticket triage, CRM updates | Simple orchestration | Tool misuse, looping | Guardrails, timeouts |
| Multi-Agent | Procurement, incident response | Specialization + parallel | Coordination overhead | Shared state, role contracts |
| Agentic RAG | Enterprise workflows | Higher task completion | Retrieval drift + action risk | Retrieval constraints + approvals |
2026 trend: "Agentic RAG" -- default enterprise pattern: retrieval grounds decisions, tools execute changes, policy gates manage risk.
Agentic RAG Architecture¶
```
Query -> Router -> Planner -> Retriever (RAG) -> Executor (Tools) -> Verifier -> Output
                                                        |
                                             Policy Gates & Approvals
```
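The architecture above can be sketched end to end; every component here (`retrieve`, `decide_action`, `policy_gate`, `execute`) is an illustrative stub, with the policy gate as the key structural element: no tool runs on a high-risk action without approval.

```python
# Agentic RAG sketch: retrieval grounds the decision, a policy gate
# blocks risky actions pending human approval, only then does the
# tool execute.
def retrieve(query: str) -> list[str]:
    return [f"doc about {query}"]

def decide_action(query: str, docs: list[str]) -> dict:
    # Stand-in for the LLM choosing a tool based on retrieved context.
    return {"tool": "update_crm", "risk": "high" if "delete" in query else "low"}

def policy_gate(action: dict) -> bool:
    return action["risk"] != "high"  # high-risk actions need approval

def execute(action: dict) -> str:
    return f"executed {action['tool']}"

def agentic_rag(query: str) -> str:
    docs = retrieve(query)
    action = decide_action(query, docs)
    if not policy_gate(action):
        return "pending human approval"
    return execute(action)
```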
Multimodal RAG¶
Three Approaches¶
| Approach | Description | Pros | Cons |
|---|---|---|---|
| Text-translation | Convert images to captions, audio to transcripts | Easy integration | Information bottleneck |
| Text retrieval + multimodal generation | Retrieve via text, generate with original media | Better expressiveness | Retrieval still text-dependent |
| Multimodal retrieval | Cross-modal embeddings in shared vector space | Maximum grounding | Computationally expensive |
Pipeline¶
Step 1: Multimodal Knowledge Preparation
Images -> Vision encoder -> Embeddings
Audio -> Audio encoder -> Embeddings
Text -> Text encoder -> Embeddings
All stored in Vector DB
Step 2: Processing and Retrieval
Query -> Encode (any modality) -> Vector search -> Top-K multimodal chunks
Step 3: Multimodal Context Building
Early fusion: Convert all to text
Late fusion: Keep modalities separate
Multimodal LLM generates response
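Steps 1-2 can be sketched with a toy shared vector space: items from any modality are indexed as embeddings and searched by cosine similarity. Real systems use cross-modal encoders (CLIP-style); the 3-dimensional vectors and item IDs below are hand-made for illustration.

```python
# Cross-modal retrieval sketch: one shared embedding space, cosine search.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

INDEX = [  # (id, modality, embedding) -- toy hand-made vectors
    ("img_cat", "image", [0.9, 0.1, 0.0]),
    ("txt_dog", "text",  [0.1, 0.9, 0.0]),
    ("aud_cat", "audio", [0.8, 0.2, 0.1]),
]

def search(query_emb, top_k=2):
    # Query can come from any modality, as long as its encoder maps
    # into the same space as the indexed items.
    scored = sorted(INDEX, key=lambda item: cosine(query_emb, item[2]),
                    reverse=True)
    return [item[0] for item in scored[:top_k]]
```

Note that a "cat" query embedding retrieves both the image and the audio item: that cross-modal hit is exactly what the shared space buys over text-translation approaches.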
Use Cases¶
| Use Case | Benefit |
|---|---|
| Medical imaging | Retrieve similar cases with diagnosis |
| Product search | Search by image, get product details |
| Document analysis | Understand charts, tables, diagrams |
| Video QA | Answer questions about video content |
Tool Calling Reliability¶
What "Good" Looks Like¶
- Tools accept strict schemas (JSON Schema / typed models)
- Agents produce structured outputs for every action
- Every tool call is logged with inputs/outputs
- High-risk tools require human approval (HITL)
- Tools run in least-privilege mode (scoped tokens)
Tool Contract Example¶
```python
from pydantic import BaseModel, Field
from typing import Literal, Optional

class CreateJiraTicket(BaseModel):
    project_key: str = Field(..., description="Jira project key")
    summary: str = Field(..., max_length=120)
    description: str
    severity: Literal["low", "medium", "high", "critical"]
    requester_email: str
    approval_required: bool = True
    related_asset_id: Optional[str] = None

def create_jira_ticket(payload: CreateJiraTicket) -> dict:
    # 1) Policy check (PII redaction, allowed project)
    # 2) Call Jira API
    # 3) Return structured receipt
    return {"ticket_id": "ITOPS-1842", "status": "created"}
```
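The value of the strict schema shows up when the LLM emits malformed arguments: pydantic rejects them before any API call happens. This snippet redefines a trimmed version of the contract above so it is self-contained.

```python
# Schema validation catches bad LLM-generated tool arguments early.
from pydantic import BaseModel, Field, ValidationError
from typing import Literal

class CreateJiraTicket(BaseModel):  # trimmed copy of the contract above
    project_key: str = Field(..., description="Jira project key")
    summary: str = Field(..., max_length=120)
    severity: Literal["low", "medium", "high", "critical"]

ok = CreateJiraTicket(project_key="ITOPS", summary="Disk full", severity="high")

try:
    # "urgent" is not in the allowed Literal values -> rejected
    CreateJiraTicket(project_key="ITOPS", summary="Disk full", severity="urgent")
    rejected = False
except ValidationError:
    rejected = True
```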
Guardrails¶
| Guardrail | Purpose |
|---|---|
| Allowlists | Limit tools, destinations, domains |
| Step budgets | Max turns, max tool calls |
| Deterministic formatting | Schemas + validators |
| Sandboxing | Code execution, browser actions |
| Confirmation prompts | Destructive actions (delete, refund, terminate) |
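Two of these guardrails -- allowlists and step budgets -- compose naturally into one check that runs before every tool call. The class and tool names below are illustrative, not a framework API.

```python
# Guardrail sketch: allowlist + per-run tool-call budget, enforced
# before any tool executes.
ALLOWED_TOOLS = {"search_docs", "create_ticket"}

class ToolBudget:
    def __init__(self, max_calls: int):
        self.max_calls = max_calls
        self.calls = 0

    def check(self, tool_name: str) -> None:
        if tool_name not in ALLOWED_TOOLS:
            raise PermissionError(f"tool not allowlisted: {tool_name}")
        if self.calls >= self.max_calls:
            raise RuntimeError("tool-call budget exhausted")
        self.calls += 1

budget = ToolBudget(max_calls=2)
budget.check("search_docs")    # ok, 1st call
budget.check("create_ticket")  # ok, 2nd call; a 3rd would raise
```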
Production Workflow (9 Steps)¶
| Step | Action |
|---|---|
| 1 | Define the job -- one sentence goal + success criteria |
| 2 | Map the workflow -- states, transitions, failure paths, escalation |
| 3 | Choose tools -- APIs first; browser automation last |
| 4 | Add knowledge -- RAG with citations + freshness rules |
| 5 | Design memory -- store only what's needed; set retention/redaction |
| 6 | Add policy gates -- permissions, approvals, audit logging, PII |
| 7 | Implement evals -- offline test set + adversarial cases |
| 8 | Ship with observability -- traces, tool metrics, cost, latency |
| 9 | Iterate -- tighten prompts, schemas, retrieval based on data |
Practical Examples¶
IT Incident Triage:
Ingest alert -> Classify -> Retrieve runbook (RAG) -> Propose actions -> Execute safe actions -> Open ticket -> Notify on-call
Finance Ops (Invoice Exceptions):
Read invoice -> Validate vendor + PO -> Check anomalies -> Request missing info -> Update ERP -> Audit trail
Sales Ops (Account Research):
Pull CRM context -> Research company news -> Draft personalized email -> Suggest next action -> Log activity
Best Practices¶
Human-in-the-Loop¶
| Practice | Implementation |
|---|---|
| Approvals | Required for irreversible actions |
| Diffs | Show what will change, not just explanations |
| Citations | "Why this action" + sources + tool summary |
| Escalation | One-click "escalate to human" at every stage |
Governance¶
| Requirement | Implementation |
|---|---|
| Least privilege | Tokens per agent + per tool |
| Central policy engine | Who can do what, where, when |
| Audit logs | Prompts, retrieval sources, tool I/O |
| PII handling | Detection + redaction before storage |
| Data residency | Region pinning, private deployments |
Cost Optimization¶
| Strategy | Impact |
|---|---|
| Smaller models | Routing, extraction, classification |
| Caching | Retrieval results with TTLs |
| Tool budgets | Limit calls, early stopping |
| Structured extraction | Over long-form generation |
| Workflow-level cost tracking | Cost per resolved workflow, not per token |
Anti-Patterns to Avoid¶
- One Giant Prompt -- hides errors, hard to debug. Design modular patterns.
- No Evals, Only Vibes -- intuition doesn't scale. Build dashboards early.
- Tool Chaos -- untracked tools, unpinned model versions. Version everything.
Framework Comparison (2026)¶
| Framework | Type | Best For | Key Features |
|---|---|---|---|
| LangGraph | Graph-based | Stateful workflows, complex branching | Explicit multi-agent coordination, stateful, cycles |
| AutoGen | Conversational | Research, coding copilots | HITL, flexible agents, conversation management |
| CrewAI | Production teams | Business applications | Role-based agents, clean architecture |
| LangChain | Ecosystem | Maximum flexibility | Massive component library, extensive tooling |
| OpenAI Swarm | Lightweight | Prototyping | Routine-based, minimal overhead |
Selection Guide¶
| Need | Recommended |
|---|---|
| Complex state management | LangGraph |
| Research/coding copilots | AutoGen |
| Business workflows | CrewAI |
| Maximum flexibility | LangChain |
| Rapid prototyping | OpenAI Swarm |
| Hard orchestration | Temporal, Step Functions |
Evaluation and Monitoring¶
Metrics (Minimum Set)¶
| Metric | Description |
|---|---|
| Task success rate | Did it complete correctly? |
| Tool success rate | API errors, invalid schemas, retries |
| Escalation rate | How often humans intervene |
| Time-to-resolution | Latency end-to-end |
| Cost per task | Model + tools + human time |
| Grounding quality | Citation accuracy, retrieval hit rate |
| Safety metrics | Policy violations, blocked actions |
Observability Essentials¶
- Traces -- every step, tool call, retrieved doc ID
- Replay -- reproduce incidents with same state
- Regression tests -- weekly against fixed suite
- Canaries -- roll out new prompts/models to 1-5% first
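The minimum metric set can be computed from logged runs in a few lines; the run records and target thresholds below (taken from the targets in this document: >90% task success, <10% escalation) are illustrative.

```python
# Minimal eval-harness sketch: compute core agent metrics from logged
# runs and compare against release targets.
runs = [
    {"success": True,  "escalated": False, "tool_errors": 0},
    {"success": True,  "escalated": False, "tool_errors": 1},
    {"success": True,  "escalated": True,  "tool_errors": 0},
    {"success": False, "escalated": True,  "tool_errors": 2},
] * 5  # pretend we logged 20 runs

task_success_rate = sum(r["success"] for r in runs) / len(runs)
escalation_rate = sum(r["escalated"] for r in runs) / len(runs)

# Gate a release (or a canary promotion) on the targets.
meets_targets = task_success_rate > 0.90 and escalation_rate < 0.10
```

Wiring a check like this into CI is what turns "no evals, only vibes" into a regression gate.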
For Interviews¶
Q: "What is an agent workflow?"¶
Structured loop: LLM interprets goal -> plans (task decomposition) -> uses tools (API, DB) -> retrieves context (RAG) -> writes outputs -> verifies (self-checks + policy) -> escalates to a human if confidence is low. Agent Stack: 8 layers (Interface, Orchestration, Reasoning, Tools, Knowledge, Memory, Safety+Governance, Observability). 2026: 40% of enterprise apps use agents.
Q: "What agent workflow patterns exist?"¶
Nine main ones: (1) ReAct -- reasoning + action in small steps. (2) Plan-and-Execute -- strategic planning separate from execution. (3) Planner-Critic-Executor -- review before execution. (4) Reflection Loop -- self-critique + refine. (5) Tree of Thoughts -- graph-based, multiple reasoning branches. (6) LATS -- MCTS-style search with tool feedback. (7) ReWOO -- explicit tool/data planning before execution. (8) Router-Specialist -- route to domain experts. (9) Debate/Consensus -- multiple agents propose, a judge decides.
Q: "Multi-agent orchestration patterns?"¶
(1) Manager-Worker (most common): manager decomposes, workers execute, verifier checks. (2) Router + Specialists: fast, avoids committee overhead. (3) Debate + Judge: high-stakes, use sparingly. (4) Parallel Research + Synthesis: multiple agents gather sources, one synthesizes with citations. Treat like microservices: contracts, schemas, timeouts, ownership.
Q: "RAG vs Agentic RAG?"¶
Classic RAG: Q&A over docs, fast, cheap. Agent Workflows: actions + decisions + tool use. Agentic RAG: retrieval grounds decisions + tools execute changes + policy gates manage risk. Default enterprise pattern in 2026.
Q: "Multimodal RAG?"¶
3 approaches: (1) Text translation: convert images/audio to text. (2) Text retrieval + multimodal generation. (3) Full multimodal retrieval: cross-modal embeddings in a shared vector space. Pipeline: prepare embeddings for all modalities -> vector search -> multimodal LLM generates the response.
Key Numbers¶
| Fact | Value |
|---|---|
| Enterprise agent adoption 2026 | 40% of applications |
| Multi-agent inquiry growth | 1,445% (Q1 2024 -> Q2 2025) |
| Enterprise AI workloads 2026 | 80%+ |
| Cross-validation accuracy improvement | 40% |
| ChatDev code accuracy improvement | 67% |
| Customer support response time reduction | 60% |
| Task success rate target | >90% |
| Escalation rate target | <10% |
| Tool success rate target | >95% |
| Cost per simple classification | $0.001-$0.01 |
| Cost per RAG + tool calling | $0.01-$0.10 |
| Cost per multi-agent research | $0.10-$1.00 |
| Retrieval latency target | <100ms |
| Re-ranking latency target | <50ms |
| Generation latency | 500-2000ms |
Misconception: Multi-agent is always better than single-agent
Benchmarks show that a single agent with the right tool set solves 70-80% of enterprise tasks. Multi-agent adds 1.5-2.0x token overhead (AutoGen) and coordination latency. Use multi-agent only when the task requires parallel processing or distinct domain expertise -- everything else is handled by Router + tools.
Misconception: ReAct is a universal pattern for every agent
ReAct works well for fast-moving tasks (triage, routing), but on tasks requiring strategic planning (report generation, data enrichment) the Plan-and-Execute pattern shows a 25-40% higher task success rate. ReAct without explicit stop conditions can loop endlessly -- always set a step budget (typically 5-15 steps).
Misconception: Agentic RAG = plain RAG + tools
Agentic RAG is fundamentally different: the agent decides when and what to retrieve, can perform multi-hop retrieval, and verifies the retrieved context. Plain RAG suffers retrieval drift on complex queries (accuracy drops 30-40% on multi-hop questions). Agentic RAG adds policy gates and approval steps, which is critical for enterprise -- without them an agent can take irreversible actions based on hallucinated context.
Interview Questions¶
Q: How do you choose among the 9 agent workflow patterns for a given task?
Red flag: "We use ReAct for everything -- it's the most popular pattern."
Strong answer: "The choice depends on the nature of the task. For fast-moving tasks (triage, routing, support macros) -- ReAct, because reasoning and action alternate in small steps. For tasks requiring strategic planning (report generation, data enrichment) -- Plan-and-Execute, where a planner builds the plan and an executor runs it step by step. For high-stakes work (legal review, risk scoring) -- Debate/Consensus, where several agents propose solutions and a judge picks one. Tree of Thoughts -- for problems with branching (5-20x more expensive, but +30-150% on creative/logical solving). In production you usually combine them: a Router at the entry point plus a specialized pattern per domain."
Q: How does Agentic RAG differ from plain RAG, and when should you use it?
Red flag: "Agentic RAG is just RAG with tools bolted on."
Strong answer: "Classic RAG: query -> retrieve -> generate. Fast and cheap, but prone to hallucinated synthesis on complex queries. Agentic RAG: the agent decides when to retrieve, performs multi-hop retrieval, verifies the context, and can take actions (API calls, DB updates). The key difference is policy gates: before any irreversible action the agent passes through an approval step. The default enterprise pattern in 2026. Failure mode: retrieval drift + action risk -- fixed with retrieval constraints (max 3 hops) + human approval on destructive actions. Cost: $0.01-0.10 for RAG + tool calling vs $0.001 for simple classification."
Q: How do you design a multi-agent system for production?
Red flag: "Just spin up several agents and hand them tasks."
Strong answer: "Treat multi-agent systems like microservices: contracts, schemas, timeouts, ownership. Four patterns: Manager-Worker (most common) -- the manager decomposes the task, workers execute, a verifier checks. Router + Specialists -- for high-volume work (triage, FAQ). Debate + Judge -- use sparingly, high-stakes only. Parallel Research + Synthesis -- high leverage for research tasks. In practice: limit crew size to 2-5 agents (CrewAI benchmark), give each agent an explicit role and tool set, share state through a checkpointer (LangGraph), set a step budget per agent (5-15), and target an escalation rate <10% and a task success rate >90%."
Q: Which metrics are mandatory for a production agent workflow?
Red flag: "We only track latency and error counts."
Strong answer: "Minimum set: task success rate (target >90%), tool success rate (>95%), escalation rate (<10%), time-to-resolution, cost per task (model + tools + human time), grounding quality (citation accuracy, retrieval hit rate), safety metrics (policy violations, blocked actions). On top of that: traces of every step and tool call, replay capability to reproduce incidents, regression tests (weekly against a fixed suite), canary rollouts (1-5% of traffic for new prompts/models). The key metric is cost per resolved workflow, not cost per token."
Sources¶
- AiMatch Pro -- "AI Agent Workflows in 2025: The 2026 Playbook"
- Vellum -- "The 2026 Guide to AI Agent Workflows"
- Beam AI -- "The 9 Best Agentic Workflow Patterns in 2026"
- Collabnix -- "Multi-Agent and Multi-LLM Architecture: Complete Guide for 2025"
- IBM -- "What is Multimodal RAG?"
- arXiv:2504.08748 -- "A Survey of Multimodal Retrieval-Augmented Generation"
- arXiv:2510.09244 -- "Fundamentals of Building Autonomous LLM Agents"
- Meta AI -- "Retrieval-Augmented Multimodal Language Modeling"
- OpenAI/Anthropic -- Tool use documentation
- Gartner -- Multi-agent system inquiry growth statistics