
Function Calling & Tool Use 2025-2026

~5 minute read

LLM Tool Use, Function Calling, Multi-Agent Orchestration, ReAct Pattern. Sources: ArXiv 2024-2025, OpenAI Docs, LangChain/AutoGen Guides


1. Function Calling Overview

1.1 What is Function Calling

Function Calling is a mechanism that lets an LLM respond with structured JSON containing function names and parameters, enabling it to interact with external systems.

graph LR
    Q["User Query"] --> LLM["LLM"]
    LLM --> FC["Function Call (JSON)"]
    FC --> EX["Executor"]
    EX --> RES["Function Result"]
    RES --> LLM
    LLM --> FINAL["Final Response"]
    style LLM fill:#e8eaf6,stroke:#3f51b5
    style EX fill:#fff3e0,stroke:#ef6c00
    style FINAL fill:#e8f5e9,stroke:#4caf50

1.2 Function Call Example

# Function definition
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "City and country, e.g., 'Paris, France'"
                },
                "unit": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"],
                    "description": "Temperature unit"
                }
            },
            "required": ["location"]
        }
    }
}]

# LLM response with function call
response = {
    "tool_calls": [{
        "id": "call_abc123",
        "type": "function",
        "function": {
            "name": "get_weather",
            "arguments": '{"location": "Paris, France", "unit": "celsius"}'
        }
    }]
}
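
A minimal sketch of the full round-trip, assuming the OpenAI Python SDK (openai >= 1.x); get_weather is a hypothetical local implementation and the model name is a placeholder:

# Executing the call and feeding the result back (sketch)
import json
from openai import OpenAI

client = OpenAI()

def get_weather(location, unit="celsius"):
    # Hypothetical implementation; call a real weather API here
    return {"location": location, "temp": 21, "unit": unit}

messages = [{"role": "user", "content": "What's the weather in Paris?"}]
first = client.chat.completions.create(
    model="gpt-4o", messages=messages, tools=tools  # 'tools' from the definition above
)
msg = first.choices[0].message

if msg.tool_calls:
    messages.append(msg)  # keep the assistant's tool-call turn in the history
    for call in msg.tool_calls:
        args = json.loads(call.function.arguments)  # arguments arrive as a JSON string
        result = get_weather(**args)
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": json.dumps(result),
        })
    # Second request: the model turns the tool result into the final response
    final = client.chat.completions.create(model="gpt-4o", messages=messages)
    print(final.choices[0].message.content)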

2. ReAct: Reasoning + Acting

2.1 ReAct Paradigm (Yao et al., 2022)

ReAct = Reasoning + Acting: a framework that interleaves explicit chain-of-thought reasoning with the execution of external actions.

graph TD
    T1["Thought: I need to find<br/>the population of Paris"]
    T1 --> A1["Action: Search<br/>Paris population 2024"]
    A1 --> O1["Observation: Paris has<br/>2.1M inhabitants..."]
    O1 --> T2["Thought: Now I have<br/>the information needed"]
    T2 --> ANS["Answer: Paris has ~2.1M<br/>inhabitants"]
    style T1 fill:#e8eaf6,stroke:#3f51b5
    style A1 fill:#fff3e0,stroke:#ef6c00
    style O1 fill:#e8f5e9,stroke:#4caf50
    style T2 fill:#e8eaf6,stroke:#3f51b5
    style ANS fill:#fce4ec,stroke:#c62828
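
The loop is simple to implement without a framework. A minimal sketch, assuming a hypothetical llm(prompt) -> str completion function and a dict of callable tools:

# Minimal ReAct loop (sketch): parse Thought/Action, execute, feed back
import re

def react_loop(question, llm, tools, max_steps=5):
    prompt = (
        "Answer the question step by step using this format:\n"
        "Thought: <reasoning>\n"
        "Action: <tool_name>[<input>]\n"
        "Finish with: Answer: <final answer>\n\n"
        f"Question: {question}\n"
    )
    for _ in range(max_steps):
        step = llm(prompt)  # model emits a Thought plus an Action or an Answer
        if "Answer:" in step:
            return step.split("Answer:", 1)[1].strip()
        match = re.search(r"Action:\s*(\w+)\[(.*)\]", step)
        if match:
            tool_name, tool_input = match.groups()
            observation = tools[tool_name](tool_input)  # grounded external feedback
            prompt += f"{step}\nObservation: {observation}\n"
    return "Stopped: max steps reached"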

2.2 ReAct vs Chain-of-Thought

| Approach | Strengths | Weaknesses |
|---|---|---|
| Chain-of-Thought | Pure reasoning, no external access | Hallucination, error propagation |
| Act-only | Grounded in reality | No reasoning trace |
| ReAct | Best of both, interpretable | More tokens, slower |

2.3 ReAct Results (Original Paper)

| Benchmark | ReAct | CoT / Baseline | Note |
|---|---|---|---|
| HotpotQA | 27.4% | 29.4% | Better grounding |
| Fever | 56.3% | 56.7% | Less hallucination |
| ALFWorld | +34% | baseline | Interactive tasks |
| WebShop | +10% | baseline | Decision making |

3. Multi-LLM Agent Framework

3.1 Planner-Caller-Summarizer Pattern

Key Insight: a single small LLM struggles to cover all tool-use capabilities (planning, calling, summarizing) at once. Solution: decompose the workflow into specialized roles.

graph LR
    P["PLANNER<br/>Task Planning"] --> C["CALLER<br/>Tool Invocation"]
    C --> S["SUMMARIZER<br/>Result Synthesis"]
    style P fill:#e8eaf6,stroke:#3f51b5
    style C fill:#fff3e0,stroke:#ef6c00
    style S fill:#e8f5e9,stroke:#4caf50

Each role = single LLM focused on one capability.
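
A sketch of the control flow; the three llm_* callables are hypothetical stand-ins for the specialized models, and execute_tool for the tool runtime:

# Planner-Caller-Summarizer pipeline (sketch)
def multi_llm_answer(query, llm_planner, llm_caller, llm_summarizer, execute_tool):
    # 1. Planner: decompose the task into tool-level steps (list of strings)
    steps = llm_planner(f"Break this task into tool steps: {query}")
    # 2. Caller: emit and execute a concrete tool call per step
    results = []
    for step in steps:
        call = llm_caller(f"Produce a tool call for: {step}")
        results.append(execute_tool(call))
    # 3. Summarizer: synthesize tool outputs into a user-facing answer
    return llm_summarizer(f"Query: {query}\nTool results: {results}")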

3.2 Two-Stage Training

Stage 1: Pre-train the backbone on the full dataset (comprehensive understanding).
Stage 2: Specialize each component on its respective sub-task.

3.3 Results (Small LLMs Are Weak Tool Learners)

| Approach | Success Rate | Efficiency |
|---|---|---|
| Single LLM (7B) | 45% | Low |
| Multi-LLM (3x 7B) | 68% | Higher |
| Single LLM (70B) | 72% | Highest cost |

4. Agent Frameworks Comparison

4.1 Framework Overview (2025)

| Framework | Strength | Best For |
|---|---|---|
| LangChain | Swiss Army Knife | Production apps |
| LangGraph | State machines | Complex workflows |
| AutoGen | Multi-agent conversations | Enterprise |
| CrewAI | Role-based agents | Team simulation |
| AG2 | Latest AutoGen | Modern patterns |

4.2 LangChain vs AutoGen vs CrewAI

| Feature | LangChain | AutoGen | CrewAI |
|---|---|---|---|
| Primary Focus | Tool orchestration | Agent conversations | Role-based teams |
| Learning Curve | Medium | Steep | Low |
| Multi-Agent | Via LangGraph | Native | Native |
| Human-in-Loop | Supported | Built-in | Limited |
| Production Ready | Yes | Yes | Growing |

4.3 AutoGen Orchestration Patterns

Sequential Pattern:

# Task flows through agents in order
user_proxy → assistant → code_executor → reviewer → user_proxy

Hierarchical Pattern:

# Manager coordinates workers
manager
  ├── researcher_agent
  ├── coder_agent
  └── reviewer_agent
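
A sketch of the sequential pattern using the classic pyautogen (v0.2) API; the model name and task are placeholders:

# Sequential pattern with two agents (sketch, classic pyautogen v0.2 API)
import os
import autogen

llm_config = {"config_list": [{"model": "gpt-4o", "api_key": os.environ["OPENAI_API_KEY"]}]}

assistant = autogen.AssistantAgent(name="assistant", llm_config=llm_config)
user_proxy = autogen.UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",  # fully automated; "ALWAYS" enables human-in-the-loop
    code_execution_config={"work_dir": "coding", "use_docker": False},
)

# user_proxy drives the chat and executes any code blocks the assistant writes
user_proxy.initiate_chat(assistant, message="Summarize today's arXiv cs.CL titles.")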


5. STRIDE: When to Use Agentic AI

5.1 Three Modalities

| Modality | Description | When to Use |
|---|---|---|
| Direct LLM Call | Single inference | Static, simple tasks |
| Guided AI Assistant | Structured help | Semi-complex, predictable |
| Full Agentic AI | Autonomous goal pursuit | Dynamic, evolving context |

5.2 STRIDE Framework Components

  1. Task Decomposition — Break down complexity
  2. Dynamism Attribution — How much does context change?
  3. Self-Reflection Requirement — Does task need iterative improvement?

5.3 Agentic Suitability Score

\[\text{Agentic Score} = w_1 \cdot \text{Complexity} + w_2 \cdot \text{Dynamism} + w_3 \cdot \text{Reflection}\]

Results:

- 92% accuracy in modality selection
- 45% reduction in unnecessary agent deployments
- 37% cost reduction
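
A minimal sketch of the scoring rule above; the weights and thresholds are illustrative assumptions, not values from the paper:

# Agentic suitability score and modality selection (sketch)
def agentic_score(complexity, dynamism, reflection, w=(0.4, 0.35, 0.25)):
    # inputs are assumed normalized to [0, 1]; weights are illustrative
    return w[0] * complexity + w[1] * dynamism + w[2] * reflection

def pick_modality(score):
    if score < 0.33:
        return "direct LLM call"
    if score < 0.66:
        return "guided AI assistant"
    return "full agentic AI"

print(pick_modality(agentic_score(0.8, 0.7, 0.9)))  # -> full agentic AI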


6. Tool-Calling vs ReAct Agents

6.1 Architecture Comparison

Tool-Calling Agent:

User Query → Tool Selection → Tool Execution → Response
                 (Structured)

ReAct Agent:

User Query → Thought → Action → Observation → Thought → ... → Response
            (Chain of Reasoning)

6.2 When to Use Which

| Scenario | Recommended |
|---|---|
| Deterministic tool set | Tool-Calling |
| Unknown tools at runtime | ReAct |
| Need reasoning transparency | ReAct |
| Fast execution required | Tool-Calling |
| Complex multi-step tasks | ReAct |

7. Function Calling Best Practices

7.1 Schema Design

# Good function definition
{
    "name": "search_products",
    "description": "Search for products in the catalog. Use when user asks about availability, price, or features.",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "Search query, e.g., 'wireless headphones under $100'"
            },
            "category": {
                "type": "string",
                "enum": ["electronics", "clothing", "home"],
                "description": "Product category to narrow search"
            },
            "max_price": {
                "type": "number",
                "description": "Maximum price in USD"
            }
        },
        "required": ["query"]
    }
}

7.2 Best Practices Summary

| Practice | Description |
|---|---|
| Clear names | Use descriptive function names |
| Detailed descriptions | Explain what, when, and how |
| Type constraints | Use enums for limited options |
| Required params | Mark essential parameters |
| Examples in descriptions | Help LLM understand usage |
| Error handling | Plan for failures |

7.3 Common Pitfalls

  1. Vague descriptions → Wrong function selected
  2. Missing required params → Incomplete calls
  3. Too many functions → Selection confusion
  4. No error handling → Silent failures
  5. Overly complex schemas → Parameter errors
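
Pitfalls 2 and 5 can be caught mechanically by validating the model's arguments against the declared parameter schema before execution. A sketch using the jsonschema package:

# Validate LLM-produced arguments before executing the tool
import json
from jsonschema import ValidationError, validate

def safe_parse_arguments(raw_arguments, parameters_schema):
    try:
        args = json.loads(raw_arguments)
        validate(instance=args, schema=parameters_schema)  # missing/invalid params raise
        return args, None
    except (json.JSONDecodeError, ValidationError) as err:
        # Return the error so it can be fed back to the model, not fail silently
        return None, f"Invalid arguments: {err}"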

8. Tool Selection & Routing

8.1 Tool Selection Strategies

| Strategy | Description | Use Case |
|---|---|---|
| LLM-based | Model decides from descriptions | Dynamic tool sets |
| Semantic search | Embed query, find similar tools | Large tool catalogs |
| Rule-based | Keywords/patterns → tools | Deterministic routing |
| Hybrid | Combine approaches | Production systems |

8.2 Tool Routing Architecture

graph LR
    Q["Query"] --> IC["Intent Classification<br/>(Semantic Match)"]
    IC --> TS["Tool Selection<br/>(Rank by relevance)"]
    TS --> EX["Execute"]
    style IC fill:#e8eaf6,stroke:#3f51b5
    style TS fill:#fff3e0,stroke:#ef6c00
    style EX fill:#e8f5e9,stroke:#4caf50
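
A sketch of the semantic-match step, assuming a hypothetical embed(text) -> np.ndarray sentence-embedding function:

# Rank tools by cosine similarity between the query and tool descriptions
import numpy as np

def route_query(query, tools, embed, top_k=3):
    # tools: list of {"name": ..., "description": ...}
    q = embed(query)
    q = q / np.linalg.norm(q)
    scored = []
    for tool in tools:
        d = embed(tool["description"])
        d = d / np.linalg.norm(d)
        scored.append((float(q @ d), tool["name"]))
    scored.sort(reverse=True)
    return [name for _, name in scored[:top_k]]

In production the description embeddings would be precomputed and stored in a vector index rather than re-embedded per query.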

9. Interview Questions

9.1 Concept Questions

Q: What is function calling in LLMs?

A: Function calling lets LLMs respond with structured JSON specifying
   function names and parameters, enabling interaction with external
   systems like APIs, databases, and tools.

Q: Explain the ReAct paradigm.

A: ReAct (Reasoning + Acting) alternates between:
   - Thought: Chain-of-thought reasoning
   - Action: External tool/environment interaction
   - Observation: Result from action

   Benefits: Better grounding, interpretable traces, error recovery

Q: When would you use agents vs direct LLM calls?

A: Use agents when:
   - Task requires multiple steps
   - Context changes dynamically
   - Self-reflection/iteration needed
   - External tools must be orchestrated

   Use direct calls when:
   - Single-shot tasks
   - Context is static
   - Speed is critical

9.2 Architecture Questions

Q: Design a multi-tool agent system.

A: Architecture:
   1. Query Understanding → Intent classification
   2. Tool Selection → Semantic search + LLM ranking
   3. Tool Execution → Parallel where possible
   4. Result Aggregation → Summarization LLM
   5. Response Generation → User-facing output

   Key considerations:
   - Tool schema management
   - Error handling & fallbacks
   - Rate limiting
   - Cost tracking

Q: Compare LangChain vs AutoGen for multi-agent systems.

A: LangChain:
   - Better for tool orchestration
   - LangGraph for state machines
   - More production-ready

   AutoGen:
   - Native multi-agent conversations
   - Built-in human-in-the-loop
   - Better for research/enterprise

9.3 Implementation Questions

Q: How do you handle tool call failures?

A: Strategies:
   1. Retry with modified parameters
   2. Fallback to alternative tools
   3. Ask user for clarification
   4. Return partial results with error info
   5. Log for improvement

   Implementation:
   - Timeout handling
   - Rate limit backoff
   - Circuit breaker pattern
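
A sketch combining strategies 1, 2, and 4 with exponential backoff; primary and fallback are hypothetical tool callables:

# Retry with exponential backoff, then fall back to an alternative tool (sketch)
import time

def call_with_fallback(primary, fallback, args, retries=3, base_delay=1.0):
    for attempt in range(retries):
        try:
            return primary(**args)
        except TimeoutError:
            time.sleep(base_delay * 2 ** attempt)  # backoff on transient errors
        except Exception:
            break  # non-transient error: stop retrying the primary tool
    try:
        return fallback(**args)
    except Exception as err:
        # Partial result with structured error info the LLM can reason about
        return {"error": str(err), "result": None}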

Q: Optimize LLM agent latency.

A: Techniques:
   1. Plan reuse (AgentReuse: 93% latency reduction)
   2. Tool call caching
   3. Parallel tool execution
   4. Smaller models for routing
   5. Streaming responses
   6. Speculative execution
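
A sketch of technique 2, caching results by canonicalized arguments so repeated identical calls skip execution:

# Tool-call cache keyed by (tool name, canonicalized JSON arguments)
import json

_cache = {}

def cached_call(tool_name, func, args):
    key = (tool_name, json.dumps(args, sort_keys=True))  # order-insensitive key
    if key not in _cache:
        _cache[key] = func(**args)  # pay the tool cost only on a miss
    return _cache[key]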


10. Key Papers & Resources

| Paper/Resource | Year | Key Contribution |
|---|---|---|
| ReAct | 2022 | Reasoning + Acting paradigm |
| Small LLMs Are Weak Tool Learners | 2024 | Planner-Caller-Summarizer |
| AgentGuard | 2025 | Safety evaluation framework |
| STRIDE | 2025 | When to use agentic AI |
| AgentReuse | 2025 | 93% latency reduction |

11. Formulas

Agentic Suitability Score (STRIDE)

\[\text{Score} = w_1 \cdot \text{Complexity} + w_2 \cdot \text{Dynamism} + w_3 \cdot \text{Reflection}\]

Plan Reuse Speedup

\[\text{Speedup} = \frac{T_{\text{gen}}}{(1 - \alpha) T_{\text{gen}} + \alpha T_{\text{cache}}}\]

Where \(\alpha\) = cache hit rate
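
For illustration (numbers are hypothetical): with \(\alpha = 0.9\), \(T_{\text{gen}} = 2\,\text{s}\), \(T_{\text{cache}} = 0.05\,\text{s}\), Speedup = 2 / (0.1 · 2 + 0.9 · 0.05) ≈ 8.2×.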

Tool Selection Score

\[\text{Score}(t, q) = \text{sim}(\text{embed}(t.\text{desc}), \text{embed}(q)) \cdot \text{relevance}(t)\]

