
Function Calling & Tool Use 2025-2026

~5 minute read

LLM Tool Use, Function Calling, Multi-Agent Orchestration, ReAct Pattern. Sources: ArXiv 2024-2025, OpenAI Docs, LangChain/AutoGen Guides


1. Function Calling Overview

1.1 What is Function Calling

Function Calling is a mechanism that lets an LLM respond with structured JSON containing function names and parameters, enabling it to interact with external systems.

graph LR
    Q["User Query"] --> LLM["LLM"]
    LLM --> FC["Function Call (JSON)"]
    FC --> EX["Executor"]
    EX --> RES["Function Result"]
    RES --> LLM
    LLM --> FINAL["Final Response"]
    style LLM fill:#e8eaf6,stroke:#3f51b5
    style EX fill:#fff3e0,stroke:#ef6c00
    style FINAL fill:#e8f5e9,stroke:#4caf50

1.2 Function Call Example

# Function definition
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "City and country, e.g., 'Paris, France'"
                },
                "unit": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"],
                    "description": "Temperature unit"
                }
            },
            "required": ["location"]
        }
    }
}]

# LLM response with function call
response = {
    "tool_calls": [{
        "id": "call_abc123",
        "type": "function",
        "function": {
            "name": "get_weather",
            "arguments": '{"location": "Paris, France", "unit": "celsius"}'
        }
    }]
}
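
A minimal sketch of the full round-trip, assuming the OpenAI Python SDK (openai >= 1.x); get_weather is a hypothetical local implementation and the model name is a placeholder:

# Executing the call and feeding the result back (sketch)
import json
from openai import OpenAI

client = OpenAI()

def get_weather(location, unit="celsius"):
    # Hypothetical implementation; call a real weather API here
    return {"location": location, "temp": 21, "unit": unit}

messages = [{"role": "user", "content": "What's the weather in Paris?"}]
first = client.chat.completions.create(
    model="gpt-4o", messages=messages, tools=tools  # 'tools' from the definition above
)
msg = first.choices[0].message

if msg.tool_calls:
    messages.append(msg)  # keep the assistant's tool-call turn in the history
    for call in msg.tool_calls:
        args = json.loads(call.function.arguments)  # arguments arrive as a JSON string
        result = get_weather(**args)
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": json.dumps(result),
        })
    # Second request: the model turns the tool result into the final response
    final = client.chat.completions.create(model="gpt-4o", messages=messages)
    print(final.choices[0].message.content)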

2. ReAct: Reasoning + Acting

2.1 ReAct Paradigm (Yao et al., 2022)

ReAct = Reasoning + Acting: a framework that interleaves explicit chain-of-thought reasoning with the execution of external actions.

graph TD
    T1["Thought: I need to find<br/>the population of Paris"]
    T1 --> A1["Action: Search<br/>Paris population 2024"]
    A1 --> O1["Observation: Paris has<br/>2.1M inhabitants..."]
    O1 --> T2["Thought: Now I have<br/>the information needed"]
    T2 --> ANS["Answer: Paris has ~2.1M<br/>inhabitants"]
    style T1 fill:#e8eaf6,stroke:#3f51b5
    style A1 fill:#fff3e0,stroke:#ef6c00
    style O1 fill:#e8f5e9,stroke:#4caf50
    style T2 fill:#e8eaf6,stroke:#3f51b5
    style ANS fill:#fce4ec,stroke:#c62828
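
The loop is simple to implement without a framework. A minimal sketch, assuming a hypothetical llm(prompt) -> str completion function and a dict of callable tools:

# Minimal ReAct loop (sketch): parse Thought/Action, execute, feed back
import re

def react_loop(question, llm, tools, max_steps=5):
    prompt = (
        "Answer the question step by step using this format:\n"
        "Thought: <reasoning>\n"
        "Action: <tool_name>[<input>]\n"
        "Finish with: Answer: <final answer>\n\n"
        f"Question: {question}\n"
    )
    for _ in range(max_steps):
        step = llm(prompt)  # model emits a Thought plus an Action or an Answer
        if "Answer:" in step:
            return step.split("Answer:", 1)[1].strip()
        match = re.search(r"Action:\s*(\w+)\[(.*)\]", step)
        if match:
            tool_name, tool_input = match.groups()
            observation = tools[tool_name](tool_input)  # grounded external feedback
            prompt += f"{step}\nObservation: {observation}\n"
    return "Stopped: max steps reached"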

2.2 ReAct vs Chain-of-Thought

| Approach | Strengths | Weaknesses |
|---|---|---|
| Chain-of-Thought | Pure reasoning, no external access | Hallucination, error propagation |
| Act-only | Grounded in reality | No reasoning trace |
| ReAct | Best of both, interpretable | More tokens, slower |

2.3 ReAct Results (Original Paper)

| Benchmark | ReAct | CoT / Baseline | Note |
|---|---|---|---|
| HotpotQA | 27.4% | 29.4% | Better grounding |
| Fever | 56.3% | 56.7% | Less hallucination |
| ALFWorld | +34% | baseline | Interactive tasks |
| WebShop | +10% | baseline | Decision making |

3. Multi-LLM Agent Framework

3.1 Planner-Caller-Summarizer Pattern

Key Insight: a single small LLM struggles to cover all tool-use capabilities (planning, calling, summarizing) at once. Solution: decompose the workflow into specialized roles.

graph LR
    P["PLANNER<br/>Task Planning"] --> C["CALLER<br/>Tool Invocation"]
    C --> S["SUMMARIZER<br/>Result Synthesis"]
    style P fill:#e8eaf6,stroke:#3f51b5
    style C fill:#fff3e0,stroke:#ef6c00
    style S fill:#e8f5e9,stroke:#4caf50

Each role = single LLM focused on one capability.
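
A sketch of the control flow; the three llm_* callables are hypothetical stand-ins for the specialized models, and execute_tool for the tool runtime:

# Planner-Caller-Summarizer pipeline (sketch)
def multi_llm_answer(query, llm_planner, llm_caller, llm_summarizer, execute_tool):
    # 1. Planner: decompose the task into tool-level steps (list of strings)
    steps = llm_planner(f"Break this task into tool steps: {query}")
    # 2. Caller: emit and execute a concrete tool call per step
    results = []
    for step in steps:
        call = llm_caller(f"Produce a tool call for: {step}")
        results.append(execute_tool(call))
    # 3. Summarizer: synthesize tool outputs into a user-facing answer
    return llm_summarizer(f"Query: {query}\nTool results: {results}")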

3.2 Two-Stage Training

Stage 1: Pre-train the backbone on the full dataset (comprehensive understanding).
Stage 2: Specialize each component on its respective sub-task.

3.3 Results (Small LLMs Are Weak Tool Learners)

| Approach | Success Rate | Efficiency |
|---|---|---|
| Single LLM (7B) | 45% | Low |
| Multi-LLM (3x 7B) | 68% | Higher |
| Single LLM (70B) | 72% | Highest cost |

4. Agent Frameworks Comparison

4.1 Framework Overview (2025)

| Framework | Strength | Best For |
|---|---|---|
| LangChain | Swiss Army Knife | Production apps |
| LangGraph | State machines | Complex workflows |
| AutoGen | Multi-agent conversations | Enterprise |
| CrewAI | Role-based agents | Team simulation |
| AG2 | Latest AutoGen | Modern patterns |

4.2 LangChain vs AutoGen vs CrewAI

| Feature | LangChain | AutoGen | CrewAI |
|---|---|---|---|
| Primary Focus | Tool orchestration | Agent conversations | Role-based teams |
| Learning Curve | Medium | Steep | Low |
| Multi-Agent | Via LangGraph | Native | Native |
| Human-in-Loop | Supported | Built-in | Limited |
| Production Ready | Yes | Yes | Growing |

4.3 AutoGen Orchestration Patterns

Sequential Pattern:

# Task flows through agents in order
user_proxy → assistant → code_executor → reviewer → user_proxy

Hierarchical Pattern:

# Manager coordinates workers
manager
  ├── researcher_agent
  ├── coder_agent
  └── reviewer_agent
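
A sketch of the sequential pattern using the classic pyautogen (v0.2) API; the model name and task are placeholders:

# Sequential pattern with two agents (sketch, classic pyautogen v0.2 API)
import os
import autogen

llm_config = {"config_list": [{"model": "gpt-4o", "api_key": os.environ["OPENAI_API_KEY"]}]}

assistant = autogen.AssistantAgent(name="assistant", llm_config=llm_config)
user_proxy = autogen.UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",  # fully automated; "ALWAYS" enables human-in-the-loop
    code_execution_config={"work_dir": "coding", "use_docker": False},
)

# user_proxy drives the chat and executes any code blocks the assistant writes
user_proxy.initiate_chat(assistant, message="Summarize today's arXiv cs.CL titles.")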


5. STRIDE: When to Use Agentic AI

5.1 Three Modalities

| Modality | Description | When to Use |
|---|---|---|
| Direct LLM Call | Single inference | Static, simple tasks |
| Guided AI Assistant | Structured help | Semi-complex, predictable |
| Full Agentic AI | Autonomous goal pursuit | Dynamic, evolving context |

5.2 STRIDE Framework Components

  1. Task Decomposition — Break down complexity
  2. Dynamism Attribution — How much does context change?
  3. Self-Reflection Requirement — Does task need iterative improvement?

5.3 Agentic Suitability Score

\[\text{Agentic Score} = w_1 \cdot \text{Complexity} + w_2 \cdot \text{Dynamism} + w_3 \cdot \text{Reflection}\]

Results:

- 92% accuracy in modality selection
- 45% reduction in unnecessary agent deployments
- 37% cost reduction
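
A minimal sketch of the scoring rule above; the weights and thresholds are illustrative assumptions, not values from the paper:

# Agentic suitability score and modality selection (sketch)
def agentic_score(complexity, dynamism, reflection, w=(0.4, 0.35, 0.25)):
    # inputs are assumed normalized to [0, 1]; weights are illustrative
    return w[0] * complexity + w[1] * dynamism + w[2] * reflection

def pick_modality(score):
    if score < 0.33:
        return "direct LLM call"
    if score < 0.66:
        return "guided AI assistant"
    return "full agentic AI"

print(pick_modality(agentic_score(0.8, 0.7, 0.9)))  # -> full agentic AI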


6. Tool-Calling vs ReAct Agents

6.1 Architecture Comparison

Tool-Calling Agent:

User Query → Tool Selection → Tool Execution → Response
                 (Structured)

ReAct Agent:

User Query → Thought → Action → Observation → Thought → ... → Response
            (Chain of Reasoning)

6.2 When to Use Which

| Scenario | Recommended |
|---|---|
| Deterministic tool set | Tool-Calling |
| Unknown tools at runtime | ReAct |
| Need reasoning transparency | ReAct |
| Fast execution required | Tool-Calling |
| Complex multi-step tasks | ReAct |

7. Function Calling Best Practices

7.1 Schema Design

# Good function definition
{
    "name": "search_products",
    "description": "Search for products in the catalog. Use when user asks about availability, price, or features.",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "Search query, e.g., 'wireless headphones under $100'"
            },
            "category": {
                "type": "string",
                "enum": ["electronics", "clothing", "home"],
                "description": "Product category to narrow search"
            },
            "max_price": {
                "type": "number",
                "description": "Maximum price in USD"
            }
        },
        "required": ["query"]
    }
}

7.2 Best Practices Summary

| Practice | Description |
|---|---|
| Clear names | Use descriptive function names |
| Detailed descriptions | Explain what, when, and how |
| Type constraints | Use enums for limited options |
| Required params | Mark essential parameters |
| Examples in descriptions | Help LLM understand usage |
| Error handling | Plan for failures |

7.3 Common Pitfalls

  1. Vague descriptions → Wrong function selected
  2. Missing required params → Incomplete calls
  3. Too many functions → Selection confusion
  4. No error handling → Silent failures
  5. Overly complex schemas → Parameter errors
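
Pitfalls 2 and 5 can be caught mechanically by validating the model's arguments against the declared parameter schema before execution. A sketch using the jsonschema package:

# Validate LLM-produced arguments before executing the tool
import json
from jsonschema import ValidationError, validate

def safe_parse_arguments(raw_arguments, parameters_schema):
    try:
        args = json.loads(raw_arguments)
        validate(instance=args, schema=parameters_schema)  # missing/invalid params raise
        return args, None
    except (json.JSONDecodeError, ValidationError) as err:
        # Return the error so it can be fed back to the model, not fail silently
        return None, f"Invalid arguments: {err}"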

8. Tool Selection & Routing

8.1 Tool Selection Strategies

| Strategy | Description | Use Case |
|---|---|---|
| LLM-based | Model decides from descriptions | Dynamic tool sets |
| Semantic search | Embed query, find similar tools | Large tool catalogs |
| Rule-based | Keywords/patterns → tools | Deterministic routing |
| Hybrid | Combine approaches | Production systems |

8.2 Tool Routing Architecture

graph LR
    Q["Query"] --> IC["Intent Classification<br/>(Semantic Match)"]
    IC --> TS["Tool Selection<br/>(Rank by relevance)"]
    TS --> EX["Execute"]
    style IC fill:#e8eaf6,stroke:#3f51b5
    style TS fill:#fff3e0,stroke:#ef6c00
    style EX fill:#e8f5e9,stroke:#4caf50
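
A sketch of the semantic-match step, assuming a hypothetical embed(text) -> np.ndarray sentence-embedding function:

# Rank tools by cosine similarity between the query and tool descriptions
import numpy as np

def route_query(query, tools, embed, top_k=3):
    # tools: list of {"name": ..., "description": ...}
    q = embed(query)
    q = q / np.linalg.norm(q)
    scored = []
    for tool in tools:
        d = embed(tool["description"])
        d = d / np.linalg.norm(d)
        scored.append((float(q @ d), tool["name"]))
    scored.sort(reverse=True)
    return [name for _, name in scored[:top_k]]

In production the description embeddings would be precomputed and stored in a vector index rather than re-embedded per query.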

9. Interview Questions

9.1 Concept Questions

Q: What is function calling in LLMs?

A: Function calling lets LLMs respond with structured JSON specifying
   function names and parameters, enabling interaction with external
   systems like APIs, databases, and tools.

Q: Explain the ReAct paradigm.

A: ReAct (Reasoning + Acting) alternates between:
   - Thought: Chain-of-thought reasoning
   - Action: External tool/environment interaction
   - Observation: Result from action

   Benefits: Better grounding, interpretable traces, error recovery

Q: When would you use agents vs direct LLM calls?

A: Use agents when:
   - Task requires multiple steps
   - Context changes dynamically
   - Self-reflection/iteration needed
   - External tools must be orchestrated

   Use direct calls when:
   - Single-shot tasks
   - Context is static
   - Speed is critical

9.2 Architecture Questions

Q: Design a multi-tool agent system.

A: Architecture:
   1. Query Understanding → Intent classification
   2. Tool Selection → Semantic search + LLM ranking
   3. Tool Execution → Parallel where possible
   4. Result Aggregation → Summarization LLM
   5. Response Generation → User-facing output

   Key considerations:
   - Tool schema management
   - Error handling & fallbacks
   - Rate limiting
   - Cost tracking

Q: Compare LangChain vs AutoGen for multi-agent systems.

A: LangChain:
   - Better for tool orchestration
   - LangGraph for state machines
   - More production-ready

   AutoGen:
   - Native multi-agent conversations
   - Built-in human-in-the-loop
   - Better for research/enterprise

9.3 Implementation Questions

Q: How do you handle tool call failures?

A: Strategies:
   1. Retry with modified parameters
   2. Fallback to alternative tools
   3. Ask user for clarification
   4. Return partial results with error info
   5. Log for improvement

   Implementation:
   - Timeout handling
   - Rate limit backoff
   - Circuit breaker pattern
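
A sketch combining strategies 1, 2, and 4 with exponential backoff; primary and fallback are hypothetical tool callables:

# Retry with exponential backoff, then fall back to an alternative tool (sketch)
import time

def call_with_fallback(primary, fallback, args, retries=3, base_delay=1.0):
    for attempt in range(retries):
        try:
            return primary(**args)
        except TimeoutError:
            time.sleep(base_delay * 2 ** attempt)  # backoff on transient errors
        except Exception:
            break  # non-transient error: stop retrying the primary tool
    try:
        return fallback(**args)
    except Exception as err:
        # Partial result with structured error info the LLM can reason about
        return {"error": str(err), "result": None}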

Q: Optimize LLM agent latency.

A: Techniques:
   1. Plan reuse (AgentReuse: 93% latency reduction)
   2. Tool call caching
   3. Parallel tool execution
   4. Smaller models for routing
   5. Streaming responses
   6. Speculative execution
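
A sketch of technique 2, caching results by canonicalized arguments so repeated identical calls skip execution:

# Tool-call cache keyed by (tool name, canonicalized JSON arguments)
import json

_cache = {}

def cached_call(tool_name, func, args):
    key = (tool_name, json.dumps(args, sort_keys=True))  # order-insensitive key
    if key not in _cache:
        _cache[key] = func(**args)  # pay the tool cost only on a miss
    return _cache[key]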


10. Key Papers & Resources

| Paper/Resource | Year | Key Contribution |
|---|---|---|
| ReAct | 2022 | Reasoning + Acting paradigm |
| Small LLMs Are Weak Tool Learners | 2024 | Planner-Caller-Summarizer |
| AgentGuard | 2025 | Safety evaluation framework |
| STRIDE | 2025 | When to use agentic AI |
| AgentReuse | 2025 | 93% latency reduction |

11. Formulas

Agentic Suitability Score (STRIDE)

\[\text{Score} = w_1 \cdot \text{Complexity} + w_2 \cdot \text{Dynamism} + w_3 \cdot \text{Reflection}\]

Plan Reuse Speedup

\[\text{Speedup} = \frac{T_{\text{gen}}}{(1 - \alpha) T_{\text{gen}} + \alpha T_{\text{cache}}}\]

Where \(\alpha\) = cache hit rate
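
For illustration (numbers are hypothetical): with \(\alpha = 0.9\), \(T_{\text{gen}} = 2\,\text{s}\), \(T_{\text{cache}} = 0.05\,\text{s}\), Speedup = 2 / (0.1 · 2 + 0.9 · 0.05) ≈ 8.2×.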

Tool Selection Score

\[\text{Score}(t, q) = \text{sim}(\text{embed}(t.\text{desc}), \text{embed}(q)) \cdot \text{relevance}(t)\]

