
Advanced RAG Techniques

~10 min read

URL: arXiv, Meilisearch, Neo4j, LeewayHertz | Type: advanced-rag / self-rag / crag / agentic-rag / graph-rag | Date: February 2026 | Collected: Ralph Research PHASE 5


Prerequisites: Chunking strategies, RAG techniques, and vector databases

Why This Matters

Naive RAG (retrieve-then-generate) reaches recall@5 of ~60-70%; Agentic RAG raises it to 88-95%. Hybrid search (BM25 + dense + RRF fusion) adds +10-20% recall. Reranking (cross-encoder) -- another +15-30% precision. Self-RAG uses reflection tokens to teach the model to self-evaluate retrieval quality. CRAG classifies retrieval results into three categories and falls back to web search. Graph RAG lifts multi-hop accuracy from 45-60% to 75-85%. As of 2026 there are 14 RAG types -- from naive to streaming and federated.


Key Concepts

Advanced RAG -- the evolution from a static pipeline to autonomous agentic systems with self-correction.

Evolution of RAG

| Generation | Architecture | Period |
|---|---|---|
| Naive RAG | Query -> Retrieve -> Generate | 2020-2023 |
| Advanced RAG | Hybrid search, reranking, chunking optimization | 2023-2024 |
| Self-Reflective RAG | Self-RAG, CRAG -- LLM evaluates retrieval quality | 2024-2025 |
| Agentic RAG | Autonomous agents, dynamic routing, self-correction loops | 2025-2026 |

14 типов RAG (2026)

| Type | Description | Use Case |
|---|---|---|
| Naive RAG | Simple vector search | Simple Q&A |
| Advanced RAG | Pre/post-retrieval optimization | Production |
| Modular RAG | Pluggable components | Enterprise |
| Graph RAG | Knowledge graph integration | Complex relationships |
| Agentic RAG | Dynamic strategy selection | Multi-step queries |
| Hybrid RAG | Vector + keyword + graph | General purpose |
| Multi-modal RAG | Text + image + audio | Rich media |
| Federated RAG | Multi-source retrieval | Distributed data |
| Adaptive RAG | Query-aware routing | Varied query types |
| Self-Reflective RAG | Self-correction loops | High accuracy |
| Speculative RAG | Parallel retrieval | Low latency |
| Cache-augmented RAG | Semantic caching | High QPS |
| Streaming RAG | Real-time updates | Live data |
| Hierarchical RAG | Multi-level retrieval | Large corpora |

1. Hybrid Search

Why pure vector search falls short:

| Problem | Description |
|---|---|
| Semantic drift | Nearby vectors != relevant content |
| OOV terms | Rare words, product names, codes |
| Precision loss | Broad matches, noise |
| Domain gaps | Generic embeddings do not cover the domain |

Components

| Type | Best for | Latency |
|---|---|---|
| Dense (vector) | Semantic similarity, synonyms | 5-20ms |
| Sparse (BM25) | Exact terms, IDs, keyword matching | 1-5ms |
| Graph (traversal) | Relationships, multi-hop | 10-50ms |

Reciprocal Rank Fusion (RRF)

\[RRF(d) = \sum_{r \in R} \frac{1}{k + rank_r(d)}\]

\(k\) -- constant (typically 60), \(rank_r(d)\) -- rank of document \(d\) in ranker \(r\).

| Document | Vector Rank | BM25 Rank | RRF Score |
|---|---|---|---|
| Doc A | 1 | 3 | 1/61 + 1/63 = 0.0323 |
| Doc B | 2 | 1 | 1/62 + 1/61 = 0.0325 |
| Doc C | 5 | 2 | 1/65 + 1/62 = 0.0315 |
| Parameter | Recommendation |
|---|---|
| k value | 60 (default) |
| Top-k per ranker | 50-100 for fusion |
| Final top-k | 10-20 for LLM context |
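A minimal RRF implementation matching the formula and tables above; ranked_lists maps each ranker to its doc IDs in rank order:

def rrf_fuse(ranked_lists: dict[str, list[str]], k: int = 60, top_k: int = 20) -> list[str]:
    # RRF(d) = sum over rankers r of 1 / (k + rank_r(d)); ranks are 1-based
    scores: dict[str, float] = {}
    for ranking in ranked_lists.values():
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

# The worked example above: Doc B edges out Doc A, 0.0325 vs 0.0323
fused = rrf_fuse({"vector": ["A", "B", "x", "y", "C"], "bm25": ["B", "C", "A"]})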

2. Reranking

| Approach | Speed | Accuracy | Cost |
|---|---|---|---|
| Cross-encoder | Slow | High | High |
| ColBERT (late interaction) | Medium | High | Medium |
| LLM-as-reranker | Very slow | Highest | Highest |
| Learned scorer | Fast | Medium | Low |

Cross-Encoder модели

| Model | Size | Speed | Quality |
|---|---|---|---|
| BGE-reranker-base | 278M | Fast | Good |
| BGE-reranker-large | 560M | Medium | Better |
| Cohere rerank | API | Fast | Excellent |
| Jina Reranker | 140M | Very fast | Good |
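A cross-encoder reranking sketch via sentence-transformers, using the BGE model from the table (exact API may differ across library versions):

from sentence_transformers import CrossEncoder

# Cross-encoder scores each (query, passage) pair jointly -- slower but more precise
reranker = CrossEncoder("BAAI/bge-reranker-base")

def rerank(query: str, docs: list[str], top_k: int = 20) -> list[str]:
    scores = reranker.predict([(query, d) for d in docs])  # one relevance score per pair
    ranked = sorted(zip(docs, scores), key=lambda pair: pair[1], reverse=True)
    return [doc for doc, _ in ranked[:top_k]]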

Multi-Stage Pipeline

Query -> Dense (100) -> BM25 (100) -> RRF Fusion (50) -> Cross-encoder (20) -> LLM
| Stage | Input | Output | Purpose |
|---|---|---|---|
| Dense | Query | 100 candidates | Semantic match |
| Sparse | Query | 100 candidates | Lexical match |
| RRF | 200 candidates | 50 fused | Combine signals |
| Rerank | 50 fused | 20 final | Precision boost |

3. Query Enhancement & Chunking

Pre-Retrieval Techniques

| Technique | Description | Impact |
|---|---|---|
| Query expansion | Synonyms, related terms | +10-20% recall |
| Query rewriting | Rephrase for clarity | +5-15% precision |
| HyDE | Generate hypothetical doc, embed it | +10-25% recall |
| Multi-query | Generate multiple queries | +15-30% recall |
| Query routing | Route to best retriever | +20-40% relevance |

HyDE (Hypothetical Document Embeddings)

  1. LLM generates hypothetical answer
  2. Embed hypothetical document
  3. Search for similar real documents
  4. Return real documents to LLM
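A minimal HyDE sketch of the four steps above; llm_complete, embed, and vector_store are assumed helpers for any LLM, embedding model, and similarity index:

def hyde_search(query: str, top_k: int = 10) -> list[str]:
    # 1. Generate a hypothetical answer (it may be wrong -- only its wording matters)
    hypothetical = llm_complete(f"Write a short passage answering: {query}")
    # 2. Embed the hypothetical document instead of the raw query
    vec = embed(hypothetical)
    # 3-4. Retrieve real documents close to the hypothetical one and return them
    return vector_store.search(vec, top_k=top_k)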

Chunking Strategies

| Strategy | Size | Best for |
|---|---|---|
| Fixed size | 256-1024 tokens | General purpose |
| Semantic | Sentence/paragraph | Coherent content |
| Recursive | Nested chunks | Hierarchical docs |
| Sliding window | Overlapping | Context preservation |
| Parent-child | Small retrieve, large context | Detailed answers |
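A sliding-window chunker sketch (token counts approximated by whitespace-split words for brevity):

def sliding_window_chunks(text: str, size: int = 512, overlap: int = 64) -> list[str]:
    words = text.split()
    step = size - overlap  # consecutive chunks share `overlap` words of context
    return [" ".join(words[i:i + size])
            for i in range(0, max(len(words) - overlap, 1), step)]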

4. Self-RAG (Self-Reflective RAG)

The model learns to decide when to retrieve and to evaluate retrieval quality via special reflection tokens.

Reflection Tokens

| Token | Question | Values |
|---|---|---|
| RETRIEVE | Need more info? | [Retrieve] / [No Retrieve] |
| ISREL | Doc relevant? | [Relevant] / [Irrelevant] |
| ISSUP | Supports claim? | [Fully] / [Partially] / [No support] |
| ISUSE | Response useful? | [Utility:5] / [Utility:3] / [Utility:1] |

Pipeline

Query -> [RETRIEVE?] -> Retrieve Docs -> [ISREL?] -> Generate
     -> [ISSUP?] -> [ISUSE?] -> Output (or retry)

Benchmarks

| Model | PopQA | Bio | Pub | Fact | Avg |
|---|---|---|---|---|---|
| Standard RAG | 42.2 | 48.4 | 58.1 | 61.9 | 52.7 |
| Self-RAG (7B) | 49.5 | 59.3 | 65.2 | 69.5 | 60.9 |
| Self-RAG (13B) | 53.0 | 62.5 | 68.4 | 72.1 | 64.0 |
The reflection token vocabulary as a simple Python mapping:

class SelfRAGTokens:
    RETRIEVE = {"yes": "[Retrieve]", "no": "[No Retrieve]"}
    ISREL = {"relevant": "[Relevant]", "irrelevant": "[Irrelevant]"}
    ISSUP = {
        "fully_supported": "[Fully supported]",
        "partially_supported": "[Partially supported]",
        "no_support": "[No support]"
    }
    ISUSE = {"useful": "[Utility:5]", "somewhat": "[Utility:3]", "not_useful": "[Utility:1]"}
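A hedged sketch of an inference loop that consumes these tokens; generate_with_tokens and retrieve are assumed helpers (the actual Self-RAG decodes segment by segment, weighting candidates by critique-token scores):

def self_rag_answer(query: str, max_retries: int = 2) -> str:
    draft = ""
    for _ in range(max_retries + 1):
        draft = generate_with_tokens(query)          # output includes reflection tokens
        if SelfRAGTokens.RETRIEVE["yes"] in draft:   # model asked for evidence
            docs = retrieve(query)
            draft = generate_with_tokens(query, docs)
        if SelfRAGTokens.ISUSE["useful"] in draft:   # [Utility:5] -> accept answer
            return draft
    return draft  # best effort after retries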

5. CRAG (Corrective RAG)

Evaluate retrieval results and correct when needed via a three-way classification.

Classification

| Result | Action |
|---|---|
| Correct | Use retrieved docs |
| Incorrect | Discard, use web search |
| Ambiguous | Combine internal + external |

Python Implementation

from enum import Enum
from dataclasses import dataclass

class RetrievalAction(Enum):
    USE_DOCS = "use_documents"
    WEB_SEARCH = "web_search"
    COMBINE = "combine_sources"

@dataclass
class RetrievalResult:
    action: RetrievalAction
    confidence: float
    reason: str

def evaluate_retrieval_crag(
    query: str,
    documents: list[str],
    threshold_correct: float = 0.7,
    threshold_ambiguous: float = 0.4
) -> RetrievalResult:
    # llm_score is an assumed helper: an LLM grades query/document relevance in [0, 1]
    score = llm_score(query, documents)

    if score >= threshold_correct:
        action = RetrievalAction.USE_DOCS
    elif score >= threshold_ambiguous:
        action = RetrievalAction.COMBINE
    else:
        action = RetrievalAction.WEB_SEARCH

    return RetrievalResult(action=action, confidence=score,
                           reason=f"relevance score {score:.2f}")
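Dispatching on the result might look like this; web_search and combine_sources are assumed helpers:

result = evaluate_retrieval_crag(query, docs)
if result.action is RetrievalAction.USE_DOCS:
    context = docs
elif result.action is RetrievalAction.WEB_SEARCH:
    context = web_search(query)                          # discard docs, go external
else:
    context = combine_sources(docs, web_search(query))   # ambiguous: merge both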

6. Adaptive RAG

Dynamically select retrieval strategy based on query complexity.

Decision Tree

Query -> [Complexity?]
         Low    -> No RAG (direct LLM)
         Medium -> Single Retrieve
         High   -> Multi-hop + Rewrite

Dynamic-RAG (AAAI 2025)

Multi-Armed Bandit (MAB) for strategy selection:

\[\text{UCB}(a) = \bar{X}_a + \sqrt{\frac{2 \ln N}{n_a}}\]
  • \(\bar{X}_a\) = average reward for action \(a\), \(N\) = total queries, \(n_a\) = times action chosen.
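A minimal UCB1 sketch for strategy selection (strategy names are illustrative; the reward would come from downstream answer quality):

import math

class UCBRouter:
    """UCB(a) = mean reward of a + sqrt(2 ln N / n_a)."""
    def __init__(self, arms: list[str]):
        self.counts = {a: 0 for a in arms}     # n_a: times each arm was chosen
        self.rewards = {a: 0.0 for a in arms}  # cumulative reward per arm

    def select(self) -> str:
        total = sum(self.counts.values())      # N: total queries routed so far
        def ucb(arm: str) -> float:
            if self.counts[arm] == 0:
                return float("inf")            # try every strategy at least once
            mean = self.rewards[arm] / self.counts[arm]
            return mean + math.sqrt(2 * math.log(total) / self.counts[arm])
        return max(self.counts, key=ucb)

    def update(self, arm: str, reward: float) -> None:
        self.counts[arm] += 1
        self.rewards[arm] += reward

router = UCBRouter(["no_rag", "single_retrieve", "multi_hop"])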

RouteRAG

class RouteRAG:
    def route(self, query: str) -> str:
        # extract_features is an assumed lightweight query classifier
        features = self.extract_features(query)
        if features['is_factual']:
            return 'dense_retrieval'
        elif features['needs_reasoning']:
            return 'multi_hop_rag'
        elif features['is_creative']:
            return 'no_retrieval'
        else:
            return 'hybrid_rag'

7. Agentic RAG

Autonomous agents with the full set of agentic design patterns: Reflection, Planning, Tool Use, Multi-Agent.

Architecture

Query -> Query Agent (intent, routing, decompose)
      -> Retrieval Agent (vector + graph + web, multi-source)
      -> Evaluation Agent (relevance, fact verification, gap identification)
      -> [Loop if gaps found]
      -> Generation Agent (synthesize, cite, format)
      -> Output

Agentic Patterns

| Pattern | Description | Latency Impact |
|---|---|---|
| Single-hop | One retrieval step | Baseline |
| Multi-hop | Iterative retrieval | +200-500ms |
| Self-correction | Query refinement | +100-300ms |
| Tool use | External APIs | +100-1000ms |
| Parallel agents | Concurrent retrieval | +0ms (parallel) |

Implementation

class AgenticRAG:
    def __init__(self):
        self.query_agent = QueryAgent()
        self.retrieval_agent = RetrievalAgent()
        self.evaluation_agent = EvaluationAgent()
        self.generation_agent = GenerationAgent()

    async def process(self, query: str, max_iterations: int = 3):
        query_plan = await self.query_agent.analyze(query)

        context = []
        for _ in range(max_iterations):
            docs = await self.retrieval_agent.retrieve(query_plan)
            context.extend(docs)  # keep docs from every pass, including the last
            eval_result = await self.evaluation_agent.evaluate(query, docs, context)

            if eval_result.is_sufficient:
                break

            # refine the query plan around the identified gaps, then retrieve again
            query_plan = await self.query_agent.refine(query_plan, eval_result.gaps)

        return await self.generation_agent.generate(query, context, citations=True)

LangGraph Implementation

from typing import TypedDict

from langgraph.graph import StateGraph, END

class RAGState(TypedDict):
    query: str
    documents: list
    score: float
    response: str
    iterations: int

def should_retry(state: RAGState) -> str:
    if state["score"] > 0.7 or state["iterations"] >= 3:
        return "generate"
    return "retrieve"

# retrieve, evaluate, generate are node functions that take and return RAGState
graph = StateGraph(RAGState)
graph.add_node("retrieve", retrieve)
graph.add_node("evaluate", evaluate)
graph.add_node("generate", generate)
graph.add_edge("retrieve", "evaluate")
graph.add_conditional_edges("evaluate", should_retry, {
    "retrieve": "retrieve", "generate": "generate"
})
graph.add_edge("generate", END)
graph.set_entry_point("retrieve")
app = graph.compile()  # invoke with an initial RAGState dict

8. Graph RAG

When to Use

| Scenario | Vector RAG | Graph RAG |
|---|---|---|
| Simple Q&A | OK | Overkill |
| Entity relationships | Weak | OK |
| Multi-hop reasoning | Poor | OK |
| Temporal queries | No | OK |
| Community detection | No | OK |

Architecture

Document Corpus -> Text Chunks + Entity Extraction + Relation Extraction
                -> Vector Store + Knowledge Graph + Edge Store
                -> Hybrid Query Engine (vector search -> entry nodes,
                   graph traversal -> related context, merge -> ranked results)
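A hedged sketch of the hybrid query engine; vector_store, embed, graph.neighbors, and rank_by_relevance are assumed components:

def graph_rag_query(query: str, hops: int = 2) -> list[str]:
    # 1. Vector search finds entry nodes (entities closest to the query)
    entry_nodes = vector_store.search(embed(query), top_k=5)
    # 2. Graph traversal expands to related context within `hops` edges
    frontier, seen = set(entry_nodes), set(entry_nodes)
    for _ in range(hops):
        frontier = {nbr for node in frontier for nbr in graph.neighbors(node)} - seen
        seen |= frontier
    # 3. Merge entry nodes and traversed neighbors into one ranked result set
    return rank_by_relevance(query, list(seen))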

Metrics

| Metric | Vector Only | Graph RAG | Improvement |
|---|---|---|---|
| Multi-hop accuracy | 45-60% | 75-85% | +25-35% |
| Entity recall | 50-70% | 85-95% | +20-30% |
| Context relevance | 70-80% | 85-92% | +12-15% |

9. FAIR-RAG & RAG-Gym

FAIR-RAG (2025)

Structured Evidence Assessment (SEA): decompose query into required findings, retrieve and aggregate evidence, identify gaps, generate targeted sub-queries.

| Method | HotpotQA F1 |
|---|---|
| Standard RAG | 0.370 |
| Iterative RAG | 0.370 |
| FAIR-RAG | 0.453 (+8.3 points) |
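A sketch of one SEA iteration (see the Coverage formula in the Formulas section); decompose_query and is_addressed are assumed LLM-backed helpers:

def sea_iteration(query: str, evidence: list[str]) -> tuple[float, list[str]]:
    # Decompose the query into required findings (sub-facts needed to answer)
    required = decompose_query(query)
    addressed = [f for f in required if is_addressed(f, evidence)]
    coverage = len(addressed) / max(len(required), 1)  # Coverage = addressed / required
    gaps = [f for f in required if f not in addressed]
    return coverage, gaps  # each gap becomes a targeted sub-query next round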

RAG-Gym (2025)

Three optimization dimensions: Prompt Engineering (Re2Search), Actor Tuning (DPO, PPO, SFT), Critic Training (process supervision).

Re2Search: Reason -> Reflect -> Search -> Repeat until confident.

| Agent | HotpotQA F1 | 2Wiki F1 | MuSiQue F1 |
|---|---|---|---|
| Standard RAG | 0.42 | 0.35 | 0.25 |
| Search-R1 | 0.48 | 0.40 | 0.30 |
| Re2Search | 0.51 | 0.43 | 0.33 |
| Re2Search++ | 0.55 | 0.47 | 0.37 |

10. Production Patterns & Evaluation

Default Architecture 2026

Query -> Query Enhancement -> Hybrid Search (Dense + Sparse)
     -> RRF Fusion -> Reranking -> Context Assembly -> LLM Generation

Monitoring Targets

| Metric | Target | Alert Threshold |
|---|---|---|
| Retrieval latency | <50ms | >100ms |
| Generation latency | <2s | >5s |
| Retrieval recall@5 | >80% | <70% |
| Hallucination rate | <5% | >10% |

Cost Optimization

| Technique | Savings | Trade-off |
|---|---|---|
| Semantic caching | 60-80% | Memory |
| Query routing | 30-50% | Complexity |
| Chunk pruning | 20-40% | Recall risk |
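A minimal semantic-cache sketch behind the 60-80% savings row; embed is an assumed embedding helper and the 0.95 threshold is illustrative:

import numpy as np

class SemanticCache:
    def __init__(self, threshold: float = 0.95):
        self.threshold = threshold
        self.entries: list[tuple[np.ndarray, str]] = []  # (query vector, cached answer)

    def get(self, query: str) -> str | None:
        q = embed(query)
        for vec, answer in self.entries:
            sim = float(np.dot(q, vec) / (np.linalg.norm(q) * np.linalg.norm(vec)))
            if sim >= self.threshold:  # near-duplicate query: skip retrieval + generation
                return answer
        return None

    def put(self, query: str, answer: str) -> None:
        self.entries.append((embed(query), answer))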

Evaluation Metrics

| Metric | Description | Target |
|---|---|---|
| Context Precision | Relevance of retrieved chunks | > 0.8 |
| Context Recall | Coverage of needed information | > 0.7 |
| Faithfulness | Answer grounded in context | > 0.9 |
| Answer Relevance | Answer addresses query | > 0.8 |

Evaluation Tools

| Tool | Type | Focus |
|---|---|---|
| RAGAS | Framework | Faithfulness, relevance |
| DeepEval | Framework | Comprehensive metrics |
| TruLens | Framework | Feedback functions |
| Arize Phoenix | Platform | Observability + eval |
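A hedged RAGAS snippet (the API surface varies across ragas versions; this follows the classic evaluate() schema):

from datasets import Dataset
from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy, context_precision, context_recall

data = {
    "question": ["What does RRF do?"],
    "answer": ["It fuses rankings from multiple retrievers."],
    "contexts": [["RRF combines dense and sparse rankings via 1/(k + rank)."]],
    "ground_truth": ["RRF merges multiple rankings into one."],
}
result = evaluate(
    Dataset.from_dict(data),
    metrics=[faithfulness, answer_relevancy, context_precision, context_recall],
)
print(result)  # compare against targets: faithfulness > 0.9, context precision > 0.8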

For Interviews

Q: "How does hybrid search improve retrieval quality?"

Pure vector search suffers from semantic drift, OOV terms, and precision loss. Hybrid = dense (semantic) + sparse/BM25 (exact terms), combined via RRF: \(RRF(d) = \sum \frac{1}{k + rank_r(d)}\), k=60. Improvement: +10-20% recall vs dense only. Multi-stage pipeline: Dense (100) -> BM25 (100) -> RRF (50) -> Cross-encoder rerank (20) -> LLM. Reranking adds +15-30% precision. 2026: 95%+ enterprise adoption of hybrid search.

Q: "Compare Self-RAG, CRAG, and Adaptive RAG."

Self-RAG: reflection tokens (RETRIEVE, ISREL, ISSUP, ISUSE) -- the model learns to self-evaluate during generation. +10-15% accuracy vs baseline. Best for quality-critical applications. CRAG: 3-way classification (correct/incorrect/ambiguous) -- evaluates retrieval, falls back to web search when needed. +15-20% accuracy. Best for dynamic knowledge, verification. Adaptive RAG: query complexity routing (low -> no RAG, medium -> single retrieve, high -> multi-hop). Dynamic-RAG (AAAI 2025) uses MAB/UCB for strategy selection. +10-25% accuracy. Best for mixed query types, cost optimization.

Q: "Что такое Agentic RAG и его design patterns?"

Agentic RAG = 4 design patterns: (1) Reflection -- self-evaluate and improve. (2) Planning -- decompose complex queries. (3) Tool Use -- external APIs. (4) Multi-Agent -- specialized agents collaborate. Architecture: Query Agent (intent, routing) -> Retrieval Agent (multi-source: vector + graph + web) -> Evaluation Agent (relevance, gaps) -> Generation Agent (synthesize, cite). Iterative loop with max_iterations. Performance: 88-95% recall@5 (vs naive 60-70%). Cost: +2-5x vs vanilla. LangGraph for state management. 2026: 45% enterprise adoption.

Q: "Когда использовать Graph RAG?"

Entity relationships, multi-hop reasoning, temporal queries, hierarchical data. Multi-hop accuracy: vector only 45-60%, Graph RAG 75-85% (+25-35%). Entity recall: +20-30%. Architecture: vector search -> entry nodes, graph traversal -> related context. Storage: Neo4j, ArangoDB, FalkorDB. Overkill for simple Q&A. 2026: 25-30% adoption.

Q: "Спроектируйте production RAG систему."

(1) Query Enhancement: rewriting + HyDE (+10-25% recall) + routing (+20-40% relevance). (2) Hybrid retrieval: dense + sparse, RRF fusion. (3) Reranking: cross-encoder (BGE-reranker, Cohere, Jina) on top-50 -> top-20. (4) Context assembly + LLM generation with citations. (5) Evaluation: RAGAS (faithfulness >0.9, context precision >0.8). (6) Monitoring: retrieval latency <50ms, hallucination <5%. (7) Cost: semantic caching (60-80% savings), query routing (30-50%). Total latency: 700-2500ms.


Key Numbers

| Fact | Value |
|---|---|
| Hybrid search improvement vs dense | +10-20% recall |
| Reranking improvement | +15-30% precision |
| Graph RAG multi-hop accuracy | +25-35% vs vector only |
| Self-RAG improvement (13B) | +11.3 points avg vs standard RAG |
| CRAG accuracy improvement | +15-20% |
| Agentic RAG recall@5 | 88-95% |
| Naive RAG recall@5 | 60-70% |
| FAIR-RAG improvement (HotpotQA) | +8.3 F1 |
| Re2Search++ (HotpotQA) | 0.55 F1 |
| HyDE recall improvement | +10-25% |
| Query routing relevance | +20-40% |
| Semantic caching savings | 60-80% |
| RRF k constant (default) | 60 |
| Total pipeline latency | 700-2500ms |
| Agentic RAG cost increase | +2-5x vs vanilla |
| Hybrid search adoption 2026 | 95%+ |
| Agentic RAG adoption 2026 | 45% |
| Graph RAG adoption 2026 | 25-30% |

Formulas

\[RRF(d) = \sum_{r \in R} \frac{1}{k + rank_r(d)} \quad \text{(Reciprocal Rank Fusion)}\]
\[\text{UCB}(a) = \bar{X}_a + \sqrt{\frac{2 \ln N}{n_a}} \quad \text{(Adaptive RAG method selection)}\]
\[Q = \alpha \cdot \text{Relevance} + \beta \cdot \text{Completeness} + \gamma \cdot \text{Accuracy} \quad \text{(Retrieval quality)}\]
\[\text{Coverage} = \frac{\text{Addressed Findings}}{\text{Required Findings}} \quad \text{(FAIR-RAG gap coverage)}\]

Common Mistakes

"Agentic RAG = просто добавить agent loop поверх RAG" -- Agentic RAG это архитектурный паттерн: query decomposition + tool selection + iterative retrieval + self-correction. Простой retry loop -- это не agentic.

"Graph RAG всегда лучше naive RAG" -- Graph RAG потребляет 10-100x токенов на ingestion (entity extraction). Для простых Q&A без multi-hop reasoning -- это overkill. Выигрыш только на entity-focused и relationship queries (45% -> 79%).

"Все 14 типов RAG нужно знать для интервью" -- Достаточно знать 5 ключевых: Naive, Self-RAG, CorrectiveRAG, Adaptive-RAG, Agentic RAG. Остальные -- вариации. Важнее понимать trade-offs (accuracy vs latency vs fine-tuning cost).

Sources

  1. Asai et al. -- "Self-RAG: Learning to Retrieve, Generate, and Critique" (arXiv:2310.11511, 2023)
  2. Yan et al. -- "Corrective RAG" (arXiv:2401.15884, 2024)
  3. Jeong et al. -- "Adaptive RAG" (2024)
  4. Dynamic-RAG -- "Multi-Armed Bandit Method Selection" (AAAI 2025)
  5. FAIR-RAG -- "Structured Evidence Assessment" (arXiv:2510.22344, 2025)
  6. RAG-Gym -- "Systematic RAG Optimization" (arXiv:2502.13957, 2025)
  7. glaforge.dev -- "Advanced RAG -- Understanding RRF in Hybrid Search" (2026)
  8. Meilisearch -- "14 Types of RAG" (2026)
  9. Neo4j -- "Advanced RAG Techniques for High-Performance LLM Applications"
  10. Techment -- "RAG in 2026: How Retrieval-Augmented Generation Works for Enterprise AI"
  11. LeewayHertz -- "Advanced RAG: Architecture, Techniques, Applications"
  12. Medium -- "Building Agentic RAG with LangGraph"
