
Advanced RAG Techniques

~10 min read

URL: arXiv, Meilisearch, Neo4j, LeewayHertz | Type: advanced-rag / self-rag / crag / agentic-rag / graph-rag | Date: February 2026 | Collected: Ralph Research PHASE 5


Prerequisites: Chunking strategies, RAG techniques, and vector databases

Why This Matters

Naive RAG (retrieve-then-generate) reaches recall@5 of ~60-70%; Agentic RAG raises it to 88-95%. Hybrid search (BM25 + dense + RRF fusion) adds +10-20% recall. Reranking (cross-encoder) -- another +15-30% precision. Self-RAG uses reflection tokens to teach the model to self-evaluate retrieval quality. CRAG classifies retrieval results into three categories and falls back to web search. Graph RAG lifts multi-hop accuracy from 45-60% to 75-85%. As of 2026 there are 14 RAG types -- from naive to streaming and federated.


Key Concepts

Advanced RAG -- the evolution from a static pipeline to autonomous agentic systems with self-correction.

Evolution of RAG

| Generation | Architecture | Period |
|---|---|---|
| Naive RAG | Query -> Retrieve -> Generate | 2020-2023 |
| Advanced RAG | Hybrid search, reranking, chunking optimization | 2023-2024 |
| Self-Reflective RAG | Self-RAG, CRAG -- LLM evaluates retrieval quality | 2024-2025 |
| Agentic RAG | Autonomous agents, dynamic routing, self-correction loops | 2025-2026 |

14 типов RAG (2026)

| Type | Description | Use Case |
|---|---|---|
| Naive RAG | Simple vector search | Simple Q&A |
| Advanced RAG | Pre/post-retrieval optimization | Production |
| Modular RAG | Pluggable components | Enterprise |
| Graph RAG | Knowledge graph integration | Complex relationships |
| Agentic RAG | Dynamic strategy selection | Multi-step queries |
| Hybrid RAG | Vector + keyword + graph | General purpose |
| Multi-modal RAG | Text + image + audio | Rich media |
| Federated RAG | Multi-source retrieval | Distributed data |
| Adaptive RAG | Query-aware routing | Varied query types |
| Self-Reflective RAG | Self-correction loops | High accuracy |
| Speculative RAG | Parallel retrieval | Low latency |
| Cache-augmented RAG | Semantic caching | High QPS |
| Streaming RAG | Real-time updates | Live data |
| Hierarchical RAG | Multi-level retrieval | Large corpora |

1. Hybrid Search

Why pure vector search falls short:

| Problem | Description |
|---|---|
| Semantic drift | Nearby vectors != relevant content |
| OOV terms | Rare words, product names, codes |
| Precision loss | Broad matches, noise |
| Domain gaps | Generic embeddings do not cover the domain |

Components

| Type | Best for | Latency |
|---|---|---|
| Dense (vector) | Semantic similarity, synonyms | 5-20ms |
| Sparse (BM25) | Exact terms, IDs, keyword matching | 1-5ms |
| Graph (traversal) | Relationships, multi-hop | 10-50ms |

Reciprocal Rank Fusion (RRF)

\[RRF(d) = \sum_{r \in R} \frac{1}{k + rank_r(d)}\]

\(k\) -- constant (typically 60), \(rank_r(d)\) -- rank of document \(d\) in ranker \(r\).

| Document | Vector Rank | BM25 Rank | RRF Score |
|---|---|---|---|
| Doc A | 1 | 3 | 1/61 + 1/63 = 0.0323 |
| Doc B | 2 | 1 | 1/62 + 1/61 = 0.0325 |
| Doc C | 5 | 2 | 1/65 + 1/62 = 0.0315 |
| Parameter | Recommendation |
|---|---|
| k value | 60 (default) |
| Top-k per ranker | 50-100 for fusion |
| Final top-k | 10-20 for LLM context |
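A minimal RRF implementation matching the formula and tables above; ranked_lists maps each ranker to its doc IDs in rank order:

def rrf_fuse(ranked_lists: dict[str, list[str]], k: int = 60, top_k: int = 20) -> list[str]:
    # RRF(d) = sum over rankers r of 1 / (k + rank_r(d)); ranks are 1-based
    scores: dict[str, float] = {}
    for ranking in ranked_lists.values():
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

# The worked example above: Doc B edges out Doc A, 0.0325 vs 0.0323
fused = rrf_fuse({"vector": ["A", "B", "x", "y", "C"], "bm25": ["B", "C", "A"]})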

2. Reranking

| Approach | Speed | Accuracy | Cost |
|---|---|---|---|
| Cross-encoder | Slow | High | High |
| ColBERT (late interaction) | Medium | High | Medium |
| LLM-as-reranker | Very slow | Highest | Highest |
| Learned scorer | Fast | Medium | Low |

Cross-Encoder модели

| Model | Size | Speed | Quality |
|---|---|---|---|
| BGE-reranker-base | 278M | Fast | Good |
| BGE-reranker-large | 560M | Medium | Better |
| Cohere rerank | API | Fast | Excellent |
| Jina Reranker | 140M | Very fast | Good |
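A cross-encoder reranking sketch via sentence-transformers, using the BGE model from the table (exact API may differ across library versions):

from sentence_transformers import CrossEncoder

# Cross-encoder scores each (query, passage) pair jointly -- slower but more precise
reranker = CrossEncoder("BAAI/bge-reranker-base")

def rerank(query: str, docs: list[str], top_k: int = 20) -> list[str]:
    scores = reranker.predict([(query, d) for d in docs])  # one relevance score per pair
    ranked = sorted(zip(docs, scores), key=lambda pair: pair[1], reverse=True)
    return [doc for doc, _ in ranked[:top_k]]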

Multi-Stage Pipeline

Query -> Dense (100) -> BM25 (100) -> RRF Fusion (50) -> Cross-encoder (20) -> LLM
| Stage | Input | Output | Purpose |
|---|---|---|---|
| Dense | Query | 100 candidates | Semantic match |
| Sparse | Query | 100 candidates | Lexical match |
| RRF | 200 candidates | 50 fused | Combine signals |
| Rerank | 50 fused | 20 final | Precision boost |

3. Query Enhancement & Chunking

Pre-Retrieval Techniques

| Technique | Description | Impact |
|---|---|---|
| Query expansion | Synonyms, related terms | +10-20% recall |
| Query rewriting | Rephrase for clarity | +5-15% precision |
| HyDE | Generate hypothetical doc, embed it | +10-25% recall |
| Multi-query | Generate multiple queries | +15-30% recall |
| Query routing | Route to best retriever | +20-40% relevance |

HyDE (Hypothetical Document Embeddings)

  1. LLM generates hypothetical answer
  2. Embed hypothetical document
  3. Search for similar real documents
  4. Return real documents to LLM
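A minimal HyDE sketch of the four steps above; llm_complete, embed, and vector_store are assumed helpers for any LLM, embedding model, and similarity index:

def hyde_search(query: str, top_k: int = 10) -> list[str]:
    # 1. Generate a hypothetical answer (it may be wrong -- only its wording matters)
    hypothetical = llm_complete(f"Write a short passage answering: {query}")
    # 2. Embed the hypothetical document instead of the raw query
    vec = embed(hypothetical)
    # 3-4. Retrieve real documents close to the hypothetical one and return them
    return vector_store.search(vec, top_k=top_k)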

Chunking Strategies

| Strategy | Size | Best for |
|---|---|---|
| Fixed size | 256-1024 tokens | General purpose |
| Semantic | Sentence/paragraph | Coherent content |
| Recursive | Nested chunks | Hierarchical docs |
| Sliding window | Overlapping | Context preservation |
| Parent-child | Small retrieve, large context | Detailed answers |
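A sliding-window chunker sketch (token counts approximated by whitespace-split words for brevity):

def sliding_window_chunks(text: str, size: int = 512, overlap: int = 64) -> list[str]:
    words = text.split()
    step = size - overlap  # consecutive chunks share `overlap` words of context
    return [" ".join(words[i:i + size])
            for i in range(0, max(len(words) - overlap, 1), step)]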

4. Self-RAG (Self-Reflective RAG)

The model learns to decide when to retrieve and to evaluate retrieval quality via special reflection tokens.

Reflection Tokens

| Token | Question | Values |
|---|---|---|
| RETRIEVE | Need more info? | [Retrieve] / [No Retrieve] |
| ISREL | Doc relevant? | [Relevant] / [Irrelevant] |
| ISSUP | Supports claim? | [Fully] / [Partially] / [No support] |
| ISUSE | Response useful? | [Utility:5] / [Utility:3] / [Utility:1] |

Pipeline

Query -> [RETRIEVE?] -> Retrieve Docs -> [ISREL?] -> Generate
     -> [ISSUP?] -> [ISUSE?] -> Output (or retry)

Benchmarks

| Model | PopQA | Bio | Pub | Fact | Avg |
|---|---|---|---|---|---|
| Standard RAG | 42.2 | 48.4 | 58.1 | 61.9 | 52.7 |
| Self-RAG (7B) | 49.5 | 59.3 | 65.2 | 69.5 | 60.9 |
| Self-RAG (13B) | 53.0 | 62.5 | 68.4 | 72.1 | 64.0 |
The reflection token vocabulary as a simple Python mapping:

class SelfRAGTokens:
    RETRIEVE = {"yes": "[Retrieve]", "no": "[No Retrieve]"}
    ISREL = {"relevant": "[Relevant]", "irrelevant": "[Irrelevant]"}
    ISSUP = {
        "fully_supported": "[Fully supported]",
        "partially_supported": "[Partially supported]",
        "no_support": "[No support]"
    }
    ISUSE = {"useful": "[Utility:5]", "somewhat": "[Utility:3]", "not_useful": "[Utility:1]"}
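A hedged sketch of an inference loop that consumes these tokens; generate_with_tokens and retrieve are assumed helpers (the actual Self-RAG decodes segment by segment, weighting candidates by critique-token scores):

def self_rag_answer(query: str, max_retries: int = 2) -> str:
    draft = ""
    for _ in range(max_retries + 1):
        draft = generate_with_tokens(query)          # output includes reflection tokens
        if SelfRAGTokens.RETRIEVE["yes"] in draft:   # model asked for evidence
            docs = retrieve(query)
            draft = generate_with_tokens(query, docs)
        if SelfRAGTokens.ISUSE["useful"] in draft:   # [Utility:5] -> accept answer
            return draft
    return draft  # best effort after retries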

5. CRAG (Corrective RAG)

Evaluate retrieval results and correct when needed via a three-way classification.

Classification

| Result | Action |
|---|---|
| Correct | Use retrieved docs |
| Incorrect | Discard, use web search |
| Ambiguous | Combine internal + external |

Python Implementation

from enum import Enum
from dataclasses import dataclass

class RetrievalAction(Enum):
    USE_DOCS = "use_documents"
    WEB_SEARCH = "web_search"
    COMBINE = "combine_sources"

@dataclass
class RetrievalResult:
    action: RetrievalAction
    confidence: float
    reason: str

def evaluate_retrieval_crag(
    query: str,
    documents: list[str],
    threshold_correct: float = 0.7,
    threshold_ambiguous: float = 0.4
) -> RetrievalResult:
    # llm_score is an assumed helper: an LLM grades query/document relevance in [0, 1]
    score = llm_score(query, documents)

    if score >= threshold_correct:
        action = RetrievalAction.USE_DOCS
    elif score >= threshold_ambiguous:
        action = RetrievalAction.COMBINE
    else:
        action = RetrievalAction.WEB_SEARCH

    return RetrievalResult(action=action, confidence=score,
                           reason=f"relevance score {score:.2f}")
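Dispatching on the result might look like this; web_search and combine_sources are assumed helpers:

result = evaluate_retrieval_crag(query, docs)
if result.action is RetrievalAction.USE_DOCS:
    context = docs
elif result.action is RetrievalAction.WEB_SEARCH:
    context = web_search(query)                          # discard docs, go external
else:
    context = combine_sources(docs, web_search(query))   # ambiguous: merge both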

6. Adaptive RAG

Dynamically select retrieval strategy based on query complexity.

Decision Tree

Query -> [Complexity?]
         Low    -> No RAG (direct LLM)
         Medium -> Single Retrieve
         High   -> Multi-hop + Rewrite

Dynamic-RAG (AAAI 2025)

Multi-Armed Bandit (MAB) for strategy selection:

\[\text{UCB}(a) = \bar{X}_a + \sqrt{\frac{2 \ln N}{n_a}}\]
  • \(\bar{X}_a\) = average reward for action \(a\), \(N\) = total queries, \(n_a\) = times action chosen.
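A minimal UCB1 sketch for strategy selection (strategy names are illustrative; the reward would come from downstream answer quality):

import math

class UCBRouter:
    """UCB(a) = mean reward of a + sqrt(2 ln N / n_a)."""
    def __init__(self, arms: list[str]):
        self.counts = {a: 0 for a in arms}     # n_a: times each arm was chosen
        self.rewards = {a: 0.0 for a in arms}  # cumulative reward per arm

    def select(self) -> str:
        total = sum(self.counts.values())      # N: total queries routed so far
        def ucb(arm: str) -> float:
            if self.counts[arm] == 0:
                return float("inf")            # try every strategy at least once
            mean = self.rewards[arm] / self.counts[arm]
            return mean + math.sqrt(2 * math.log(total) / self.counts[arm])
        return max(self.counts, key=ucb)

    def update(self, arm: str, reward: float) -> None:
        self.counts[arm] += 1
        self.rewards[arm] += reward

router = UCBRouter(["no_rag", "single_retrieve", "multi_hop"])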

RouteRAG

class RouteRAG:
    def route(self, query: str) -> str:
        # extract_features is an assumed lightweight query classifier
        features = self.extract_features(query)
        if features['is_factual']:
            return 'dense_retrieval'
        elif features['needs_reasoning']:
            return 'multi_hop_rag'
        elif features['is_creative']:
            return 'no_retrieval'
        else:
            return 'hybrid_rag'

7. Agentic RAG

Autonomous agents with the full set of agentic design patterns: Reflection, Planning, Tool Use, Multi-Agent.

Architecture

Query -> Query Agent (intent, routing, decompose)
      -> Retrieval Agent (vector + graph + web, multi-source)
      -> Evaluation Agent (relevance, fact verification, gap identification)
      -> [Loop if gaps found]
      -> Generation Agent (synthesize, cite, format)
      -> Output

Agentic Patterns

| Pattern | Description | Latency Impact |
|---|---|---|
| Single-hop | One retrieval step | Baseline |
| Multi-hop | Iterative retrieval | +200-500ms |
| Self-correction | Query refinement | +100-300ms |
| Tool use | External APIs | +100-1000ms |
| Parallel agents | Concurrent retrieval | +0ms (parallel) |

Implementation

class AgenticRAG:
    def __init__(self):
        self.query_agent = QueryAgent()
        self.retrieval_agent = RetrievalAgent()
        self.evaluation_agent = EvaluationAgent()
        self.generation_agent = GenerationAgent()

    async def process(self, query: str, max_iterations: int = 3):
        query_plan = await self.query_agent.analyze(query)

        context = []
        for _ in range(max_iterations):
            docs = await self.retrieval_agent.retrieve(query_plan)
            context.extend(docs)  # keep docs from every pass, including the last
            eval_result = await self.evaluation_agent.evaluate(query, docs, context)

            if eval_result.is_sufficient:
                break

            # refine the query plan around the identified gaps, then retrieve again
            query_plan = await self.query_agent.refine(query_plan, eval_result.gaps)

        return await self.generation_agent.generate(query, context, citations=True)

LangGraph Implementation

from typing import TypedDict

from langgraph.graph import StateGraph, END

class RAGState(TypedDict):
    query: str
    documents: list
    score: float
    response: str
    iterations: int

def should_retry(state: RAGState) -> str:
    if state["score"] > 0.7 or state["iterations"] >= 3:
        return "generate"
    return "retrieve"

# retrieve, evaluate, generate are node functions that take and return RAGState
graph = StateGraph(RAGState)
graph.add_node("retrieve", retrieve)
graph.add_node("evaluate", evaluate)
graph.add_node("generate", generate)
graph.add_edge("retrieve", "evaluate")
graph.add_conditional_edges("evaluate", should_retry, {
    "retrieve": "retrieve", "generate": "generate"
})
graph.add_edge("generate", END)
graph.set_entry_point("retrieve")
app = graph.compile()  # invoke with an initial RAGState dict

8. Graph RAG

When to Use

| Scenario | Vector RAG | Graph RAG |
|---|---|---|
| Simple Q&A | OK | Overkill |
| Entity relationships | Weak | OK |
| Multi-hop reasoning | Poor | OK |
| Temporal queries | No | OK |
| Community detection | No | OK |

Architecture

Document Corpus -> Text Chunks + Entity Extraction + Relation Extraction
                -> Vector Store + Knowledge Graph + Edge Store
                -> Hybrid Query Engine (vector search -> entry nodes,
                   graph traversal -> related context, merge -> ranked results)
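A hedged sketch of the hybrid query engine; vector_store, embed, graph.neighbors, and rank_by_relevance are assumed components:

def graph_rag_query(query: str, hops: int = 2) -> list[str]:
    # 1. Vector search finds entry nodes (entities closest to the query)
    entry_nodes = vector_store.search(embed(query), top_k=5)
    # 2. Graph traversal expands to related context within `hops` edges
    frontier, seen = set(entry_nodes), set(entry_nodes)
    for _ in range(hops):
        frontier = {nbr for node in frontier for nbr in graph.neighbors(node)} - seen
        seen |= frontier
    # 3. Merge entry nodes and traversed neighbors into one ranked result set
    return rank_by_relevance(query, list(seen))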

Metrics

| Metric | Vector Only | Graph RAG | Improvement |
|---|---|---|---|
| Multi-hop accuracy | 45-60% | 75-85% | +25-35% |
| Entity recall | 50-70% | 85-95% | +20-30% |
| Context relevance | 70-80% | 85-92% | +12-15% |

9. FAIR-RAG & RAG-Gym

FAIR-RAG (2025)

Structured Evidence Assessment (SEA): decompose query into required findings, retrieve and aggregate evidence, identify gaps, generate targeted sub-queries.

| Method | HotpotQA F1 |
|---|---|
| Standard RAG | 0.370 |
| Iterative RAG | 0.370 |
| FAIR-RAG | 0.453 (+8.3 points) |
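A sketch of one SEA iteration (see the Coverage formula in the Formulas section); decompose_query and is_addressed are assumed LLM-backed helpers:

def sea_iteration(query: str, evidence: list[str]) -> tuple[float, list[str]]:
    # Decompose the query into required findings (sub-facts needed to answer)
    required = decompose_query(query)
    addressed = [f for f in required if is_addressed(f, evidence)]
    coverage = len(addressed) / max(len(required), 1)  # Coverage = addressed / required
    gaps = [f for f in required if f not in addressed]
    return coverage, gaps  # each gap becomes a targeted sub-query next round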

RAG-Gym (2025)

Three optimization dimensions: Prompt Engineering (Re2Search), Actor Tuning (DPO, PPO, SFT), Critic Training (process supervision).

Re2Search: Reason -> Reflect -> Search -> Repeat until confident.

| Agent | HotpotQA F1 | 2Wiki F1 | MuSiQue F1 |
|---|---|---|---|
| Standard RAG | 0.42 | 0.35 | 0.25 |
| Search-R1 | 0.48 | 0.40 | 0.30 |
| Re2Search | 0.51 | 0.43 | 0.33 |
| Re2Search++ | 0.55 | 0.47 | 0.37 |

10. Production Patterns & Evaluation

Default Architecture 2026

Query -> Query Enhancement -> Hybrid Search (Dense + Sparse)
     -> RRF Fusion -> Reranking -> Context Assembly -> LLM Generation

Monitoring Targets

| Metric | Target | Alert Threshold |
|---|---|---|
| Retrieval latency | <50ms | >100ms |
| Generation latency | <2s | >5s |
| Retrieval recall@5 | >80% | <70% |
| Hallucination rate | <5% | >10% |

Cost Optimization

| Technique | Savings | Trade-off |
|---|---|---|
| Semantic caching | 60-80% | Memory |
| Query routing | 30-50% | Complexity |
| Chunk pruning | 20-40% | Recall risk |
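A minimal semantic-cache sketch behind the 60-80% savings row; embed is an assumed embedding helper and the 0.95 threshold is illustrative:

import numpy as np

class SemanticCache:
    def __init__(self, threshold: float = 0.95):
        self.threshold = threshold
        self.entries: list[tuple[np.ndarray, str]] = []  # (query vector, cached answer)

    def get(self, query: str) -> str | None:
        q = embed(query)
        for vec, answer in self.entries:
            sim = float(np.dot(q, vec) / (np.linalg.norm(q) * np.linalg.norm(vec)))
            if sim >= self.threshold:  # near-duplicate query: skip retrieval + generation
                return answer
        return None

    def put(self, query: str, answer: str) -> None:
        self.entries.append((embed(query), answer))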

Evaluation Metrics

| Metric | Description | Target |
|---|---|---|
| Context Precision | Relevance of retrieved chunks | > 0.8 |
| Context Recall | Coverage of needed information | > 0.7 |
| Faithfulness | Answer grounded in context | > 0.9 |
| Answer Relevance | Answer addresses query | > 0.8 |

Evaluation Tools

| Tool | Type | Focus |
|---|---|---|
| RAGAS | Framework | Faithfulness, relevance |
| DeepEval | Framework | Comprehensive metrics |
| TruLens | Framework | Feedback functions |
| Arize Phoenix | Platform | Observability + eval |
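A hedged RAGAS snippet (the API surface varies across ragas versions; this follows the classic evaluate() schema):

from datasets import Dataset
from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy, context_precision, context_recall

data = {
    "question": ["What does RRF do?"],
    "answer": ["It fuses rankings from multiple retrievers."],
    "contexts": [["RRF combines dense and sparse rankings via 1/(k + rank)."]],
    "ground_truth": ["RRF merges multiple rankings into one."],
}
result = evaluate(
    Dataset.from_dict(data),
    metrics=[faithfulness, answer_relevancy, context_precision, context_recall],
)
print(result)  # compare against targets: faithfulness > 0.9, context precision > 0.8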

For Interviews

Q: "How does hybrid search improve retrieval quality?"

Pure vector search suffers from semantic drift, OOV terms, and precision loss. Hybrid = dense (semantic) + sparse/BM25 (exact terms), combined via RRF: \(RRF(d) = \sum \frac{1}{k + rank_r(d)}\), k=60. Improvement: +10-20% recall vs dense only. Multi-stage pipeline: Dense (100) -> BM25 (100) -> RRF (50) -> Cross-encoder rerank (20) -> LLM. Reranking adds +15-30% precision. 2026: 95%+ enterprise adoption of hybrid search.

Q: "Compare Self-RAG, CRAG, and Adaptive RAG."

Self-RAG: reflection tokens (RETRIEVE, ISREL, ISSUP, ISUSE) -- the model learns to self-evaluate during generation. +10-15% accuracy vs baseline. Best for quality-critical applications. CRAG: 3-way classification (correct/incorrect/ambiguous) -- evaluates retrieval, falls back to web search when needed. +15-20% accuracy. Best for dynamic knowledge, verification. Adaptive RAG: query complexity routing (low -> no RAG, medium -> single retrieve, high -> multi-hop). Dynamic-RAG (AAAI 2025) uses MAB/UCB for strategy selection. +10-25% accuracy. Best for mixed query types, cost optimization.

Q: "Что такое Agentic RAG и его design patterns?"

Agentic RAG = 4 design patterns: (1) Reflection -- self-evaluate and improve. (2) Planning -- decompose complex queries. (3) Tool Use -- external APIs. (4) Multi-Agent -- specialized agents collaborate. Architecture: Query Agent (intent, routing) -> Retrieval Agent (multi-source: vector + graph + web) -> Evaluation Agent (relevance, gaps) -> Generation Agent (synthesize, cite). Iterative loop with max_iterations. Performance: 88-95% recall@5 (vs naive 60-70%). Cost: +2-5x vs vanilla. LangGraph for state management. 2026: 45% enterprise adoption.

Q: "Когда использовать Graph RAG?"

Entity relationships, multi-hop reasoning, temporal queries, hierarchical data. Multi-hop accuracy: vector only 45-60%, Graph RAG 75-85% (+25-35%). Entity recall: +20-30%. Architecture: vector search -> entry nodes, graph traversal -> related context. Storage: Neo4j, ArangoDB, FalkorDB. Overkill for simple Q&A. 2026: 25-30% adoption.

Q: "Спроектируйте production RAG систему."

(1) Query Enhancement: rewriting + HyDE (+10-25% recall) + routing (+20-40% relevance). (2) Hybrid retrieval: dense + sparse, RRF fusion. (3) Reranking: cross-encoder (BGE-reranker, Cohere, Jina) on top-50 -> top-20. (4) Context assembly + LLM generation with citations. (5) Evaluation: RAGAS (faithfulness >0.9, context precision >0.8). (6) Monitoring: retrieval latency <50ms, hallucination <5%. (7) Cost: semantic caching (60-80% savings), query routing (30-50%). Total latency: 700-2500ms.


Key Numbers

| Fact | Value |
|---|---|
| Hybrid search improvement vs dense | +10-20% recall |
| Reranking improvement | +15-30% precision |
| Graph RAG multi-hop accuracy | +25-35% vs vector only |
| Self-RAG improvement (13B) | +11.3 points avg vs standard RAG |
| CRAG accuracy improvement | +15-20% |
| Agentic RAG recall@5 | 88-95% |
| Naive RAG recall@5 | 60-70% |
| FAIR-RAG improvement (HotpotQA) | +8.3 F1 |
| Re2Search++ (HotpotQA) | 0.55 F1 |
| HyDE recall improvement | +10-25% |
| Query routing relevance | +20-40% |
| Semantic caching savings | 60-80% |
| RRF k constant (default) | 60 |
| Total pipeline latency | 700-2500ms |
| Agentic RAG cost increase | +2-5x vs vanilla |
| Hybrid search adoption 2026 | 95%+ |
| Agentic RAG adoption 2026 | 45% |
| Graph RAG adoption 2026 | 25-30% |

Formulas

\[RRF(d) = \sum_{r \in R} \frac{1}{k + rank_r(d)} \quad \text{(Reciprocal Rank Fusion)}\]
\[\text{UCB}(a) = \bar{X}_a + \sqrt{\frac{2 \ln N}{n_a}} \quad \text{(Adaptive RAG method selection)}\]
\[Q = \alpha \cdot \text{Relevance} + \beta \cdot \text{Completeness} + \gamma \cdot \text{Accuracy} \quad \text{(Retrieval quality)}\]
\[\text{Coverage} = \frac{\text{Addressed Findings}}{\text{Required Findings}} \quad \text{(FAIR-RAG gap coverage)}\]

Common Mistakes

"Agentic RAG = просто добавить agent loop поверх RAG" -- Agentic RAG это архитектурный паттерн: query decomposition + tool selection + iterative retrieval + self-correction. Простой retry loop -- это не agentic.

"Graph RAG всегда лучше naive RAG" -- Graph RAG потребляет 10-100x токенов на ingestion (entity extraction). Для простых Q&A без multi-hop reasoning -- это overkill. Выигрыш только на entity-focused и relationship queries (45% -> 79%).

"Все 14 типов RAG нужно знать для интервью" -- Достаточно знать 5 ключевых: Naive, Self-RAG, CorrectiveRAG, Adaptive-RAG, Agentic RAG. Остальные -- вариации. Важнее понимать trade-offs (accuracy vs latency vs fine-tuning cost).

Sources

  1. Asai et al. -- "Self-RAG: Learning to Retrieve, Generate, and Critique" (arXiv:2310.11511, 2023)
  2. Yan et al. -- "Corrective RAG" (arXiv:2401.15884, 2024)
  3. Jeong et al. -- "Adaptive RAG" (2024)
  4. Dynamic-RAG -- "Multi-Armed Bandit Method Selection" (AAAI 2025)
  5. FAIR-RAG -- "Structured Evidence Assessment" (arXiv:2510.22344, 2025)
  6. RAG-Gym -- "Systematic RAG Optimization" (arXiv:2502.13957, 2025)
  7. glaforge.dev -- "Advanced RAG -- Understanding RRF in Hybrid Search" (2026)
  8. Meilisearch -- "14 Types of RAG" (2026)
  9. Neo4j -- "Advanced RAG Techniques for High-Performance LLM Applications"
  10. Techment -- "RAG in 2026: How Retrieval-Augmented Generation Works for Enterprise AI"
  11. LeewayHertz -- "Advanced RAG: Architecture, Techniques, Applications"
  12. Medium -- "Building Agentic RAG with LangGraph"
