Advanced RAG Techniques
~10 min read
Sources: arXiv, Meilisearch, Neo4j, LeewayHertz
Type: advanced-rag / self-rag / crag / agentic-rag / graph-rag
Date: February 2026
Collected: Ralph Research PHASE 5
Prerequisites: Chunking strategies, RAG techniques and vector databases
Why This Matters
Naive RAG (retrieve-then-generate) reaches recall@5 of ~60-70%; Agentic RAG raises it to 88-95%. Hybrid search (BM25 + dense + RRF fusion) adds +10-20% recall, and cross-encoder reranking contributes another +15-30% precision. Self-RAG teaches the model to self-evaluate retrieval quality via reflection tokens. CRAG classifies retrieval results into three categories and falls back to web search when needed. Graph RAG lifts multi-hop accuracy from 45-60% to 75-85%. As of 2026 there are 14 recognized RAG types, from naive to streaming and federated.
Key Concepts
Advanced RAG is the evolution from a static pipeline to autonomous agentic systems with self-correction.
RAG Evolution

| Generation | Architecture | Period |
|---|---|---|
| Naive RAG | Query -> Retrieve -> Generate | 2020-2023 |
| Advanced RAG | Hybrid search, reranking, chunking optimization | 2023-2024 |
| Self-Reflective RAG | Self-RAG, CRAG: LLM evaluates retrieval quality | 2024-2025 |
| Agentic RAG | Autonomous agents, dynamic routing, self-correction loops | 2025-2026 |
14 RAG Types (2026)

| Type | Description | Use Case |
|---|---|---|
| Naive RAG | Simple vector search | Simple Q&A |
| Advanced RAG | Pre/post-retrieval optimization | Production |
| Modular RAG | Pluggable components | Enterprise |
| Graph RAG | Knowledge graph integration | Complex relationships |
| Agentic RAG | Dynamic strategy selection | Multi-step queries |
| Hybrid RAG | Vector + keyword + graph | General purpose |
| Multi-modal RAG | Text + image + audio | Rich media |
| Federated RAG | Multi-source retrieval | Distributed data |
| Adaptive RAG | Query-aware routing | Varied query types |
| Self-Reflective RAG | Self-correction loops | High accuracy |
| Speculative RAG | Parallel retrieval | Low latency |
| Cache-augmented RAG | Semantic caching | High QPS |
| Streaming RAG | Real-time updates | Live data |
| Hierarchical RAG | Multi-level retrieval | Large corpora |
1. Hybrid Search
Problems with pure vector search

| Problem | Description |
|---|---|
| Semantic drift | Nearby vectors != relevant content |
| OOV terms | Rare words, product names, codes |
| Precision loss | Broad matches, noise |
| Domain gaps | Generic embeddings miss domain vocabulary |
Components

| Type | Best for | Latency |
|---|---|---|
| Dense (vector) | Semantic similarity, synonyms | 5-20ms |
| Sparse (BM25) | Exact terms, IDs, keyword matching | 1-5ms |
| Graph (traversal) | Relationships, multi-hop | 10-50ms |
Reciprocal Rank Fusion (RRF)
\[RRF(d) = \sum_{r \in R} \frac{1}{k + rank_r(d)}\]
\(k\) is a constant (typically 60) and \(rank_r(d)\) is the rank of document \(d\) under ranker \(r\).
| Document | Vector Rank | BM25 Rank | RRF Score (k=60) |
|---|---|---|---|
| Doc A | 1 | 3 | 1/61 + 1/63 = 0.0323 |
| Doc B | 2 | 1 | 1/62 + 1/61 = 0.0325 |
| Doc C | 5 | 2 | 1/65 + 1/62 = 0.0315 |

At two decimals Doc A and Doc B look tied; at four decimals Doc B ranks first.
| Parameter | Recommendation |
|---|---|
| k value | 60 (default) |
| Top-k per ranker | 50-100 for fusion |
| Final top-k | 10-20 for LLM context |
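The fusion step fits in a few lines. A minimal sketch, assuming each retriever returns an ordered list of document IDs (best first, 1-based ranks):

```python
# Reciprocal Rank Fusion over multiple rankings.
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Score every document with RRF(d) = sum over rankers of 1 / (k + rank)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# The worked example: vector ranks A=1, B=2, C=5; BM25 ranks B=1, C=2, A=3
# (D and E are filler docs occupying vector ranks 3-4).
vector_ranking = ["A", "B", "D", "E", "C"]
bm25_ranking = ["B", "C", "A"]
fused = rrf_fuse([vector_ranking, bm25_ranking])
```

On this example Doc B narrowly outranks Doc A (0.0325 vs 0.0323), which is why RRF scores should be compared at more than two decimal places.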
2. Reranking
| Approach | Speed | Accuracy | Cost |
|---|---|---|---|
| Cross-encoder | Slow | High | High |
| ColBERT (late interaction) | Medium | High | Medium |
| LLM-as-reranker | Very slow | Highest | Highest |
| Learned scorer | Fast | Medium | Low |
Cross-Encoder Models

| Model | Size | Speed | Quality |
|---|---|---|---|
| BGE-reranker-base | 278M | Fast | Good |
| BGE-reranker-large | 560M | Medium | Better |
| Cohere rerank | API | Fast | Excellent |
| Jina Reranker | 140M | Very fast | Good |
Multi-Stage Pipeline
Query -> Dense (100) -> BM25 (100) -> RRF Fusion (50) -> Cross-encoder (20) -> LLM
| Stage | Input | Output | Purpose |
|---|---|---|---|
| Dense | Query | 100 candidates | Semantic match |
| Sparse | Query | 100 candidates | Lexical match |
| RRF | 200 | 50 fused | Combine signals |
| Rerank | 50 | 20 final | Precision boost |
3. Query Enhancement & Chunking
Pre-Retrieval Techniques

| Technique | Description | Impact |
|---|---|---|
| Query expansion | Synonyms, related terms | +10-20% recall |
| Query rewriting | Rephrase for clarity | +5-15% precision |
| HyDE | Generate hypothetical doc, embed it | +10-25% recall |
| Multi-query | Generate multiple queries | +15-30% recall |
| Query routing | Route to best retriever | +20-40% relevance |
HyDE (Hypothetical Document Embeddings)
1. LLM generates a hypothetical answer
2. Embed the hypothetical document
3. Search for similar real documents
4. Return the real documents to the LLM
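The HyDE steps above can be sketched end to end. `fake_llm` and the bag-of-characters `embed` below are toy stand-ins (not real models) used only to show the data flow:

```python
import math

def embed(text: str) -> list[float]:
    # Toy normalized char-frequency embedding; real HyDE uses a dense encoder.
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha() and ch.isascii():
            vec[ord(ch) - ord("a")] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def fake_llm(query: str) -> str:
    # Stand-in for step 1: "LLM generates a hypothetical answer".
    return f"A plausible answer to: {query}"

def hyde_search(query: str, corpus: list[str], top_k: int = 2) -> list[str]:
    hypothetical = fake_llm(query)               # 1. generate hypothetical doc
    q_vec = embed(hypothetical)                  # 2. embed it
    ranked = sorted(corpus, key=lambda d: cosine(q_vec, embed(d)), reverse=True)
    return ranked[:top_k]                        # 3-4. return real docs

corpus = [
    "Paris is the capital of France.",
    "Photosynthesis converts light into chemical energy.",
]
top = hyde_search("what is the capital of France", corpus, top_k=1)
```

The point of HyDE is that a hypothetical *answer* usually lies closer to real answer documents in embedding space than the raw question does.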
Chunking Strategies
| Strategy | Size | Best for |
|---|---|---|
| Fixed size | 256-1024 tokens | General purpose |
| Semantic | Sentence/paragraph | Coherent content |
| Recursive | Nested chunks | Hierarchical docs |
| Sliding window | Overlapping | Context preservation |
| Parent-child | Small retrieve, large context | Detailed answers |
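As one example, the sliding-window strategy reduces to a short function. This is a sketch over a plain token list; real implementations operate on tokenizer output:

```python
def sliding_window_chunks(tokens: list[str], size: int = 256, overlap: int = 32) -> list[list[str]]:
    """Split tokens into overlapping chunks; consecutive chunks share `overlap` tokens."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    stride = size - overlap
    chunks = []
    for start in range(0, len(tokens), stride):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):  # last window already covers the tail
            break
    return chunks

tokens = [f"t{i}" for i in range(600)]
chunks = sliding_window_chunks(tokens, size=256, overlap=32)
# 600 tokens with stride 224 -> 3 chunks; the 32-token overlap preserves
# context across chunk boundaries.
```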
4. Self-RAG (Self-Reflective RAG)
The model learns to decide when to retrieve and to assess retrieval quality via special reflection tokens.
Reflection Tokens
| Token | Question | Values |
|---|---|---|
| RETRIEVE | Need more info? | [Retrieve] / [No Retrieve] |
| ISREL | Doc relevant? | [Relevant] / [Irrelevant] |
| ISSUP | Supports claim? | [Fully] / [Partially] / [No support] |
| ISUSE | Response useful? | [Utility:5] / [Utility:3] / [Utility:1] |
Pipeline
Query -> [RETRIEVE?] -> Retrieve Docs -> [ISREL?] -> Generate
-> [ISSUP?] -> [ISUSE?] -> Output (or retry)
Benchmarks
| Model | PopQA | Bio | Pub | Fact | Avg |
|---|---|---|---|---|---|
| Standard RAG | 42.2 | 48.4 | 58.1 | 61.9 | 52.7 |
| Self-RAG (7B) | 49.5 | 59.3 | 65.2 | 69.5 | 60.9 |
| Self-RAG (13B) | 53.0 | 62.5 | 68.4 | 72.1 | 64.0 |
```python
class SelfRAGTokens:
    RETRIEVE = {"yes": "[Retrieve]", "no": "[No Retrieve]"}
    ISREL = {"relevant": "[Relevant]", "irrelevant": "[Irrelevant]"}
    ISSUP = {
        "fully_supported": "[Fully supported]",
        "partially_supported": "[Partially supported]",
        "no_support": "[No support]",
    }
    ISUSE = {"useful": "[Utility:5]", "somewhat": "[Utility:3]", "not_useful": "[Utility:1]"}
```
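One way these tokens could drive a control loop. A hedged sketch only: `critic`, `retriever`, and `generator` are hypothetical stand-ins for the fine-tuned model's components, not the paper's actual inference code:

```python
def self_rag_answer(query: str, retriever, generator, critic, max_retries: int = 2) -> str:
    """Retrieve only when the critic asks for it; retry until the answer is supported."""
    if critic.predict_retrieve(query) == "[No Retrieve]":
        return generator(query, docs=[])  # answer from parametric memory alone
    answer = ""
    for _ in range(max_retries + 1):
        docs = retriever(query)
        # Keep only documents the critic marks [Relevant] (ISREL).
        relevant = [d for d in docs if critic.predict_isrel(query, d) == "[Relevant]"]
        answer = generator(query, docs=relevant)
        # Accept the answer once the critic marks it fully supported (ISSUP).
        if critic.predict_issup(answer, relevant) == "[Fully supported]":
            return answer
    return answer  # best effort after exhausting retries

# Minimal stubs to exercise the loop.
class StubCritic:
    def predict_retrieve(self, q): return "[Retrieve]"
    def predict_isrel(self, q, d): return "[Relevant]"
    def predict_issup(self, a, docs): return "[Fully supported]"

result = self_rag_answer(
    "who discovered polonium?",
    retriever=lambda q: ["Marie Curie discovered polonium."],
    generator=lambda q, docs: f"Answer grounded in {len(docs)} doc(s)",
    critic=StubCritic(),
)
```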
5. CRAG (Corrective RAG)
Evaluate retrieval results and correct when needed via a three-way classification.
Classification

| Result | Action |
|---|---|
| Correct | Use retrieved docs |
| Incorrect | Discard, use web search |
| Ambiguous | Combine internal + external |
Python Implementation
```python
from enum import Enum
from dataclasses import dataclass

class RetrievalAction(Enum):
    USE_DOCS = "use_documents"
    WEB_SEARCH = "web_search"
    COMBINE = "combine_sources"

@dataclass
class RetrievalResult:
    action: RetrievalAction
    confidence: float
    reason: str

def evaluate_retrieval_crag(
    query: str,
    documents: list[str],
    threshold_correct: float = 0.7,
    threshold_ambiguous: float = 0.4,
) -> RetrievalResult:
    # llm_score: external LLM-based evaluator returning a 0-1 relevance score
    score = llm_score(query, documents)
    if score >= threshold_correct:
        action = RetrievalAction.USE_DOCS
    elif score >= threshold_ambiguous:
        action = RetrievalAction.COMBINE
    else:
        action = RetrievalAction.WEB_SEARCH
    return RetrievalResult(action=action, confidence=score, reason="...")
```
6. Adaptive RAG
Dynamically select retrieval strategy based on query complexity.
Decision Tree
Query -> [Complexity?]
Low -> No RAG (direct LLM)
Medium -> Single Retrieve
High -> Multi-hop + Rewrite
Dynamic-RAG (AAAI 2025)
Multi-Armed Bandit (MAB) for strategy selection:
\[\text{UCB}(a) = \bar{X}_a + \sqrt{\frac{2 \ln N}{n_a}}\]
\(\bar{X}_a\) = average reward for action \(a\) , \(N\) = total queries, \(n_a\) = times action chosen.
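The UCB selection rule translates directly to code. A sketch assuming per-strategy running averages and trial counts are tracked elsewhere:

```python
import math

def ucb_select(avg_reward: dict[str, float], counts: dict[str, int], total: int) -> str:
    """Pick the strategy maximizing UCB(a) = avg_reward(a) + sqrt(2 ln N / n_a)."""
    def ucb(action: str) -> float:
        n_a = counts[action]
        if n_a == 0:
            return float("inf")  # force exploration of untried strategies
        return avg_reward[action] + math.sqrt(2 * math.log(total) / n_a)
    return max(avg_reward, key=ucb)

# The under-explored multi_hop arm wins here: its exploration bonus
# sqrt(2 ln 85 / 5) ~ 1.33 dominates the better-sampled arms.
avg = {"no_retrieval": 0.4, "single_retrieve": 0.6, "multi_hop": 0.7}
counts = {"no_retrieval": 50, "single_retrieve": 30, "multi_hop": 5}
choice = ucb_select(avg, counts, total=85)
```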
RouteRAG
```python
class RouteRAG:
    def route(self, query: str) -> str:
        # extract_features: classifier producing boolean query traits (not shown)
        features = self.extract_features(query)
        if features['is_factual']:
            return 'dense_retrieval'
        elif features['needs_reasoning']:
            return 'multi_hop_rag'
        elif features['is_creative']:
            return 'no_retrieval'
        else:
            return 'hybrid_rag'
```
7. Agentic RAG
Autonomous agents с full agentic design patterns: Reflection , Planning , Tool Use , Multi-Agent .
Architecture
Query -> Query Agent (intent, routing, decompose)
-> Retrieval Agent (vector + graph + web, multi-source)
-> Evaluation Agent (relevance, fact verification, gap identification)
-> [Loop if gaps found]
-> Generation Agent (synthesize, cite, format)
-> Output
Agentic Patterns
| Pattern | Description | Latency Impact |
|---|---|---|
| Single-hop | One retrieval step | Baseline |
| Multi-hop | Iterative retrieval | +200-500ms |
| Self-correction | Query refinement | +100-300ms |
| Tool use | External APIs | +100-1000ms |
| Parallel agents | Concurrent retrieval | +0ms (parallel) |
Implementation
```python
class AgenticRAG:
    def __init__(self):
        self.query_agent = QueryAgent()
        self.retrieval_agent = RetrievalAgent()
        self.evaluation_agent = EvaluationAgent()
        self.generation_agent = GenerationAgent()

    async def process(self, query: str, max_iterations: int = 3):
        query_plan = await self.query_agent.analyze(query)
        context = []
        for i in range(max_iterations):
            docs = await self.retrieval_agent.retrieve(query_plan)
            eval_result = await self.evaluation_agent.evaluate(query, docs, context)
            # Accumulate docs before checking sufficiency, so the final
            # (sufficient) batch also reaches the generator.
            context.extend(docs)
            if eval_result.is_sufficient:
                break
            query_plan = await self.query_agent.refine(query_plan, eval_result.gaps)
        return await self.generation_agent.generate(query, context, citations=True)
```
LangGraph Implementation
```python
from typing import TypedDict
from langgraph.graph import StateGraph, END

class RAGState(TypedDict):
    query: str
    documents: list
    score: float
    response: str
    iterations: int

def should_retry(state: RAGState) -> str:
    if state["score"] > 0.7 or state["iterations"] >= 3:
        return "generate"
    return "retrieve"

# retrieve, evaluate, generate are node functions defined elsewhere.
graph = StateGraph(RAGState)
graph.add_node("retrieve", retrieve)
graph.add_node("evaluate", evaluate)
graph.add_node("generate", generate)
graph.add_edge("retrieve", "evaluate")
graph.add_conditional_edges("evaluate", should_retry, {
    "retrieve": "retrieve", "generate": "generate"
})
graph.add_edge("generate", END)
graph.set_entry_point("retrieve")
app = graph.compile()
```
8. Graph RAG
When to use

| Scenario | Vector RAG | Graph RAG |
|---|---|---|
| Simple Q&A | OK | Overkill |
| Entity relationships | Weak | OK |
| Multi-hop reasoning | Poor | OK |
| Temporal queries | No | OK |
| Community detection | No | OK |
Architecture
Document Corpus -> Text Chunks + Entity Extraction + Relation Extraction
-> Vector Store + Knowledge Graph + Edge Store
-> Hybrid Query Engine (vector search -> entry nodes,
graph traversal -> related context, merge -> ranked results)
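The traversal stage amounts to a bounded BFS from the entry nodes found by vector search. A minimal sketch over an adjacency-list knowledge graph (the example entities are illustrative):

```python
from collections import deque

def multi_hop_context(graph: dict[str, list[str]], entry_nodes: list[str],
                      max_hops: int = 2) -> set[str]:
    """Collect all entities reachable from the entry nodes within max_hops edges."""
    seen = set(entry_nodes)
    frontier = deque((node, 0) for node in entry_nodes)
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue  # hop budget exhausted for this branch
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return seen

kg = {
    "Marie Curie": ["Pierre Curie", "radioactivity"],
    "Pierre Curie": ["Sorbonne"],
    "radioactivity": ["polonium"],
}
# Vector search picks "Marie Curie" as the entry node; 2-hop traversal
# then pulls in related context the embedding alone would miss.
context = multi_hop_context(kg, ["Marie Curie"], max_hops=2)
```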
Metrics
| Metric | Vector Only | Graph RAG | Improvement |
|---|---|---|---|
| Multi-hop accuracy | 45-60% | 75-85% | +25-35% |
| Entity recall | 50-70% | 85-95% | +20-30% |
| Context relevance | 70-80% | 85-92% | +12-15% |
9. FAIR-RAG & RAG-Gym
FAIR-RAG (2025)
Structured Evidence Assessment (SEA): decompose query into required findings, retrieve and aggregate evidence, identify gaps, generate targeted sub-queries.
| Method | HotpotQA F1 |
|---|---|
| Standard RAG | 0.370 |
| Iterative RAG | 0.370 |
| FAIR-RAG | 0.453 (+8.3 points) |
RAG-Gym (2025)
Three optimization dimensions: Prompt Engineering (Re2Search), Actor Tuning (DPO, PPO, SFT), Critic Training (process supervision).
Re2Search: Reason -> Reflect -> Search -> Repeat until confident.
| Agent | HotpotQA F1 | 2Wiki F1 | MusiQue F1 |
|---|---|---|---|
| Standard RAG | 0.42 | 0.35 | 0.25 |
| Search-R1 | 0.48 | 0.40 | 0.30 |
| Re2Search | 0.51 | 0.43 | 0.33 |
| Re2Search++ | 0.55 | 0.47 | 0.37 |
10. Production Patterns & Evaluation
Default Architecture 2026
Query -> Query Enhancement -> Hybrid Search (Dense + Sparse)
-> RRF Fusion -> Reranking -> Context Assembly -> LLM Generation
Monitoring Targets
| Metric | Target | Alert Threshold |
|---|---|---|
| Retrieval latency | <50ms | >100ms |
| Generation latency | <2s | >5s |
| Retrieval recall@5 | >80% | <70% |
| Hallucination rate | <5% | >10% |
Cost Optimization
| Technique | Savings | Trade-off |
|---|---|---|
| Semantic caching | 60-80% | Memory |
| Query routing | 30-50% | Complexity |
| Chunk pruning | 20-40% | Recall risk |
Evaluation Metrics
| Metric | Description | Target |
|---|---|---|
| Context Precision | Relevance of retrieved chunks | > 0.8 |
| Context Recall | Coverage of needed information | > 0.7 |
| Faithfulness | Answer grounded in context | > 0.9 |
| Answer Relevance | Answer addresses query | > 0.8 |
| Tool | Type | Focus |
|---|---|---|
| RAGAS | Framework | Faithfulness, relevance |
| DeepEval | Framework | Comprehensive metrics |
| TruLens | Framework | Feedback functions |
| Arize Phoenix | Platform | Observability + eval |
For Interviews
Q: "What is hybrid search and why is it needed?"
Pure vector search suffers from semantic drift, OOV terms, and precision loss. Hybrid = dense (semantic) + sparse/BM25 (exact terms), combined via RRF: \(RRF(d) = \sum \frac{1}{k + rank_r(d)}\), k=60. Improvement: +10-20% recall vs dense only. Multi-stage pipeline: Dense (100) -> BM25 (100) -> RRF (50) -> Cross-encoder rerank (20) -> LLM. Reranking adds +15-30% precision. 2026: 95%+ enterprise adoption of hybrid search.
Q: "Compare Self-RAG, CRAG, and Adaptive RAG."
Self-RAG: reflection tokens (RETRIEVE, ISREL, ISSUP, ISUSE) -- the model learns to self-evaluate during generation. +10-15% accuracy vs baseline. Best for quality-critical applications. CRAG: 3-way classification (correct/incorrect/ambiguous) -- evaluates retrieval, falls back to web search when needed. +15-20% accuracy. Best for dynamic knowledge and verification. Adaptive RAG: query-complexity routing (low -> no RAG, medium -> single retrieve, high -> multi-hop). Dynamic-RAG (AAAI 2025) uses MAB/UCB for strategy selection. +10-25% accuracy. Best for mixed query types and cost optimization.
Q: "What is Agentic RAG and what are its design patterns?"
Agentic RAG = 4 design patterns: (1) Reflection -- self-evaluate and improve. (2) Planning -- decompose complex queries. (3) Tool Use -- external APIs. (4) Multi-Agent -- specialized agents collaborate. Architecture: Query Agent (intent, routing) -> Retrieval Agent (multi-source: vector + graph + web) -> Evaluation Agent (relevance, gaps) -> Generation Agent (synthesize, cite). Iterative loop with max_iterations. Performance: 88-95% recall@5 (vs naive 60-70%). Cost: +2-5x vs vanilla. LangGraph for state management. 2026: 45% enterprise adoption.
Q: "When should you use Graph RAG?"
Entity relationships, multi-hop reasoning, temporal queries, hierarchical data. Multi-hop accuracy: vector only 45-60%, Graph RAG 75-85% (+25-35%). Entity recall: +20-30%. Architecture: vector search -> entry nodes, graph traversal -> related context. Storage: Neo4j, ArangoDB, FalkorDB. Overkill for simple Q&A. 2026: 25-30% adoption.
Q: "Design a production RAG system."
(1) Query Enhancement: rewriting + HyDE (+10-25% recall) + routing (+20-40% relevance). (2) Hybrid retrieval: dense + sparse, RRF fusion. (3) Reranking: cross-encoder (BGE-reranker, Cohere, Jina) on top-50 -> top-20. (4) Context assembly + LLM generation with citations. (5) Evaluation: RAGAS (faithfulness >0.9, context precision >0.8). (6) Monitoring: retrieval latency <50ms, hallucination <5%. (7) Cost: semantic caching (60-80% savings), query routing (30-50%). Total latency: 700-2500ms.
Key Numbers

| Fact | Value |
|---|---|
| Hybrid search improvement vs dense | +10-20% recall |
| Reranking improvement | +15-30% precision |
| Graph RAG multi-hop accuracy | +25-35% vs vector only |
| Self-RAG improvement (13B) | +11.3% avg vs standard RAG |
| CRAG accuracy improvement | +15-20% |
| Agentic RAG recall@5 | 88-95% |
| Naive RAG recall@5 | 60-70% |
| FAIR-RAG improvement (HotpotQA) | +8.3 F1 |
| Re2Search++ (HotpotQA) | 0.55 F1 |
| HyDE recall improvement | +10-25% |
| Query routing relevance | +20-40% |
| Semantic caching savings | 60-80% |
| RRF k constant (default) | 60 |
| Total pipeline latency | 700-2500ms |
| Agentic RAG cost increase | +2-5x vs vanilla |
| Hybrid search adoption 2026 | 95%+ |
| Agentic RAG adoption 2026 | 45% |
| Graph RAG adoption 2026 | 25-30% |
Formulas
\[RRF(d) = \sum_{r \in R} \frac{1}{k + rank_r(d)} \quad \text{(Reciprocal Rank Fusion)}\]
\[\text{UCB}(a) = \bar{X}_a + \sqrt{\frac{2 \ln N}{n_a}} \quad \text{(Adaptive RAG method selection)}\]
\[Q = \alpha \cdot \text{Relevance} + \beta \cdot \text{Completeness} + \gamma \cdot \text{Accuracy} \quad \text{(Retrieval quality)}\]
\[\text{Coverage} = \frac{\text{Addressed Findings}}{\text{Required Findings}} \quad \text{(FAIR-RAG gap coverage)}\]
Common Mistakes
"Agentic RAG = just an agent loop on top of RAG" -- Agentic RAG is an architectural pattern: query decomposition + tool selection + iterative retrieval + self-correction. A bare retry loop is not agentic.
"Graph RAG is always better than naive RAG" -- Graph RAG consumes 10-100x more tokens at ingestion (entity extraction). For simple Q&A without multi-hop reasoning it is overkill; the gains show up only on entity-focused and relationship queries (45% -> 79%).
"All 14 RAG types must be memorized for interviews" -- Knowing 5 key ones suffices: Naive, Self-RAG, Corrective RAG (CRAG), Adaptive RAG, Agentic RAG. The rest are variations. Understanding the trade-offs (accuracy vs latency vs fine-tuning cost) matters more.
Sources
Asai et al. -- "Self-RAG: Learning to Retrieve, Generate, and Critique" (arXiv:2310.11511, 2023)
Yan et al. -- "Corrective RAG" (arXiv:2401.15884, 2024)
Jeong et al. -- "Adaptive RAG" (2024)
Dynamic-RAG -- "Multi-Armed Bandit Method Selection" (AAAI 2025)
FAIR-RAG -- "Structured Evidence Assessment" (arXiv:2510.22344, 2025)
RAG-Gym -- "Systematic RAG Optimization" (arXiv:2502.13957, 2025)
glaforge.dev -- "Advanced RAG -- Understanding RRF in Hybrid Search" (2026)
Meilisearch -- "14 Types of RAG" (2026)
Neo4j -- "Advanced RAG Techniques for High-Performance LLM Applications"
Techment -- "RAG in 2026: How Retrieval-Augmented Generation Works for Enterprise AI"
LeewayHertz -- "Advanced RAG: Architecture, Techniques, Applications"
Medium -- "Building Agentic RAG with LangGraph"
February 21, 2026