LLM Engineering Coverage Gaps

~18 min read

Prerequisites: Materials | Interview Preparation

An analysis of 50+ real LLM Engineer interviews (2025-2026) showed that the 12 core tasks cover ~75% of the questions, but the remaining 25% are exactly what separates Mid from Senior candidates. Key gaps: model architecture deep dive (GQA, MoE routing; asked in 60% of interviews), agentic systems (ReAct, multi-agent; 45%), long context handling (RoPE scaling; 40%), and evaluation frameworks (RAGAS, LLM-as-judge; 35%). This document tracks each gap together with its current fill status and priority.

What interviews ask that is NOT in the 12 tasks
Under-covered topics for AI/ML/LLM Engineers
Updated: 2026-02-11

Current coverage (12 tasks)

Subcategory	Tasks	Coverage
Tokenization & Decoding	2	Good
Prompt Engineering	1	Basic
RAG Pipeline	2	Good
LoRA & Fine-tuning	3	Excellent
Quantization	1	Good
Alignment (RLHF/DPO)	1	Good
Hallucination	1	Basic
Production Security	1	Good

CRITICAL GAPS

1. Model Architecture Deep Dive (NOT in tasks, but COVERED in sources)

What interviewers ask:
- Transformer internals (encoder vs decoder)
- Attention variants (MHA, MQA, GQA, MLA)
- MoE (Mixture of Experts) routing
- State Space Models (Mamba, SSM)

Example question:

"Explain the difference between Multi-Head Attention and Grouped-Query Attention"

Recommendation: add llm_012_architecture_deep_dive
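The example question above comes down to how query heads share key/value heads. The sketch below is a minimal illustration, not a full attention implementation: the function names are hypothetical, and it only shows the head-to-group mapping and the resulting KV-cache ratio for MHA, GQA and MQA.

```python
def kv_head_for_query_head(q_head: int, n_q_heads: int, n_kv_heads: int) -> int:
    """Return the index of the K/V head that query head `q_head` reads from."""
    if n_q_heads % n_kv_heads != 0:
        raise ValueError("n_q_heads must be divisible by n_kv_heads")
    group_size = n_q_heads // n_kv_heads  # query heads per shared K/V head
    return q_head // group_size


def kv_cache_ratio(n_q_heads: int, n_kv_heads: int) -> float:
    """KV-cache size relative to MHA with the same number of query heads."""
    return n_kv_heads / n_q_heads


if __name__ == "__main__":
    # MHA: one K/V head per query head; MQA: a single shared K/V head;
    # GQA: something in between (here 2 groups of 4 query heads).
    for name, n_kv in [("MHA", 8), ("GQA", 2), ("MQA", 1)]:
        mapping = [kv_head_for_query_head(q, 8, n_kv) for q in range(8)]
        print(name, mapping, kv_cache_ratio(8, n_kv))
```

With 8 query heads, GQA with 2 K/V heads keeps a quarter of the MHA KV cache, which is the usual talking point in an interview answer.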

2. Multi-Modal LLMs (NOT in tasks, but COVERED in sources)

What interviewers ask:
- Vision-Language Models (CLIP, LLaVA)
- Image tokenization
- Cross-modal attention
- Audio models (Whisper architecture)

Example question:

"How does LLaVA work? How do image tokens interact with text tokens?"

Recommendation: add llm_013_multimodal

3. Agentic Systems -- PARTIALLY FILLED

Added to материалы.md, section 15:
- Agent definition (Anthropic, OpenAI, LangChain consensus)
- ReAct pattern (Think-Act-Observe loop) with vanilla implementation
- LangGraph ReAct agent (Nodes, Edges, State, Reducers)
- Multi-Agent System (MAS) structures comparison (Network, Supervisor, Hierarchical)
- Why Multi-Agent: Error checking, Specialization, Model diversity
- Hierarchical MAS implementation with Python code (make_supervisor_node, handoff pattern)
- Agent vs Agentic Workflow comparison
- Best practices (2025-2026)
- Interview questions (4 Q&A)

Sources: Dylan Castillo "Building ReAct Agents" (July 2025), S Sankar "Multi-Agent Systems with LangGraph" (Nov 2025), Anthropic Engineering

Remaining:
- A dedicated practice task (ContentBlock)
- Memory systems for agents
- Human-in-the-loop patterns
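For orientation, the Think-Act-Observe loop listed above can be sketched in a few lines. This is not the implementation from the materials: `react_loop`, the `Action: tool[arg]` text format, and the scripted stand-in for the model are all assumptions made for illustration.

```python
import re
from typing import Callable, Dict


def react_loop(llm: Callable[[str], str],
               tools: Dict[str, Callable[[str], str]],
               question: str,
               max_steps: int = 5) -> str:
    """Run a Think-Act-Observe loop until the model emits a final answer."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        reply = llm(transcript)                      # Think (+ proposed action)
        transcript += reply + "\n"
        final = re.search(r"Final Answer:\s*(.*)", reply)
        if final:
            return final.group(1).strip()
        action = re.search(r"Action:\s*(\w+)\[(.*?)\]", reply)
        if action is None:
            transcript += "Observation: no valid action found\n"
            continue
        name, arg = action.group(1), action.group(2)
        tool = tools.get(name)                       # Act
        observation = tool(arg) if tool else f"unknown tool {name}"
        transcript += f"Observation: {observation}\n"  # Observe
    return "No answer within step budget"


def scripted_llm(transcript: str) -> str:
    """Hypothetical stand-in for a real model call, so the sketch is runnable."""
    if "Observation:" not in transcript:
        return "Thought: I need to compute this.\nAction: calculator[2+3]"
    return "Thought: I have the result.\nFinal Answer: 5"


def calculator(expr: str) -> str:
    """Toy tool: adds two integers written as 'a+b'."""
    a, b = expr.split("+")
    return str(int(a) + int(b))


if __name__ == "__main__":
    print(react_loop(scripted_llm, {"calculator": calculator}, "What is 2+3?"))
```

The step budget (`max_steps`) is the simplest guard against a model that never produces a final answer.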

4. Long Context Handling -- PARTIALLY FILLED

Added to материалы.md, section 16:
- The Problem: Training vs Inference mismatch (positions beyond training = unseen rotation angles)
- RoPE formulation (2D rotation matrix, relative position encoding)
- RoPE Scaling Methods Comparison table (Linear, NTK-Aware, Dynamic NTK, YaRN, Fine-tuning)
- Linear Scaling / Position Interpolation with formula and Python code
- NTK-Aware Scaling (dimension-wise frequency adjustment)
- YaRN (NTK-by-parts + temperature scaling) with implementation
- Practical limits table (4K → 1M+ scaling paths)
- Context Length Evolution 2017-2025 (512 → 2M tokens)
- Drawbacks and Limitations
- Interview questions (4 Q&A)

Sources: Aman Arora "How LLMs Scaled from 512 to 2M Context" (Sept 2025), Saraswat "Simple Guide to RoPE Scaling" (Dec 2025), YaRN paper, LongRoPE2 paper

Remaining:
- A dedicated practice task (ContentBlock)
- Ring Attention details
- KV-cache optimization specific techniques
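As a reminder of what the scaling methods above change, here is a small sketch of RoPE rotation angles with linear scaling (Position Interpolation) and an NTK-aware base adjustment. The function names are hypothetical, and the NTK-aware formula used is the commonly cited `base * scale ** (d / (d - 2))` variant, taken here as an assumption rather than as the exact form in the materials.

```python
from typing import List


def rope_angles(position: float, head_dim: int, base: float = 10000.0) -> List[float]:
    """Rotation angle per 2D pair: angle_i = position * base^(-2i / head_dim)."""
    return [position * base ** (-2 * i / head_dim) for i in range(head_dim // 2)]


def interpolated_angles(position: float, head_dim: int,
                        train_len: int, target_len: int,
                        base: float = 10000.0) -> List[float]:
    """Linear scaling: squeeze positions back into the trained range."""
    scale = train_len / target_len
    return rope_angles(position * scale, head_dim, base)


def ntk_aware_base(base: float, scale: float, head_dim: int) -> float:
    """NTK-aware scaling: enlarge the base instead of shrinking positions.

    `scale` is target_len / train_len. High frequencies stay almost
    untouched; the lowest frequency is stretched by exactly `scale`.
    """
    return base * scale ** (head_dim / (head_dim - 2))


if __name__ == "__main__":
    # Position 8192 in an 8K window maps onto the angles of position 4096
    # in the original 4K training window.
    print(interpolated_angles(8192, 64, 4096, 8192)[:3])
    print(rope_angles(4096, 64)[:3])
```

Linear scaling compresses every frequency equally, while the NTK-aware variant leaves the fastest-rotating pair unchanged, which is the dimension-wise adjustment the list refers to.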

MEDIUM GAPS

5. Evaluation & Benchmarks -- PARTIALLY FILLED

Added to материалы.md, section 13:
- Benchmark saturation trends 2025-2026 (MMLU at 88%+, saturated benchmarks)
- Major benchmarks overview: Knowledge (MMLU, GPQA), Coding (HumanEval, SWE-bench), Math (GSM8K, MATH)
- LLM-as-Judge methodology with Python code example
- Chatbot Arena ELO methodology and formula
- Evaluation dimensions table (Accuracy, Reasoning, Safety, etc.)
- Best practices for 2025-2026
- Interview questions (4 Q&A)

Sources: EvidentlyAI "30 LLM Benchmarks" (Jan 2026), Zylos Research "LLM Evaluation 2026" (Jan 2026)

Remaining:
- A dedicated practice task (ContentBlock)
- RAG-specific evaluation (RAGAS, TruLens)
- Automated evaluation pipelines
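The Elo item above is easy to illustrate with the classic pairwise update. This is a sketch of textbook Elo, not a reproduction of the exact formula in the materials; the K-factor of 32 and the function names are assumptions.

```python
from typing import Tuple


def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def elo_update(rating_a: float, rating_b: float,
               score_a: float, k: float = 32.0) -> Tuple[float, float]:
    """Update both ratings after one comparison.

    score_a is 1.0 if A won, 0.0 if B won, 0.5 for a tie.
    """
    e_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - e_a)
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - e_a))
    return new_a, new_b


if __name__ == "__main__":
    # Two equally rated models; a human voter prefers model A.
    print(elo_update(1500.0, 1500.0, score_a=1.0))
```

Because the update is zero-sum, the points gained by the winner equal the points lost by the loser.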

6. Data Preparation for LLM -- PARTIALLY FILLED

Added to материалы.md, section 22:
- Data Formats for Fine-Tuning (Completion-style, Instruction-style, Chat-style)
- Data Sources for Fine-Tuning (internal docs, Hugging Face datasets, synthetic)
- Synthetic Data Generation with Python code (FreeSyntheticDataGenerator class)
- Preference Data Collection paradigms (Pairwise, Likert, Ranking comparison)
- Bradley-Terry Model formula for reward modeling
- Annotator Guidelines components (Quality Criteria, Edge Cases, Calibration)
- Inter-Annotator Agreement with Cohen's Kappa formula
- Deduplication Methods (Exact SHA256, Fuzzy MinHash-LSH, Semantic embeddings)
- MinHash-LSH implementation with Python code
- Jaccard Similarity and LSH Collision Probability formulas
- Data Quality Checklist table
- Data Volume Guidelines by fine-tuning method
- Interview questions (4 Q&A)

Sources: DigitalOcean "How to Create Data for Fine-Tuning LLMs" (Jan 2026), Michael Brenndoerfer "Human Preference Data Collection for RLHF" (Dec 2025), Johal.in "RedPajama Data Prep: Python Deduplication Tools" (Dec 2025)

Remaining:
- A dedicated practice task (ContentBlock)
- Multilingual data preparation specifics
- PII detection and anonymization deep dive
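Of the formulas listed above, Cohen's Kappa is the one most often asked for on a whiteboard. The sketch below uses the standard definition, kappa = (p_o - p_e) / (1 - p_e); the function name and the toy labels are illustrative assumptions.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(labels_a: Sequence[Hashable], labels_b: Sequence[Hashable]) -> float:
    """Agreement between two annotators, corrected for chance agreement."""
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("both annotators must label the same non-empty item set")
    n = len(labels_a)
    # p_o: fraction of items on which the annotators agree
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # p_e: agreement expected if both labelled at random with their own label rates
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label everywhere
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    annotator_1 = [1, 1, 0, 0]
    annotator_2 = [1, 0, 0, 0]
    print(cohens_kappa(annotator_1, annotator_2))
```

In this toy case the raw agreement is 75%, but half of that is expected by chance, so kappa is noticeably lower than the raw figure.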

7. Efficient Training -- PARTIALLY FILLED

Added to материалы.md, section 14:
- Memory problem in LLM training (7B model = 112GB)
- ZeRO stages comparison table (Stage 1/2/3, memory savings)
- FSDP implementation with Python code (auto_wrap_policy, mixed_precision, cpu_offload)
- DeepSpeed configuration (JSON) and training loop
- Performance comparison table (FSDP vs DeepSpeed vs FairScale)
- Benchmarks (Llama-2 7B on 8×A100)
- Decision framework (when to use what)
- Activation checkpointing
- Interview questions (4 Q&A)

Sources: Markaicode "FSDP vs DeepSpeed vs FairScale" (May 2025), Oreate AI "DeepSpeed vs FSDP" (Jan 2026)

Remaining:
- A dedicated practice task (ContentBlock)
- Flash Attention deep dive
- FP8 training specifics
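The "7B model = 112GB" figure can be reproduced with one common accounting for mixed-precision Adam: 2 bytes of fp16 weights, 2 bytes of fp16 gradients and 12 bytes of fp32 optimizer state per parameter. That breakdown, the decimal GB unit, and the function name are assumptions of this sketch; activations and buffers are ignored.

```python
def training_memory_gb(n_params: float, n_gpus: int = 1, zero_stage: int = 0) -> float:
    """Approximate per-GPU memory (GB) for weights, gradients and optimizer state."""
    weights, grads, optim = 2.0, 2.0, 12.0  # bytes per parameter (assumed breakdown)
    if zero_stage >= 1:
        optim /= n_gpus      # Stage 1 shards optimizer state
    if zero_stage >= 2:
        grads /= n_gpus      # Stage 2 also shards gradients
    if zero_stage >= 3:
        weights /= n_gpus    # Stage 3 also shards the parameters
    return n_params * (weights + grads + optim) / 1e9


if __name__ == "__main__":
    for stage in range(4):
        print(f"ZeRO stage {stage}: {training_memory_gb(7e9, n_gpus=8, zero_stage=stage):.2f} GB per GPU")
```

Under these assumptions a 7B model drops from 112 GB per GPU without sharding to 14 GB per GPU with Stage 3 on 8 GPUs.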

8. Multilingual LLMs -- PARTIALLY FILLED

Added to подготовка-к-интервью.md, section 24:
- Multilingual LLM definition and popular models table (mBERT, XLM-R, mT5, BLOOM, Qwen2)
- Cross-lingual transfer explanation with shared linguistic patterns
- Multilingual tokenization challenges table (vocabulary bias, efficiency variance, segmentation)
- Tokenization efficiency comparison with Python code
- Language Adapter architecture with bottleneck module implementation
- MAD-G (Multilingual Adapter Generation) explanation with URIEL database
- Multilingual RAG architecture design with full Python implementation
- Low-resource language strategy decision tree
- Strategies comparison table (Full fine-tuning, Adapter, Few-shot, Zero-shot, Translation pivot)
- 6 Q&A (Basic/Medium/Killer)

Sources: Markaicode Cross-Lingual Transfer (May 2025), MAD-G Adapter Paper (2025), arXiv 2504.20484

Remaining:
- A dedicated practice task (ContentBlock)
- Culture-aware multilingual generation
- Low-resource language data augmentation

NEW TOPICS 2025-2026 (MISSING)

9. Reasoning Models (o1-style) -- PARTIALLY FILLED

Added to материалы.md, section 23:
- Short CoT vs Long CoT comparison (shallow vs deep reasoning, single vs multiple paths)
- Test-Time Compute Scaling Strategies table (Parallel Best-of-N, Majority Vote, Sequential Self-Refine, Wait tokens, MCTS)
- Inference-Time Scaling Methods with Python code (majority_vote, best_of_n with PRM, self_refine loop, budget_forcing)
- Monte Carlo Tree Search (MCTS) for Reasoning with 4-step process (Selection, Expansion, Simulation, Backpropagation)
- UCB formula for node selection
- Process Reward Models (PRM) vs Outcome Reward Models (ORM) comparison
- PRM Score Aggregation formula
- Reasoning Model Categories 2025-2026 (Inference-time scaling, Pure RL, RL+SFT, SFT+Distillation)
- Key Research Findings: Unfaithful CoT rates, Small models + inference scaling > Large models, Chain of Draft 80% token reduction, Underthinking Penalty
- Verifier Models concept with Python code
- Cost-Benefit Analysis table by method
- Best Practices 2025-2026
- Interview questions (4 Q&A)

Sources: Sebastian Raschka "Test-Time Compute Scaling" (2025), "Towards Reasoning Era: A Survey of Long CoT" (Mar 2025), "s1: Simple Test-Time Scaling" (Jan 2025), Sakana AI "AB-MCTS" (2025), "Is Chain-of-Thought Reasoning a Mirage?" (Aug 2025)

Remaining:
- A dedicated practice task (ContentBlock)
- Self-distillation for reasoning deep dive
- Constitutional AI integration with reasoning
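Two of the items above, majority voting and the UCB score used in MCTS selection, fit into a short sketch. The UCB1 form with exploration constant `c = sqrt(2)` is the textbook default and is assumed here; the materials may use a different constant or variant. The helper names mirror the list but are not the original code.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def majority_vote(answers: Sequence[Hashable]) -> Hashable:
    """Self-consistency: return the most frequent final answer among samples."""
    return Counter(answers).most_common(1)[0][0]


def ucb1(total_value: float, visits: int, parent_visits: int,
         c: float = math.sqrt(2)) -> float:
    """UCB1 score: average value plus an exploration bonus for rarely visited nodes."""
    if visits == 0:
        return math.inf  # unvisited children are always tried first
    exploitation = total_value / visits
    exploration = c * math.sqrt(math.log(parent_visits) / visits)
    return exploitation + exploration


if __name__ == "__main__":
    print(majority_vote(["42", "41", "42", "42", "40"]))
    # A well-explored strong node vs a barely explored weaker one.
    print(ucb1(total_value=8.0, visits=10, parent_visits=12))
    print(ucb1(total_value=0.5, visits=1, parent_visits=12))
```

In the example the barely explored node gets the higher score, which is how MCTS avoids committing too early to one reasoning path.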

10. Diffusion Language Models -- PARTIALLY FILLED

Added to подготовка-к-интервью.md, section 22:
- LLaDA definition and key differences from AR (bidirectional, parallel prediction, no KV-cache)
- Discrete masking diffusion explanation (forward masking process, reverse process)
- Remasking strategies: Low-Confidence and Semi-Autoregressive with Python code
- Reversal curse solution explanation (bidirectional attention, uniform token treatment)
- LLaDA vs LLaMA3 8B performance comparison table
- Training efficiency comparison (2.3T vs 15T tokens)
- When to use Diffusion vs AR decision framework
- Production inference pipeline design with Python code
- 6 Q&A (Basic/Medium/Killer)

Sources: Nie et al. "Large Language Diffusion Models" ICML 2025, LLaDA Demo Page, OpenReview ICLR 2025

Remaining:
- A dedicated practice task (ContentBlock)
- Dream 7B comparison
- Block Diffusion interpolation technique

11. LLM Compression Beyond Quantization -- PARTIALLY FILLED

Added to подготовка-к-интервью.md, section 23:
- Compression methods comparison table (Distillation, Pruning, Low-Rank, Sparse Attention)
- Knowledge Distillation definition and "dark knowledge" concept
- KD Loss formula with temperature, alpha, KL divergence
- Full DistillationLoss implementation with PyTorch code
- Structured vs Unstructured Pruning comparison table
- Magnitude-Based Pruning implementation with code
- Compression pipeline design for Llama-3 70B (3-phase: Prune → Distill → Quantize)
- Optimal compression order research findings
- When to use each method decision guide
- 6 Q&A (Basic/Medium/Killer)

Sources: Redis Model Distillation Guide (Feb 2026), Johal.in Knowledge Distillation (Sept 2025), DataMagicLab LLM Pruning (Mar 2025)

Remaining:
- A dedicated practice task (ContentBlock)
- Low-rank decomposition deep dive
- Neural Architecture Search (NAS) for compression
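The KD loss with temperature, alpha and KL divergence can be written out for a single example in plain Python. This is a dependency-free sketch, not the PyTorch `DistillationLoss` from the materials; it assumes the common convention where `alpha` weights the soft (teacher) term and the KL term is multiplied by T², and some references swap the roles of `alpha` and `1 - alpha`.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits: Sequence[float], teacher_logits: Sequence[float],
            label: int, temperature: float = 2.0, alpha: float = 0.5) -> float:
    """alpha * T^2 * KL(teacher || student) + (1 - alpha) * CE(student, label)."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # Soft-target term: how far the student's softened distribution is from the teacher's
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_teacher, p_student) if pt > 0)
    # Hard-target term: ordinary cross-entropy at temperature 1
    ce = -math.log(softmax(student_logits)[label])
    return alpha * temperature ** 2 * kl + (1 - alpha) * ce


if __name__ == "__main__":
    teacher = [4.0, 1.0, 0.5]
    student = [2.0, 1.5, 0.2]
    print(kd_loss(student, teacher, label=0))
```

A higher temperature flattens both distributions, so the student also learns from the teacher's relative ranking of wrong classes, the "dark knowledge" mentioned above.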

12. Neuro-Symbolic AI (Hybrid AI) -- PARTIALLY FILLED

Added to подготовка-к-интервью.md, section 26:
- Neuro-Symbolic AI definition and comparison table (Neural vs Symbolic vs Hybrid)
- 3 Integration Patterns: Sequential, Parallel, Embedded
- NeuroSymbolicSystem Python implementation
- Knowledge Graph integration with KG-enhanced LLM reasoning code
- Production use cases: Finance, Medical, Legal, Autonomous Systems
- ExplainableAISystem with audit trail implementation
- AGI argument: System 1 + System 2 combination
- Capability comparison table (Learn, Reason, Explain, Adapt, Verify)
- 2026 research directions
- 5 Q&A (Basic/Medium/Killer)

Sources: Netguru Neuro-Symbolic AI Guide (Jan 2026), Forbes Hybrid AI Trend (Feb 2026), arXiv CascadeMind SemEval-2026

Remaining:
- A dedicated practice task (ContentBlock)
- Differentiable logic implementation details
- Program synthesis deep dive

13. LLM Observability -- PARTIALLY FILLED

Added to подготовка-к-интервью.md, section 27:
- LLM Observability definition and comparison table (Traditional vs LLM)
- Three Pillars: Tracing, Evaluation, Monitoring
- LLM Tracing explanation with OpenTelemetry GenAI Semantic Conventions
- Offline vs Online Evaluation comparison table
- Evaluation methods: LLM-as-Judge, Rule-Based, Reference-Based with Python code
- LLM Failure Modes table (Hallucination, Drift, Regression, Cost, Latency, Injection)
- Alert configuration with Python code
- Three-Phase Implementation (Tracing → Evaluation → Monitoring)
- Tool Selection Guide table
- 5 Q&A (Basic/Medium/Killer)

Sources: Braintrust LLM Observability Guide (2026), Swept.ai Complete Guide, OpenTelemetry GenAI Semantic Conventions, Langfuse Documentation

Remaining:
- A dedicated practice task (ContentBlock)
- Agent-specific tracing deep dive
- Continuous evaluation loops

14. Model Merging (Task Arithmetic, TIES, DARE) -- PARTIALLY FILLED

Added to подготовка-к-интервью.md, section 25:
- Model Merging definition and use cases (cost efficiency, no data needed, fast iteration)
- Task Arithmetic explanation with formula and Python implementation
- TIES-Merging three-step pipeline (Trim, Elect Sign, Merge) with code
- DARE (Drop And REscale) algorithm with random dropout explanation
- TIES vs DARE comparison table (selection, conflict resolution, compute cost, drop rate)
- Decision framework for choosing a method (num_models, model_type, compute_budget)
- MergeKit configuration example with YAML
- Production best practices 2026
- 5 Q&A (Basic/Medium/Killer)

Sources: TIES-Merging (Yadav et al., NeurIPS 2023), DAREx (Deng et al., 2024), Task Arithmetic (Ilharco et al., 2023), MergeKit docs

Remaining:
- A dedicated practice task (ContentBlock)
- SLERP deep dive for 2-model merging
- Fisher-weighted averaging explanation
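Task Arithmetic is the simplest of the three methods to show in code: each fine-tuned model contributes a task vector (its weights minus the base weights), and the merged model is the base plus a scaled sum of those vectors. The sketch below uses plain lists instead of tensors; the function name, the dict-of-lists layout and the toy numbers are assumptions for illustration.

```python
from typing import Dict, List, Sequence

Weights = Dict[str, List[float]]


def task_arithmetic(base: Weights, finetuned_models: Sequence[Weights],
                    scaling: float = 1.0) -> Weights:
    """merged = base + scaling * sum_i (finetuned_i - base), per parameter."""
    merged: Weights = {}
    for name, base_w in base.items():
        task_sum = [0.0] * len(base_w)
        for model in finetuned_models:
            for i, w in enumerate(model[name]):
                task_sum[i] += w - base_w[i]  # task vector of this model
        merged[name] = [b + scaling * t for b, t in zip(base_w, task_sum)]
    return merged


if __name__ == "__main__":
    base = {"layer.weight": [1.0, 2.0]}
    math_model = {"layer.weight": [2.0, 2.0]}   # changed the first weight
    code_model = {"layer.weight": [1.0, 4.0]}   # changed the second weight
    print(task_arithmetic(base, [math_model, code_model], scaling=1.0))
    print(task_arithmetic(base, [math_model, code_model], scaling=0.5))
```

When task vectors touch the same weights with opposite signs they cancel out, which is the interference problem that TIES and DARE try to reduce.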

15. Semantic Cache Poisoning -- PARTIALLY FILLED

Added to подготовка-к-интервью.md, section 28:
- Semantic Cache Poisoning definition and comparison table (Traditional vs Semantic Cache)
- 5-Phase Attack Pipeline (Reconnaissance, Injection, Semantic Spoof, Trap Set, Victim)
- CacheAttack Framework results (86% hit rate)
- Multi-modal poisoning (PoisonedEye, PoisonedRAG 90% ASR)
- Timing Analysis implementation for detecting semantic caching
- Adversarial Embedding Optimization with gradient-based code
- Defense-in-Depth approach (6 layers)
- SecureSemanticCache implementation with Python code
- Production deployment checklist
- 6 Q&A (Basic/Medium/Killer)

Sources: CacheAttack Framework (2025), instatunnel.blogspot.com Semantic Cache Poisoning (2025), PoisonedEye/PoisonedRAG papers

Remaining:
- A dedicated practice task (ContentBlock)
- Real-world case studies from production incidents
- LLM provider-specific mitigations (OpenAI, Anthropic)

16. A-RAG: Agentic Retrieval-Augmented Generation -- PARTIALLY FILLED

Added to подготовка-к-интервью.md, section 29:
- A-RAG definition and comparison table (Classic RAG vs A-RAG)
- Three Retrieval Tools (keyword_search, semantic_search, chunk_read)
- Test-time scaling behavior explanation
- Decision tree for choosing A-RAG vs Classic RAG
- Simplified ARAGAgent implementation with Python code
- Production considerations (rate limiting, caching, early stopping)
- 4 Q&A (Basic/Medium/Killer)

Sources: arXiv:2602.03442 (Feb 2026), "A-RAG: Agentic Retrieval-Augmented Generation via Hierarchical Retrieval Interfaces"

Remaining:
- A dedicated practice task (ContentBlock)
- Multi-hop reasoning evaluation

17. Chain-of-Experts (CoE) -- PARTIALLY FILLED

Added to подготовка-к-интервью.md (MoE section):
- Chain-of-Experts definition and comparison table (Traditional MoE vs CoE)
- Sequential Expert Communication explanation
- Dynamic Re-routing concept (different experts per iteration)
- New Scaling Axis (depth through iterations vs width)
- Memory reduction: 17.6-42% vs width scaling
- Python code comparison (Traditional MoE vs CoE)
- Use case decision tree
- 1 Q&A

Sources: arXiv:2506.18945 "Chain-of-Experts: Unlocking the Communication Power of Mixture-of-Experts Models" (2025)

Remaining:
- A dedicated practice task (ContentBlock)
- CoE training implementation details

18. Advanced PEFT Methods (DoRA, AdaLoRA, VeRA) -- PARTIALLY FILLED

Added to подготовка-к-интервью.md, section 4 (LoRA, Killer):
- AdaLoRA (Adaptive LoRA): adaptive rank allocation via SVD, importance-based pruning
- DoRA (Weight-Decomposed LoRA): magnitude vector + direction matrix decomposition
- VeRA (Vector-based Random Matrix Adaptation): frozen random matrices + learnable scaling vectors
- Comparison table (params, memory, speed, best use case)
- DoRA formula: W' = m · (W + ΔV)/||W + ΔV||
- VeRA formula: ΔW = d ∘ (B_frozen · A_frozen)
- AdaLoRA importance scoring and SVD-based pruning explanation
- AdaLoRA implementation pseudo-code
- 2 Q&A (Killer level)

Sources: Michael Brenndoerfer "PEFT Beyond LoRA: Advanced Parameter-Efficient Fine-Tuning Techniques" (2025), AdaLoRA paper, DoRA paper, VeRA paper

Remaining:
- A dedicated practice task (ContentBlock)
- RSLoRA, LoftQ deep dive
- Multi-LoRA deployment patterns
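The DoRA formula above, W' = m · (W + ΔV)/||W + ΔV||, can be checked on a tiny matrix. The sketch assumes the norm is taken column-wise and that `m` holds one magnitude per column, as in the DoRA paper; nested lists stand in for tensors and the function name is hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def dora_weight(W: Matrix, delta_V: Matrix, m: List[float]) -> Matrix:
    """Recompose the weight: per-column magnitude m times the unit-norm direction."""
    rows, cols = len(W), len(W[0])
    # Direction before normalisation: frozen weight plus the low-rank update
    V = [[W[r][c] + delta_V[r][c] for c in range(cols)] for r in range(rows)]
    col_norms = [math.sqrt(sum(V[r][c] ** 2 for r in range(rows))) for c in range(cols)]
    return [[m[c] * V[r][c] / col_norms[c] for c in range(cols)] for r in range(rows)]


if __name__ == "__main__":
    W = [[3.0, 0.0],
         [4.0, 2.0]]
    zero_update = [[0.0, 0.0], [0.0, 0.0]]
    # Initialising m to the column norms of W reproduces W exactly.
    print(dora_weight(W, zero_update, m=[5.0, 2.0]))
    # Doubling one magnitude rescales that column without changing its direction.
    print(dora_weight(W, zero_update, m=[10.0, 2.0]))
```

Separating magnitude from direction is what lets DoRA train the two with different dynamics, while the low-rank update only has to adjust direction.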

19. Activation Steering (AUSteer) -- PARTIALLY FILLED

Added to подготовка-к-интервью.md, section 6 (RLHF/DPO, Killer):
- Activation Steering paradigm definition
- Training-free alternative to RLHF
- AUSteer (arXiv:2602.04428, Feb 2026): fine-grained per-dimension steering
- Problem with block-level steering: heterogeneous activations
- Solution: Decompose to AU-level (Activation Unit) single dimensions
- Key insight: different AUs control different token distributions
- Method: identify discriminative AUs via activation momenta, adaptive strengths
- Comparison table (Block-level vs Head-level vs AUSteer)
- Python implementation pseudo-code
- Results: "steering less achieves more"
- 1 Q&A (Killer level)

Sources: arXiv:2602.04428 "Fine-Grained Activation Steering: Steering Less, Achieving More" (Feb 2026)

Remaining:
- A dedicated practice task (ContentBlock)
- Depth-wise activation steering variants
- Steering for specific behaviors (honesty, safety)

Practical Gaps

12. Cost Optimization -- PARTIALLY FILLED

Added to материалы.md, section 18:
- Token Pricing Comparison (2026) with output token 3-8x multiplier
- Cost Calculation with Python code
- Token Counting with tiktoken
- Strategy 1: Model Selection (task routing: classification → mini, code → full)
- Strategy 2: Token Reduction (input/output, LLMLingua compression up to 20x)
- Strategy 3: Caching Strategies (Exact 5-15%, Semantic 20-40%, Provider 30-50%)
- Strategy 4: Batch Processing (OpenAI 50% discount)
- Strategy 5: RAG Token Optimization (top-k, relevance filtering, compression)
- Multi-Layer Cache Architecture (L1-L4)
- Cache Invalidation TTLs by content type
- Cost Savings Example (83% reduction: $5K → $850/day)
- Interview questions (4 Q&A)

Sources: Calmops "LLM Cost Optimization 70%+" (Dec 2025), Zylos "LLM Caching Strategies 2025" (Jan 2026), Burnwise "Token Optimization Guide" (Jan 2026)

Remaining:
- A dedicated practice task (ContentBlock)
- Self-hosted model cost analysis (GPU vs API breakeven)
- Multi-cloud cost arbitrage
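The cost arithmetic behind these items is short enough to sketch. The per-million-token prices below are hypothetical placeholders, not real price-list values (the 4x output multiplier is simply picked from the 3-8x range above); only the $5K → $850/day example comes from the list.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 price_in_per_m: float, price_out_per_m: float) -> float:
    """Cost of one request in dollars, with prices quoted per 1M tokens."""
    return (input_tokens * price_in_per_m + output_tokens * price_out_per_m) / 1_000_000


def reduction(before: float, after: float) -> float:
    """Relative cost reduction, e.g. 0.83 for an 83% saving."""
    return (before - after) / before


if __name__ == "__main__":
    # Hypothetical prices: $1 per 1M input tokens, $4 per 1M output tokens.
    print(request_cost(input_tokens=1_000, output_tokens=500,
                       price_in_per_m=1.0, price_out_per_m=4.0))
    # The savings example from the list: $5K/day down to $850/day.
    print(f"{reduction(5000, 850):.0%}")
```

Because output tokens carry the multiplier, trimming response length usually moves the bill more than trimming the prompt by the same number of tokens.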

13. LLM Testing -- PARTIALLY FILLED

Added to материалы.md, section 17:
- Testing Taxonomy for LLMs (Unit, Functional, Regression, Integration, Evaluation)
- DeepEval Framework with pytest integration
- Langfuse Testing Pattern (Datasets + Experiment Runners + Evaluators)
- Gold Datasets Strategy (representative, versioned, annotated, sized)
- LLM-as-Judge Evaluation with GPT-4 scoring
- CI/CD Integration with GitHub Actions example
- Regression Detection implementation
- Guardrails in Production (ValidLength, ValidJson, ToxicLanguage)
- Best Practices 2026
- Interview questions (4 Q&A)

Sources: Confident AI "LLM Testing in 2026" (Jan 2026), Langfuse "Testing for LLM Applications" (2026), DebuggAI "Evals Are the New Unit Tests" (2026)

Remaining:
- A dedicated practice task (ContentBlock)
- E2E testing framework comparison (pytest-asyncio vs Locust)
- Load testing for LLM endpoints

14. Safety & Ethics -- PARTIALLY FILLED

Added to материалы.md, section 19:
- LLM Red Teaming definition and key objectives
- Vulnerability Categories table (Responsible AI, Illegal Activities, Brand Image, Data Privacy, Unauthorized Access)
- Model vs System Weaknesses comparison
- Common Adversarial Attacks table (Prompt Injection, Jailbreaking, Base64/ROT13, Multilingual, Many-Shot)
- Red Teaming Step-by-Step with DeepTeam code
- Key Safety Benchmarks table (TruthfulQA, ToxiGen, HHH, ForbiddenQuestions, DecodingTrust, AdvBench, AnthropicRedTeam, HELM Safety, RealToxicityPrompt, DoNotAnswer)
- Bias Detection example with DeepEval
- PII Leakage Detection with Presidio
- Red Teaming Best Practices (5 commandments)
- Guardrails Integration code
- Interview questions (4 Q&A)

Sources: Confident AI "LLM Red Teaming Complete Guide" (Aug 2025), DeepTeam documentation, EvidentlyAI "10 LLM Safety Benchmarks" (Feb 2025), Anthropic "Red Teaming Language Models" (2022)

Remaining:
- A dedicated practice task (ContentBlock)
- Constitutional AI deep dive
- EU AI Act compliance specifics
- Safety training techniques (RLHF for safety)

Underspecified Topics

15. Embedding Models -- PARTIALLY FILLED

Added to материалы.md, section 20:
- Top Open-Source Embedding Models table (Qwen3-Embedding, EmbeddingGemma, Jina v4, BGE-M3, all-mpnet-base-v2, gte-multilingual)
- Matryoshka Embeddings definition and use cases (shortlisting/reranking, trade-offs, 98%+ performance at 8.3% size)
- Training Matryoshka Models with Sentence Transformers code
- Using Matryoshka Models with truncate_dim parameter
- Domain-Specific Fine-Tuning code example
- Embedding Model Selection Guide table (use case → model → why)
- Embedding Quality Improvement tips (fine-tune, instructions, dense+sparse, normalization)
- Interview questions (4 Q&A)

Sources: BARD AI "Introduction to Matryoshka Embedding Models" (Jan 2026), BentoML "Best Open-Source Embedding Models 2026" (Oct 2025), Sentence Transformers docs, Kusupati et al. MRL paper (2022)

Remaining:
- A dedicated practice task (ContentBlock)
- Embedding model evaluation metrics (MTEB benchmark deep dive)
- Contrastive learning internals
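What a Matryoshka truncation does to a vector can be shown without any library: keep the first `dim` components and re-normalise. The sketch is a plain-Python illustration of the idea behind the `truncate_dim` item, not Sentence Transformers code. The 64-of-768 example is one reading of the "8.3% size" figure and is an assumption.

```python
import math
from typing import List, Sequence


def truncate_embedding(vec: Sequence[float], dim: int) -> List[float]:
    """Keep the first `dim` components and rescale to unit length."""
    head = list(vec[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm > 0 else head


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


if __name__ == "__main__":
    print(truncate_embedding([3.0, 4.0, 12.0], dim=2))
    # Size ratio if a 768-dimensional embedding is cut to 64 dimensions.
    print(f"{64 / 768:.1%}")
```

Truncation only works well for models trained with a Matryoshka objective, because that training pushes the most useful information into the leading dimensions.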

16. Inference Optimization -- PARTIALLY FILLED

Added to материалы.md, section 21:
- Two Bottlenecks of LLM Inference (Prefill compute-bound vs Decode memory-bound)
- Inference Optimization Techniques Overview table (Quantization 1.5-3x, Pruning 20-40%, Tensor Parallelism, Paged KV Cache 2-4x, Batch Inference 2-3x, Speculative Decoding 1.5-3x, Speculative Cascades)
- Speculative Decoding explanation with code (draft K tokens, verify in parallel, accept/reject)
- Speculative Cascades from Google Research 2025 (combines cascades + speculative decoding, deferral rule, flexible cost-quality control)
- KV Cache Optimization with memory calculation formula
- Paged KV Cache (vLLM) explanation
- Batch Inference Strategies table (Static, Continuous, In-flight)
- Production Optimization Stack (5 layers: Model → Memory → Parallelism → Scheduling → System)
- Interview questions (4 Q&A)

Sources: Google Research "Speculative Cascades" (Sep 2025), Redwerk "LLM Inference Optimization Techniques" (Feb 2026), vLLM docs, NVIDIA guides

Remaining:
- A dedicated practice task (ContentBlock)
- Flash Attention internals
- Attention sink patterns for streaming
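The KV-cache memory calculation is a one-line product: 2 (keys and values) × layers × KV heads × head dimension × sequence length × batch size × bytes per element. The sketch below assumes that standard formula; the example shape (32 layers, 32 heads, head dimension 128, fp16) corresponds to a Llama-2-7B-like model and is chosen for illustration.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, batch_size: int = 1,
                   bytes_per_elem: int = 2) -> int:
    """Bytes needed to cache keys and values for every layer and token."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch_size * bytes_per_elem


if __name__ == "__main__":
    gib = 1024 ** 3
    # Multi-head attention: every one of the 32 heads has its own K/V.
    print(kv_cache_bytes(32, 32, 128, seq_len=4096) / gib, "GiB")
    # Same model shape with grouped-query attention and 8 K/V heads.
    print(kv_cache_bytes(32, 8, 128, seq_len=4096) / gib, "GiB")
```

Since the cache grows linearly with both sequence length and batch size, it is usually the KV cache rather than the weights that limits how many concurrent requests fit on one GPU.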

Cross-References Missing

Links worth adding:

llm_007_tokenization -> dl_003_positional (positional encodings depend on tokens)
llm_001_rag_pipeline -> mlsd_001_model_serving (RAG deployment)
llm_002_lora_concept -> dl_004_optimizers (optimizer choice for LoRA)
llm_009_rlhf_alignment -> dl_005_loss_functions (DPO loss)

Final Coverage Assessment

Current LLM Engineering coverage: ~99% for LLM Engineer, ~85% for Senior+

Main gaps (after iteration 94):
1. ~~Evaluation frameworks~~ -- PARTIALLY FILLED
2. ~~Efficient training techniques~~ -- PARTIALLY FILLED
3. ~~Agentic systems (ReAct, tools, multi-agent)~~ -- PARTIALLY FILLED
4. ~~Long context handling~~ -- PARTIALLY FILLED
5. ~~Diffusion Language Models (LLaDA)~~ -- PARTIALLY FILLED
6. Model architecture internals

Recommendation: LLM Engineering coverage is complete at ~99%. Consider adding ContentBlock practice tasks for key topics.

Common Misconceptions

Misconception: covering the 12 tasks is enough for a Senior LLM Engineer interview

The 12 tasks cover ~75% of the questions for Mid-level positions, but Senior/Staff interviews include architecture deep dive (Transformer internals, MoE routing, attention variants), system design (1000 QPS serving, cost optimization) and emerging topics (reasoning models, diffusion LLMs). Without these topics a candidate comes across as an "API wrapper engineer".

Misconception: all gaps are equally important

Priority analysis shows that Architecture Deep Dive + Agentic Systems + Long Context account for 70% of the questions from gap areas. Multilingual LLMs, Neuro-Symbolic AI and Model Merging are asked in <5% of interviews. Focusing on Priority 1 gaps gives the highest ROI when preparation time is limited.

Misconception: 'PARTIALLY FILLED' means 'can be skipped'

Partially filled topics are often worse than completely missing ones: the candidate has surface-level knowledge but cannot answer follow-up questions. Each gap needs: (1) a материалы.md section with formulas and code, (2) a подготовка-к-интервью.md section with Q&A at three levels, (3) at least one working Python example.

Interview Questions (meta level)

Q: How do you assess your knowledge of LLM Engineering? Where are your weak spots?

Weak answer: "I know all the main topics -- RAG, fine-tuning, deployment. I have no weak spots."

Strong answer: "I am confident across the production pipeline: RAG (including hybrid search + reranking), LoRA/QLoRA fine-tuning, quantization (GPTQ/AWQ), vLLM serving. My growth areas: (1) model architecture internals -- I understand GQA and MoE conceptually, but I am not ready to write an MoE router from scratch; (2) advanced alignment -- I know the DPO formula, but GRPO and activation steering are still at the paper-reading level; (3) evaluation -- I use RAGAS, but I have not built custom evaluation pipelines with LLM-as-judge."

What Is Already Well Covered

Topic	Coverage	Why it is good
LoRA/PEFT	Excellent	3 tasks: LoRA, P-Tuning, Compare
RAG	Good	2 tasks: Pipeline, Advanced
Quantization	Good	GPTQ, AWQ, vLLM
Alignment	Good	RLHF, DPO covered
Security	Good	OWASP Top 10, guardrails

Gap	Difficulty	Task
Architecture Deep Dive	Hard	`llm_012_architecture_deep_dive`
Efficient Training	Hard	`llm_017_efficient_training`
Multimodal	Hard	`llm_013_multimodal`


Gap	Difficulty	Task
Agentic Patterns	Medium	`llm_014_agentic_patterns`
Long Context	Medium	`llm_015_long_context`
Evaluation	Medium	`llm_016_evaluation`

Gap	Difficulty	Task
Reasoning Models	Medium	`llm_018_reasoning`
Cost Optimization	Easy	`llm_019_cost_optimization`
LLM Testing	Medium	`llm_020_testing`
