ML Practice: Gap Analysis (Consolidated)

Full gap analysis across the 93 existing tasks
Prioritized list of new tasks to add
Updated: 2026-02-11


Executive Summary

  • Current Coverage: 93 tasks across 8 categories
  • Identified Gaps: 40+ topics missing for complete AI/ML/LLM Engineer preparation
  • Priority 1 Gaps: 15 topics (should add ASAP)
  • Priority 2 Gaps: 12 topics (useful for Senior+)


Gap Categories

Category A: Missing ML Domains (High Impact)

| Gap | Frequency | Level | Justification |
|---|---|---|---|
| Computer Vision (ViT, YOLO) | HIGH | Mid/Sr | Only CNN covered |
| Time Series (ARIMA, Prophet) | MEDIUM | Mid | Zero coverage |
| Graph Neural Networks | MEDIUM | Sr | Growing demand |
| Generative Models (VAE, GAN, Diffusion) | HIGH | Mid/Sr | Only mentioned |
| Reinforcement Learning | MEDIUM | Sr | Zero coverage |
| NLP Classic (NER, POS, Word2Vec) | LOW | Mid | LLM-focused only |

Category B: Missing Production Topics (Critical)

| Gap | Frequency | Level | Justification |
|---|---|---|---|
| Feature Stores | HIGH | Sr | Essential infrastructure |
| ML CI/CD | HIGH | Mid/Sr | Production standard |
| Cost Optimization | MEDIUM | Sr | Business critical |
| Experiment Tracking | HIGH | Mid | Team collaboration |
| Data Quality | MEDIUM | Mid | Production reliability |

Category C: Missing LLM Topics (High Impact)

| Gap | Frequency | Level | Justification |
|---|---|---|---|
| Model Architecture (MoE, GQA) | HIGH | Sr | Interview standard |
| Long Context Handling | HIGH | Sr | 2025-2026 trend |
| Agentic Systems (deep) | HIGH | Sr | Growing demand |
| Evaluation Frameworks | MEDIUM | Mid | Production need |
| Efficient Training | MEDIUM | Sr | Cost critical |
| Multimodal LLMs | MEDIUM | Sr | Future direction |

Category D: Missing Math Topics (Foundation)

| Gap | Frequency | Level | Justification |
|---|---|---|---|
| VC Dimension, PAC Learning | LOW | Sr | Theory depth |
| Matrix Calculus | MEDIUM | Mid/Sr | Backprop understanding |
| Sampling Methods (MCMC) | LOW | Sr | Bayesian ML |
| Scaling Laws | MEDIUM | Sr | LLM understanding |

Priority 1: Add These 15 Tasks

ML Math (3 tasks)

math_042_matrix_calculus
  Focus: Jacobian, Hessian, gradient of matrix operations
  Difficulty: Hard

math_043_pac_learning
  Focus: VC dimension, generalization bounds
  Difficulty: Hard

math_044_scaling_laws
  Focus: Chinchilla scaling, compute-optimal training
  Difficulty: Medium

Deep Learning (3 tasks)

dl_012_vision_transformer
  Focus: ViT architecture, patch embedding, positional encoding
  Difficulty: Medium

dl_013_diffusion_basics
  Focus: Forward/reverse process, DDPM
  Difficulty: Hard

dl_014_gan_basics
  Focus: Generator/discriminator, training dynamics
  Difficulty: Medium

LLM Engineering (4 tasks)

llm_013_agentic_patterns
  Focus: ReAct deep dive, tool use, planning
  Difficulty: Medium

llm_014_long_context
  Focus: RoPE scaling, KV-cache optimization, Ring Attention
  Difficulty: Hard

llm_015_moe_architecture
  Focus: Mixture of Experts, routing, sparse activation
  Difficulty: Hard

llm_016_llm_evaluation
  Focus: MMLU, LLM-as-judge, Arena evaluation
  Difficulty: Medium

ML System Design (3 tasks)

mlsd_009_feature_stores
  Focus: Feast, Tecton, online/offline features
  Difficulty: Medium

mlsd_010_ml_cicd
  Focus: Automated testing, deployment, rollback
  Difficulty: Medium

mlsd_011_cost_optimization
  Focus: GPU utilization, spot instances, right-sizing
  Difficulty: Medium

AI Agents (2 tasks)

agents_003_tool_use_safety
  Focus: Permission boundaries, validation, human-in-the-loop
  Difficulty: Medium

agents_004_agent_memory
  Focus: Short-term, long-term, episodic memory systems
  Difficulty: Medium

Priority 2: Add These 12 Tasks

Classical ML (2 tasks)

ml_017_multi_label_classification
  Focus: Binary relevance, classifier chains
  Difficulty: Medium

ml_018_interpretability
  Focus: SHAP, LIME, feature importance
  Difficulty: Medium

Data Engineering (2 tasks)

de_003_feature_store_basics
  Focus: Feature consistency, versioning
  Difficulty: Medium

de_004_data_quality
  Focus: Great Expectations, validation
  Difficulty: Medium

ML System Design (4 tasks)

mlsd_012_online_learning
  Focus: Streaming ML, incremental updates
  Difficulty: Hard

mlsd_013_multi_armed_bandits
  Focus: epsilon-greedy, UCB, Thompson Sampling
  Difficulty: Medium

mlsd_014_causal_inference
  Focus: Propensity score, uplift modeling
  Difficulty: Hard

mlsd_015_vector_databases
  Focus: ANN indexes, Pinecone/Milvus
  Difficulty: Medium

LLM Engineering (2 tasks)

llm_017_multimodal
  Focus: Vision-language models, image tokenization
  Difficulty: Hard

llm_018_reasoning_models
  Focus: o1-style reasoning, test-time compute
  Difficulty: Hard

AI Agents (2 tasks)

agents_005_human_loop
  Focus: Human feedback, approval workflows
  Difficulty: Medium

agents_006_multi_agent_orchestration
  Focus: Hierarchical, parallel, collaborative patterns
  Difficulty: Hard

Priority 3: Nice to Have (13 tasks)

# Time Series
ts_001_arima_basics
ts_002_prophet_basics
ts_003_deepar_basics

# Graph ML
graph_001_gnn_basics
graph_002_knowledge_graphs

# Reinforcement Learning
rl_001_q_learning
rl_002_policy_gradient

# NLP Classic
nlp_001_word2vec
nlp_002_ner_pos

# Advanced Optimization
opt_001_lbfgs
opt_002_natural_gradient

# Bayesian ML
bayes_001_bnn_basics
bayes_002_uncertainty
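
The task IDs in the three priority lists appear to follow one convention: a category prefix, a three-digit number, and a snake_case slug. The following is a minimal sketch, assuming that convention is intended for every new task, of how the backlog could be validated and tallied before ContentBlocks are created. The names `BACKLOG`, `TASK_ID`, and `parse_task_id` are hypothetical and introduced only for illustration.

```python
import re
from collections import Counter

# Task IDs copied verbatim from the Priority 1, 2, and 3 lists.
BACKLOG = {
    1: """
        math_042_matrix_calculus math_043_pac_learning math_044_scaling_laws
        dl_012_vision_transformer dl_013_diffusion_basics dl_014_gan_basics
        llm_013_agentic_patterns llm_014_long_context llm_015_moe_architecture
        llm_016_llm_evaluation
        mlsd_009_feature_stores mlsd_010_ml_cicd mlsd_011_cost_optimization
        agents_003_tool_use_safety agents_004_agent_memory
    """.split(),
    2: """
        ml_017_multi_label_classification ml_018_interpretability
        de_003_feature_store_basics de_004_data_quality
        mlsd_012_online_learning mlsd_013_multi_armed_bandits
        mlsd_014_causal_inference mlsd_015_vector_databases
        llm_017_multimodal llm_018_reasoning_models
        agents_005_human_loop agents_006_multi_agent_orchestration
    """.split(),
    3: """
        ts_001_arima_basics ts_002_prophet_basics ts_003_deepar_basics
        graph_001_gnn_basics graph_002_knowledge_graphs
        rl_001_q_learning rl_002_policy_gradient
        nlp_001_word2vec nlp_002_ner_pos
        opt_001_lbfgs opt_002_natural_gradient
        bayes_001_bnn_basics bayes_002_uncertainty
    """.split(),
}

# Assumed convention: <prefix>_<3-digit number>_<snake_case slug>.
TASK_ID = re.compile(r"^(?P<prefix>[a-z]+)_(?P<number>\d{3})_(?P<slug>[a-z0-9_]+)$")


def parse_task_id(task_id):
    """Split a task ID into (prefix, number, slug); raise on a malformed ID."""
    match = TASK_ID.match(task_id)
    if match is None:
        raise ValueError(f"malformed task ID: {task_id!r}")
    return match["prefix"], int(match["number"]), match["slug"]


# Number of tasks per priority tier, and Priority 1 tasks per category prefix.
tier_counts = {tier: len(ids) for tier, ids in BACKLOG.items()}
p1_by_prefix = Counter(parse_task_id(t)[0] for t in BACKLOG[1])

# A (prefix, number) pair should never repeat anywhere in the backlog.
keys = [parse_task_id(t)[:2] for ids in BACKLOG.values() for t in ids]
assert len(keys) == len(set(keys)), "duplicate task number within a prefix"

print(tier_counts)
print(dict(p1_by_prefix))
```

Running a check like this whenever the lists change keeps the per-section "(N tasks)" headings and the tier totals in sync with the actual IDs.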

Gap Impact Matrix

| Gap Category | Current Coverage | After Priority 1 | After Priority 2 |
|---|---|---|---|
| ML Math | 90% | 95% | 95% |
| Classical ML | 85% | 85% | 95% |
| Deep Learning | 80% | 95% | 95% |
| LLM Engineering | 65% | 90% | 98% |
| ML System Design | 70% | 90% | 98% |
| Data Engineering | 75% | 75% | 95% |
| MLOps | 60% | 60% | 60% |
| AI Agents | 70% | 90% | 98% |
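
The matrix above is easier to act on as per-phase gains. The following is a rough sketch that computes the gain each priority tier contributes per category and flags categories that neither tier improves. The percentages are copied from the matrix; the helper names are hypothetical, and the unweighted mean is an assumption of this sketch, not necessarily how the document's own coverage metrics are defined.

```python
# Coverage percentages copied from the Gap Impact Matrix:
# (current, after Priority 1, after Priority 2).
COVERAGE = {
    "ML Math": (90, 95, 95),
    "Classical ML": (85, 85, 95),
    "Deep Learning": (80, 95, 95),
    "LLM Engineering": (65, 90, 98),
    "ML System Design": (70, 90, 98),
    "Data Engineering": (75, 75, 95),
    "MLOps": (60, 60, 60),
    "AI Agents": (70, 90, 98),
}


def phase_gains(coverage):
    """Percentage-point gain per category from Priority 1 and from Priority 2."""
    return {name: (p1 - cur, p2 - p1) for name, (cur, p1, p2) in coverage.items()}


def unchanged(coverage):
    """Categories whose coverage is the same before and after both tiers."""
    return [name for name, (cur, _, p2) in coverage.items() if p2 == cur]


def mean_coverage(coverage):
    """Unweighted mean across categories at each of the three stages."""
    n = len(coverage)
    return tuple(sum(row[i] for row in coverage.values()) / n for i in range(3))


gains = phase_gains(COVERAGE)
for name, (gain_p1, gain_p2) in gains.items():
    print(f"{name}: +{gain_p1} pts (Priority 1), +{gain_p2} pts (Priority 2)")
print("unchanged:", unchanged(COVERAGE))
print("mean coverage by stage:", mean_coverage(COVERAGE))
```

With the figures as given, MLOps is the only category that stays flat, which suggests that no Priority 1 or Priority 2 task is counted toward it.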

Implementation Roadmap

Phase 1 (Week 1-2): Priority 1 Tasks

  • Create 15 new ContentBlocks
  • Focus on most impactful gaps
  • Add cross-references to existing tasks

Phase 2 (Week 3-4): Priority 2 Tasks

  • Create 12 additional ContentBlocks
  • Fill in production-focused gaps
  • Add system design case studies

Phase 3 (Ongoing): Priority 3 Tasks

  • Add based on user demand
  • Track interview frequency changes
  • Update based on 2025-2026 trends

Gap Detection Sources

  1. Interview Reports - 15+ company reports analyzed
  2. Job Postings - Indeed, LinkedIn, levels.fyi
  3. Reddit/Telegram - r/MachineLearning, ODS.ai
  4. Courses - Stanford CS, DeepLearning.AI
  5. Papers - arXiv 2024-2026

Metrics for Success

| Metric | Current | Target |
|---|---|---|
| Task Count | 93 | 120+ |
| Category Coverage | 85% | 95% |
| Senior+ Coverage | 60% | 85% |
| Production Coverage | 65% | 90% |
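
The Task Count target can be cross-checked against the tier sizes stated earlier in the document (15, 12, and 13 tasks). The following is a small sketch of that arithmetic; the constant and function names are hypothetical.

```python
# Figures taken from the document: current task count, tier sizes, target.
CURRENT_TASKS = 93
PRIORITY_TASKS = {1: 15, 2: 12, 3: 13}
TARGET = 120


def projected_total(through_tier):
    """Task count once every tier up to and including `through_tier` is added."""
    added = sum(n for tier, n in PRIORITY_TASKS.items() if tier <= through_tier)
    return CURRENT_TASKS + added


for tier in PRIORITY_TASKS:
    total = projected_total(tier)
    print(f"through Priority {tier}: {total} tasks, target met: {total >= TARGET}")
```

Under these figures, the 120-task target is reached exactly once Priority 2 is complete, and the Priority 3 tasks would bring the total to 133.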

Review this gap analysis quarterly to keep it aligned with interview trends.