ML Math: Gaps¶
~7 min read
What interviews ask that is NOT covered by the 93 tasks. Undercovered topics for AI/ML/LLM Engineers. Updated: 2026-02-11
Current coverage (41 tasks)¶
| Subcategory | Tasks | Coverage |
|---|---|---|
| Calculus | 3 | Good |
| Linear Algebra | 4 | Good |
| NumPy/Pandas | 3 | Basic |
| Probability | 3 | Good |
| Statistics | 17 | Excellent |
| Information Theory | 4 | Good |
| Classification Metrics | 2 | Good |
| Regularization | 3 | Good |
| Ensemble | 2 | Good |
CRITICAL GAPS¶
1. Advanced Optimization Theory — PARTIALLY FILLED¶
Added in materials.md section 24:
- Newton's method update rule and quadratic convergence
- Why Newton's method does not work for deep learning (3 levels of problems)
- BFGS: quasi-Newton with \(O(n^2)\) memory
- L-BFGS: limited-memory version with \(O(n)\) memory
- Comparison table (Newton/BFGS/L-BFGS/SGD/Adam)
- Python implementation (scipy L-BFGS-B)
- Interview questions (5 Q&A)
Sources: Dang Truong Blog (2025), GeeksforGeeks, NumberAnalytics, EPFL OptML course
Remaining:
- Convergence proofs for SGD/Adam (advanced)
- Learning-rate theory (why \(1/\sqrt{t}\) decay)
- Separate task (ContentBlock)
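The L-BFGS-B solver mentioned above can be exercised in a few lines. A minimal sketch (assuming only numpy and scipy) on the Rosenbrock function, a standard non-convex test problem:

```python
import numpy as np
from scipy.optimize import minimize

# Rosenbrock function and its analytic gradient.
def rosenbrock(x):
    return (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2

def rosenbrock_grad(x):
    return np.array([
        -2 * (1 - x[0]) - 400 * x[0] * (x[1] - x[0]**2),
        200 * (x[1] - x[0]**2),
    ])

# L-BFGS keeps only the last m (s, y) curvature pairs,
# so memory is O(mn) rather than the O(n^2) of full BFGS.
res = minimize(rosenbrock, x0=np.array([-1.2, 1.0]),
               jac=rosenbrock_grad, method="L-BFGS-B")
print(res.x)  # converges near the global minimum [1, 1]
```

Supplying the analytic gradient via `jac` is what lets the quasi-Newton curvature estimate do its work; without it scipy falls back to finite differences.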
2. Measure Theory Basics (NONE)¶
Asked at Senior+ positions:
- Probability as a measure
- Radon-Nikodym derivative
- KL divergence as a difference of measures
Priority: LOW (research positions only)
3. VC Dimension & Learning Theory — PARTIALLY FILLED¶
Added in materials.md:
- PAC learning framework
- VC dimension definition and bounds
- Generalization bound formula
- Trade-off diagram
- Interview questions
Sources: Cole Mei blog (2025), Understanding ML book
Remaining:
- Rademacher complexity (advanced)
- Separate task (ContentBlock)
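The generalization bound mentioned above is easy to evaluate numerically. A sketch using one common form of the VC bound (constants vary between textbooks, so treat the exact numbers as illustrative):

```python
import numpy as np

def vc_generalization_bound(n, d, delta=0.05):
    """One standard (loose) VC bound: with probability 1 - delta,
    test error <= train error + sqrt((d*(ln(2n/d) + 1) + ln(4/delta)) / n)."""
    return np.sqrt((d * (np.log(2 * n / d) + 1) + np.log(4 / delta)) / n)

# The gap shrinks as the sample size n grows
# and widens as the VC dimension d grows.
for n in (1_000, 10_000, 100_000):
    print(n, round(float(vc_generalization_bound(n, d=10)), 3))
```

This makes the trade-off in the diagram above concrete: richer hypothesis classes (larger d) need more data for the same guarantee.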
4. Matrix Calculus — PARTIALLY FILLED¶
Added in materials.md section 15:
- Jacobian matrix definition and examples
- Hessian matrix and critical-point classification
- Comparison table (Gradient vs Jacobian vs Hessian)
- Trace tricks for matrix derivatives
- Python implementation (numerical_jacobian, numerical_hessian)
- Applications in neural networks
- Interview questions (Newton's method, conditioning, VJP)
Sources: DataCamp Hessian Tutorial, GeeksforGeeks
Remaining:
- Hessian-vector products (efficient computation)
- Advanced trace tricks
- Separate task (ContentBlock)
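As a flavor of the `numerical_jacobian` helper referenced above, here is one plausible central-difference sketch (the actual implementation in materials.md may differ):

```python
import numpy as np

def numerical_jacobian(f, x, eps=1e-6):
    """Central-difference Jacobian: J[i, j] = d f_i / d x_j."""
    x = np.asarray(x, dtype=float)
    f0 = np.asarray(f(x))
    J = np.zeros((f0.size, x.size))
    for j in range(x.size):
        e = np.zeros_like(x)
        e[j] = eps
        J[:, j] = (np.asarray(f(x + e)) - np.asarray(f(x - e))) / (2 * eps)
    return J

# f(x, y) = (x*y, x + y)  has Jacobian  [[y, x], [1, 1]].
J = numerical_jacobian(lambda v: np.array([v[0] * v[1], v[0] + v[1]]),
                       np.array([2.0, 3.0]))
print(J)  # approx [[3, 2], [1, 1]]
```

Central differences give \(O(\varepsilon^2)\) truncation error, which is why they are preferred over forward differences for checking hand-derived gradients.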
5. Sampling Methods — PARTIALLY FILLED¶
Added in materials.md section 13:
- Metropolis-Hastings algorithm with code
- Gibbs Sampling, HMC, Slice Sampling comparison
- Convergence diagnostics (trace plots, ACF, ESS, Gelman-Rubin)
- Acceptance-rate guidelines
- Interview questions
Sources: NumberAnalytics MCMC guide, MCMC Handbook
Remaining:
- Rejection/importance sampling details
- Langevin dynamics
- Separate task (ContentBlock)
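The Metropolis-Hastings material above boils down to a short loop. A minimal random-walk sketch (1-D, standard normal target; the materials.md version may be more general):

```python
import numpy as np

def metropolis_hastings(log_p, x0, n_samples=20_000, step=1.0, seed=0):
    """Random-walk Metropolis: Gaussian proposal,
    accept with probability min(1, p(x') / p(x)), done in log space."""
    rng = np.random.default_rng(seed)
    x, lp = x0, log_p(x0)
    samples = []
    for _ in range(n_samples):
        x_new = x + step * rng.normal()
        lp_new = log_p(x_new)
        if np.log(rng.uniform()) < lp_new - lp:
            x, lp = x_new, lp_new  # accept; otherwise keep the old state
        samples.append(x)
    return np.array(samples)

# Target: standard normal, log density up to an additive constant.
s = metropolis_hastings(lambda x: -0.5 * x**2, x0=0.0)
print(s[5000:].mean(), s[5000:].std())  # roughly 0 and 1 after burn-in
```

Note the algorithm only needs the target density up to a constant, which is exactly why MCMC works when the normalizer is intractable.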
MEDIUM GAPS¶
6. Information Geometry — PARTIALLY FILLED¶
Added in materials.md section 22:
- Fisher Information Matrix definition and properties
- Natural gradient formula and intuition
- Approximations (diagonal, K-FAC, low-rank)
- Adam as an approximate natural gradient
- Python implementation (empirical Fisher)
- Interview questions (5 Q&A)
Sources: Ji-Ha Kim Blog (2025), Satyam Mishra Substack (2025), AI Under the Hood
Remaining:
- Dually flat spaces (advanced)
- Amari's alpha-connections
- Separate task (ContentBlock)
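The empirical-Fisher idea above can be sketched on synthetic per-example gradients (a toy, not tied to any real model; the damping constant is an arbitrary choice for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for per-example gradients of a log-likelihood,
# shape (n_samples, n_params), with very different scales per parameter.
grads = rng.normal(size=(1000, 3)) @ np.diag([3.0, 1.0, 0.3])

# Empirical Fisher: average outer product of per-example gradients.
F = grads.T @ grads / len(grads)
g = grads.mean(axis=0)

# Natural gradient F^{-1} g rescales the step by local curvature;
# a small damping term keeps the solve well conditioned.
damping = 1e-3
nat_grad = np.linalg.solve(F + damping * np.eye(3), g)
print(nat_grad)
```

The point of the example: directions with large Fisher curvature get shrunk, which is the same effect Adam approximates with its diagonal second-moment estimate.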
7. Constrained Optimization — PARTIALLY FILLED¶
Added in materials.md section 18:
- General optimization problem formulation
- Lagrange multipliers (equality constraints)
- KKT conditions table (4 conditions)
- Complementary slackness intuition
- SVM dual formulation derivation
- Python implementation (scipy.optimize)
- "Convexity matters" table
- Interview questions (5 Q&A)
Sources: NumberAnalytics KKT Guide, CircuitLabs Convex Optimization (2025), Boyd book
Remaining:
- Duality theory (weak vs strong duality)
- Advanced SVM formulations (soft margin, kernel trick)
- Separate task (ContentBlock)
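A textbook Lagrange-multiplier problem makes the scipy.optimize material above concrete. A minimal sketch with an equality constraint:

```python
import numpy as np
from scipy.optimize import minimize

# Minimize f(x, y) = x^2 + y^2  subject to  x + y = 1.
# The Lagrange conditions grad f = lambda * grad g give x = y = 1/2.
res = minimize(lambda v: v[0]**2 + v[1]**2,
               x0=[0.0, 0.0],
               method="SLSQP",
               constraints=[{"type": "eq",
                             "fun": lambda v: v[0] + v[1] - 1}])
print(res.x)  # approx [0.5, 0.5]
```

At the optimum, grad f = (1, 1) is parallel to the constraint normal (1, 1), so the multiplier is lambda = 1; that parallelism is exactly the stationarity condition in the KKT table.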
8. Numerical Stability — PARTIALLY FILLED¶
Added in materials.md section 16:
- Log-sum-exp trick derivation and code
- Softmax numerical stability (stable vs unstable)
- Gradient clipping methods comparison table
- Gradient clipping PyTorch code (3 methods)
- Common numerical issues table
- FP16/BF16 mixed-precision considerations
- Interview questions (6 Q&A)
Sources: Best AI Tools Softmax Guide, GeeksforGeeks Gradient Clipping
Remaining:
- Advanced FP16/BF16 techniques (loss scaling)
- Separate task (ContentBlock)
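The log-sum-exp trick in the list above fits in a few lines; a sketch showing the naive version overflowing while the shifted version stays finite:

```python
import numpy as np

def logsumexp(x):
    """Stable log(sum(exp(x))): subtract the max before exponentiating."""
    m = np.max(x)
    return m + np.log(np.sum(np.exp(x - m)))

def stable_softmax(x):
    z = x - np.max(x)  # shifting by a constant leaves softmax unchanged
    e = np.exp(z)
    return e / e.sum()

x = np.array([1000.0, 1001.0, 1002.0])
naive = np.log(np.sum(np.exp(x)))  # exp(1000) overflows -> inf
stable = logsumexp(x)
print(naive, stable)               # inf vs ~1002.408
print(stable_softmax(x))           # valid distribution, sums to 1
```

The same shift-by-max idea is what frameworks apply inside fused softmax/cross-entropy kernels, which is why computing log-softmax "by hand" in FP16 is a classic bug.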
9. Random Projections — PARTIALLY FILLED¶
Added in materials.md section 23:
- Johnson-Lindenstrauss lemma statement
- Target dimension formula \(k \geq 8\ln n / \varepsilon^2\)
- Random projection methods (Gaussian, Rademacher, Sparse/Achlioptas)
- Python implementation (from scratch + sklearn)
- JL vs PCA comparison table
- Interview questions (5 Q&A)
Sources: AI Under the Hood, NumberAnalytics, Achlioptas (2003)
Remaining:
- Subspace embeddings (advanced)
- Feature hashing (the hashing trick)
- Separate task (ContentBlock)
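The Gaussian variant from the list above is a one-liner; a from-scratch sketch checking that a pairwise distance survives the projection:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 200, 10_000, 1_000

X = rng.normal(size=(n, d))

# Gaussian random projection, scaled by 1/sqrt(k)
# so that norms are preserved in expectation.
R = rng.normal(size=(d, k)) / np.sqrt(k)
Y = X @ R

# Distortion of one pairwise distance; the JL lemma says this ratio
# concentrates near 1 with relative error on the order of 1/sqrt(k).
orig = np.linalg.norm(X[0] - X[1])
proj = np.linalg.norm(Y[0] - Y[1])
ratio = proj / orig
print(ratio)  # close to 1
```

Note the projection never looks at the data, which is the key contrast with PCA in the comparison table above.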
10. Hypothesis Testing Advanced — PARTIALLY FILLED¶
Added in materials.md section 19:
- Multiple testing problem (probability table)
- FWER vs FDR comparison
- Bonferroni correction formula
- Benjamini-Hochberg algorithm with example
- Comparison table (Bonferroni/Holm/Dunnett/BH)
- Python implementation (from scratch + statsmodels)
- Decision-guide flowchart
- Interview questions (5 Q&A)
Sources: Statsig Blog (2025), GeeksforGeeks BH Procedure
Remaining:
- Permutation tests
- Sequential testing (alpha-spending)
- Bayesian A/B testing
- Separate task (ContentBlock)
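The Benjamini-Hochberg step-up procedure from the list above, as a from-scratch sketch on made-up p-values:

```python
import numpy as np

def benjamini_hochberg(pvals, alpha=0.05):
    """Return a boolean reject mask controlling the FDR at level alpha."""
    p = np.asarray(pvals)
    m = len(p)
    order = np.argsort(p)
    # Find the largest k with p_(k) <= (k/m) * alpha,
    # then reject hypotheses 1..k (step-up rule).
    thresh = alpha * np.arange(1, m + 1) / m
    below = p[order] <= thresh
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = int(np.max(np.nonzero(below)[0]))
        reject[order[:k + 1]] = True
    return reject

pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
print(benjamini_hochberg(pvals))  # rejects the two smallest p-values
```

The step-up detail matters: a p-value above its own threshold is still rejected if some larger p-value clears its threshold, which is what makes BH less conservative than Bonferroni.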
NEW TOPICS 2025-2026¶
11. Test-Time Compute — PARTIALLY FILLED¶
2025 trend: spending more compute at inference for better outputs.
Added in materials.md section 21:
- Test-time vs train-time compute comparison
- Methods: CoT, Best-of-N, majority voting, "wait" tokens
- Budget forcing and process reward models
- Python code examples (budget_forcing, best_of_n_with_verifier)
- Interview questions (5 Q&A)
Sources: Sebastian Raschka Blog (2025), arXiv scaling papers
Remaining:
- Compute-optimal inference theory
- Formal models of verification scaling
- Separate task (ContentBlock)
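The majority-voting method listed above (self-consistency) is the simplest of the bunch. A toy sketch on hypothetical model samples (the strings here are invented, and no real model is called):

```python
from collections import Counter

def majority_vote(answers):
    """Self-consistency: sample N answers, return the most frequent one."""
    return Counter(answers).most_common(1)[0][0]

# Hypothetical final answers from 7 samples of the same prompt.
samples = ["42", "42", "41", "42", "43", "42", "41"]
print(majority_vote(samples))  # "42"
```

Voting trades N times the inference compute for robustness to occasional bad samples; Best-of-N replaces the frequency count with a verifier score.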
12. Scaling Laws — PARTIALLY FILLED¶
Added in materials.md:
- Kaplan (2020) first scaling laws
- Chinchilla (2022) compute-optimal formula
- Beyond Chinchilla (2025): inference costs
- Densing Law, Sloth (2025)
- Interview questions and code examples
Sources: AIMultiple Feb 2026, arXiv papers
Remaining:
- Separate task (ContentBlock) with worked calculations
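A worked calculation of the kind the remaining task would need, using the common Chinchilla rules of thumb \(C \approx 6ND\) and \(D \approx 20N\) (both are approximations, not the paper's fitted loss surface):

```python
import math

def chinchilla_optimal(C):
    """Rule-of-thumb compute-optimal split of a FLOP budget C:
    C ~= 6 * N * D with D ~= 20 * N  =>  N = sqrt(C / 120), D = 20 * N."""
    N = math.sqrt(C / 120)
    return N, 20 * N

# Gopher-scale budget ~5.76e23 FLOPs -> roughly 70B params, 1.4T tokens,
# which matches the Chinchilla configuration.
N, D = chinchilla_optimal(5.76e23)
print(f"{N:.2e} params, {D:.2e} tokens")
```

The qualitative takeaway survives the approximation: for a fixed budget, parameters and tokens should both scale roughly as \(\sqrt{C}\).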
13. Spectral Methods — PARTIALLY FILLED¶
Added in materials.md section 20:
- Spectral clustering overview and use cases
- Graph construction methods (ε-neighborhood, k-NN, fully connected)
- Graph Laplacian variants table (L, L_sym, L_rw)
- Spectral clustering algorithm (5 steps)
- Python implementation with make_moons comparison
- Eigenvalue interpretation
- K-Means vs Spectral comparison table
- Interview questions (5 Q&A)
Sources: GeeksforGeeks Spectral Clustering (2025), Von Luxburg Tutorial
Remaining:
- Advanced graph signal processing
- Spectral graph neural networks
- Separate task (ContentBlock)
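The core of the 5-step algorithm above is visible on a tiny hand-built graph; a numpy-only sketch that splits two triangles joined by one weak edge using the sign of the Fiedler vector:

```python
import numpy as np

# Adjacency matrix: two triangles {0,1,2} and {3,4,5}
# connected by a single bridge edge (2, 3).
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0

# Unnormalized graph Laplacian L = D - A.
L = np.diag(A.sum(axis=1)) - A

# eigh returns eigenvalues in ascending order; the eigenvector for the
# second-smallest eigenvalue (Fiedler vector) encodes the best 2-way cut.
eigvals, eigvecs = np.linalg.eigh(L)
labels = (eigvecs[:, 1] > 0).astype(int)
print(labels)  # {0,1,2} in one cluster, {3,4,5} in the other
```

For k > 2 clusters, one stacks the first k eigenvectors as features and runs k-means on the rows, which is the full algorithm the section describes.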
Practical Gaps¶
14. Time Series Statistics — PARTIALLY FILLED¶
Added in materials.md section 14:
- Stationarity definition and tests (ADF, KPSS)
- ARIMA(p,d,q) model equations
- ACF/PACF interpretation for model selection
- Cointegration concept
- Common pitfalls
- Interview questions
Sources: BagelQuant Quant Interview FAQ 2025, Hyndman FPP3
Remaining:
- GARCH models (volatility)
- Seasonal decomposition (STL)
- Prophet, LSTM extensions
- Separate task (ContentBlock)
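The stationarity-and-differencing idea behind the "d" in ARIMA(p,d,q) can be shown without statsmodels: a random walk has lag-1 autocorrelation near 1, and one difference turns it back into white noise. A numpy-only sketch:

```python
import numpy as np

def acf(x, lag):
    """Sample autocorrelation of x at the given lag."""
    x = x - x.mean()
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))

rng = np.random.default_rng(0)
walk = np.cumsum(rng.normal(size=5000))  # random walk: unit root, needs d=1
diff = np.diff(walk)                     # first difference: white noise again

print(acf(walk, 1))  # near 1: strong persistence, non-stationary
print(acf(diff, 1))  # near 0: differencing removed the unit root
```

Formal tests like ADF automate exactly this judgment, with a proper null distribution instead of an eyeballed ACF.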
15. Dimensionality Reduction Advanced — PARTIALLY FILLED¶
Added in materials.md section 17:
- PCA vs t-SNE vs UMAP comparison table
- t-SNE algorithm explanation (perplexity, KL divergence)
- UMAP algorithm explanation (fuzzy simplicial sets)
- Python implementation (all 3 methods)
- Decision-guide flowchart
- Interview questions (5 Q&A)
- Common pitfalls
Sources: Medium PCA vs t-SNE vs UMAP (2025), Distill.pub t-SNE guide, UMAP paper
Remaining:
- Isomap, LLE (older methods)
- Autoencoders for dimensionality reduction
- Separate task (ContentBlock)
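Of the three methods compared above, only PCA fits in a few dependency-free lines; a from-scratch sketch via SVD of the centered data (t-SNE and UMAP need their dedicated libraries):

```python
import numpy as np

def pca(X, k):
    """PCA via SVD of the centered data matrix; returns the k-dim
    projection and the explained-variance ratio per component."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = S**2 / np.sum(S**2)
    return Xc @ Vt[:k].T, explained[:k]

rng = np.random.default_rng(0)
# 2-D data stretched 10x along one axis: the first PC should dominate.
X = rng.normal(size=(500, 2)) @ np.array([[3.0, 0.0], [0.0, 0.3]])
Z, ratio = pca(X, k=1)
print(ratio)  # first component explains almost all the variance
```

This also illustrates a pitfall from the table above: PCA only captures variance along linear directions, which is exactly where t-SNE/UMAP take over.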
16. Causal Inference Basics — PARTIALLY FILLED¶
Added in materials.md section 12:
- Association vs causation
- Potential outcomes framework
- ATE, ATT, ATC definitions
- Confounding & DAGs
- Methods: PSM, RDD, IV, DiD
- Key assumptions (SUTVA, ignorability, positivity)
- Interview questions (8 Q&A)
Sources: InterviewGemini 2025, Causal Inference Mixtape
Remaining:
- Advanced do-calculus
- Separate task (ContentBlock) with code
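A minimal simulation of the ATE definition above, under the one setting where it is trivially identified: a randomized experiment, where the difference in means is unbiased for the ATE (the data here is synthetic with a known true effect):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Randomized experiment: treatment by fair coin flip,
# true additive treatment effect of +2 on the outcome.
T = rng.integers(0, 2, size=n)
Y = 1.0 + 2.0 * T + rng.normal(size=n)

# Under randomization (ignorability holds by design),
# ATE = E[Y | T=1] - E[Y | T=0].
ate_hat = Y[T == 1].mean() - Y[T == 0].mean()
print(ate_hat)  # close to the true effect of 2
```

With a confounder influencing both T and Y, this same difference in means would be biased, which is what motivates PSM, IV, and the other methods listed above.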
Recommendations for filling the GAPS¶
Priority 1 (add ASAP)¶
| Gap | Difficulty | Task to create |
|---|---|---|
| Matrix Calculus | Medium | calc_006_matrix_derivatives |
| Log-sum-exp | Easy | linalg_008_logsumexp |
| Multiple Testing | Medium | stat_016_multiple_testing |
| Constrained Optimization | Hard | calc_007_lagrange |
Priority 2 (useful for Senior+)¶
| Gap | Difficulty | Task to create |
|---|---|---|
| VC Dimension | Hard | theory_001_vc_dimension |
| Sampling Methods | Hard | prob_004_mcmc |
| Scaling Laws | Medium | theory_002_scaling_laws |
Priority 3 (Nice to have)¶
| Gap | Difficulty | Task to create |
|---|---|---|
| Time Series | Medium | stat_017_time_series |
| Dimensionality Reduction | Medium | linalg_009_tsne_umap |
| Causal Inference | Medium | stat_018_causal |
Cross-References Missing¶
Links worth adding to existing tasks:
- info_002_cross_entropy → link to dl_005_loss_functions
- stat_010_bias_variance → link to ensemble_001_random_forest
- calc_004_activation_derivatives → link to nn_001_backprop
- linalg_007_pca_svd → link to stat_011_kfold_cv (dimensionality)
Final Coverage Assessment¶
ML Math current coverage: ~88% for ML Engineers, ~75% for Senior+ positions
Main remaining gaps (after iterations 5-17):
1. Measure Theory (LOW priority, research positions only)
2. Convergence proofs (advanced theory)
Recommendation: ML Math coverage is essentially complete. Add 0-1 new tasks to reach ~90% coverage.