
ML Math: Gaps


What interviewers ask that is NOT covered by the 93 tasks. Under-covered topics for the AI/ML/LLM Engineer track.

Updated: 2026-02-11


Current Coverage (41 tasks)

| Subcategory | Tasks | Coverage |
|---|---|---|
| Calculus | 3 | Good |
| Linear Algebra | 4 | Good |
| NumPy/Pandas | 3 | Basic |
| Probability | 3 | Good |
| Statistics | 17 | Excellent |
| Information Theory | 4 | Good |
| Classification Metrics | 2 | Good |
| Regularization | 3 | Good |
| Ensemble | 2 | Good |

CRITICAL GAPS

1. Advanced Optimization Theory — PARTIALLY FILLED

Added in materials.md section 24:

- Newton's method update rule and quadratic convergence
- Why Newton's method doesn't work for deep learning (3 levels of problems)
- BFGS: quasi-Newton with \(O(n^2)\) memory
- L-BFGS: limited-memory version with \(O(n)\) memory
- Comparison table (Newton/BFGS/L-BFGS/SGD/Adam)
- Python implementation (scipy L-BFGS-B)
- Interview questions (5 Q&A)

Sources: Dang Truong Blog (2025), GeeksforGeeks, NumberAnalytics, EPFL OptML course

Remaining:

- Convergence proofs for SGD/Adam (advanced)
- Learning rate theory (why the \(1/\sqrt{t}\) decay)
- A dedicated task (ContentBlock)
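The quadratic convergence mentioned above is easy to demonstrate in one dimension. A minimal sketch; the objective \(f(x) = \cosh x\) is a made-up toy example, not taken from materials.md:

```python
import math

def newton_minimize(grad, hess, x0, steps=5):
    """Newton's method in 1-D: x <- x - f'(x) / f''(x)."""
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        xs.append(x - grad(x) / hess(x))
    return xs

# Toy problem: minimize f(x) = cosh(x), whose minimum is at x = 0.
# f'(x) = sinh(x), f''(x) = cosh(x), so the update is x <- x - tanh(x).
iterates = newton_minimize(math.sinh, math.cosh, 1.0)
```

The error roughly squares every step (1 → 0.24 → 4.5e-3 → 3e-8 → ...), which is what "quadratic convergence" means; the per-step cost of forming and inverting the Hessian in n dimensions is what motivates BFGS and L-BFGS.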

2. Measure Theory Basics (NOT COVERED)

Asked at Senior+ positions:

- Probability as a measure
- The Radon-Nikodym derivative
- KL divergence as a difference between measures

Priority: LOW (research positions only)
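To make the "KL divergence = measure difference" bullet concrete: in the discrete case the Radon-Nikodym derivative dP/dQ is just the pointwise ratio of probability masses, and KL is its log-expectation under P. A minimal sketch with arbitrarily chosen distributions:

```python
import math

def kl_divergence(p, q):
    """KL(P||Q) = E_P[log dP/dQ]; for discrete measures the
    Radon-Nikodym derivative dP/dQ is the mass ratio p_i / q_i."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

kl = kl_divergence([0.5, 0.5], [0.9, 0.1])  # asymmetric: != KL(Q||P)
```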

3. VC Dimension & Learning Theory — PARTIALLY FILLED

Added in materials.md:

- PAC learning framework
- VC dimension definition and bounds
- Generalization bound formula
- Trade-off diagram
- Interview questions

Sources: Cole Mei blog (2025), Understanding ML book

Remaining:

- Rademacher complexity (advanced)
- A dedicated task (ContentBlock)
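The generalization bound can be evaluated numerically. A sketch using one common textbook form of the VC bound; the exact constants differ between proofs, so treat this particular expression as an assumption:

```python
import math

def vc_bound_gap(d_vc, m, delta=0.05):
    """Gap term of a standard VC generalization bound:
    test_error <= train_error + sqrt((d*(ln(2m/d) + 1) + ln(4/delta)) / m),
    where d is the VC dimension, m the sample size, and the bound holds
    with probability at least 1 - delta."""
    return math.sqrt((d_vc * (math.log(2 * m / d_vc) + 1)
                      + math.log(4 / delta)) / m)
```

The gap shrinks like O(sqrt(d log m / m)): more data or a lower-capacity hypothesis class tightens the bound, which is the trade-off the diagram above captures.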

4. Matrix Calculus — PARTIALLY FILLED

Added in materials.md section 15:

- Jacobian matrix definition and examples
- Hessian matrix and critical point classification
- Comparison table (gradient vs Jacobian vs Hessian)
- Trace tricks for matrix derivatives
- Python implementation (numerical_jacobian, numerical_hessian)
- Applications in neural networks
- Interview questions (Newton's method, conditioning, VJP)

Sources: DataCamp Hessian Tutorial, GeeksforGeeks

Remaining:

- Hessian-vector products (efficient computation)
- Advanced trace tricks
- A dedicated task (ContentBlock)
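The full numerical_jacobian implementation lives in materials.md; a central-difference sketch of the same idea, with a made-up test function and point:

```python
def numerical_jacobian(f, x, eps=1e-6):
    """Central-difference Jacobian J[i][j] = d f_i / d x_j
    for f: R^n -> R^m given as plain Python lists."""
    n, m = len(x), len(f(x))
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        xp, xm = list(x), list(x)
        xp[j] += eps
        xm[j] -= eps
        fp, fm = f(xp), f(xm)
        for i in range(m):
            J[i][j] = (fp[i] - fm[i]) / (2 * eps)
    return J

# f(x, y) = (x*y, x + y) has analytic Jacobian [[y, x], [1, 1]].
J = numerical_jacobian(lambda v: [v[0] * v[1], v[0] + v[1]], [2.0, 3.0])
```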

5. Sampling Methods — PARTIALLY FILLED

Added in materials.md section 13:

- Metropolis-Hastings algorithm with code
- Gibbs Sampling, HMC, Slice Sampling comparison
- Convergence diagnostics (trace plots, ACF, ESS, Gelman-Rubin)
- Acceptance rate guidelines
- Interview questions

Sources: NumberAnalytics MCMC guide, MCMC Handbook

Remaining:

- Rejection/importance sampling details
- Langevin dynamics
- A dedicated task (ContentBlock)
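A minimal random-walk Metropolis-Hastings sketch in the spirit of the material above; pure Python, with a standard normal target chosen purely for illustration:

```python
import math
import random

def metropolis_hastings(log_p, x0, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis: propose x' ~ N(x, step^2), accept with
    probability min(1, p(x') / p(x)); log_p may omit the normalizer."""
    rng = random.Random(seed)
    x, samples, accepted = x0, [], 0
    for _ in range(n_samples):
        x_new = x + rng.gauss(0, step)
        if math.log(rng.random()) < log_p(x_new) - log_p(x):
            x, accepted = x_new, accepted + 1
        samples.append(x)
    return samples, accepted / n_samples

# Target: standard normal, log p(x) = -x^2 / 2 up to a constant.
samples, acc_rate = metropolis_hastings(lambda x: -0.5 * x * x, 0.0, 20000)
```

With step sizes around 1.0 on this target, the acceptance rate lands near the commonly quoted guidelines for random-walk samplers; the sample mean and variance should match the target up to Monte Carlo error.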


MEDIUM GAPS

6. Information Geometry — PARTIALLY FILLED

Added in materials.md section 22:

- Fisher Information Matrix definition and properties
- Natural gradient formula and intuition
- Approximations (diagonal, K-FAC, low-rank)
- Adam as an approximate natural gradient
- Python implementation (empirical Fisher)
- Interview questions (5 Q&A)

Sources: Ji-Ha Kim Blog (2025), Satyam Mishra Substack (2025), AI Under the Hood

Remaining:

- Dually flat spaces (advanced)
- Amari's alpha-connections
- A dedicated task (ContentBlock)
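The natural-gradient intuition above fits in the smallest possible example: for a Bernoulli(p) model the Fisher information is \(I(p) = 1/(p(1-p))\), so preconditioning by \(F^{-1}\) shrinks steps near the boundary, where the distribution changes fast in KL terms. A hypothetical one-parameter sketch:

```python
def bernoulli_fisher(p):
    """Fisher information of Bernoulli(p): I(p) = 1 / (p * (1 - p))."""
    return 1.0 / (p * (1.0 - p))

def natural_gradient_step(p, grad, lr=0.1):
    """Natural gradient preconditions the Euclidean gradient by F^{-1}:
    p <- p - lr * I(p)^{-1} * grad."""
    return p - lr * grad / bernoulli_fisher(p)
```

Near p = 0.5, F^{-1} = 0.25 (the largest step); at p = 0.99, F^{-1} ≈ 0.0099, so the same Euclidean gradient barely moves p. The steps are equal-sized in KL divergence, not in parameter space.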

7. Constrained Optimization — PARTIALLY FILLED

Added in materials.md section 18:

- General optimization problem formulation
- Lagrange multipliers (equality constraints)
- KKT conditions table (4 conditions)
- Complementary slackness intuition
- SVM dual formulation derivation
- Python implementation (scipy.optimize)
- "Convexity matters" table
- Interview questions (5 Q&A)

Sources: NumberAnalytics KKT Guide, CircuitLabs Convex Optimization (2025), Boyd's book

Remaining:

- Duality theory (weak vs strong duality)
- Advanced SVM formulations (soft margin, kernel trick)
- A dedicated task (ContentBlock)
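A worked toy example of the Lagrange-multiplier conditions above (the objective is chosen purely for illustration): minimize \(x^2 + y^2\) subject to \(x + y = 1\). Setting \(\nabla f = \lambda \nabla g\) gives \(2x = \lambda\) and \(2y = \lambda\), so \(x = y = 1/2\) and \(\lambda = 1\):

```python
def lagrange_toy():
    """Minimize f(x, y) = x^2 + y^2 s.t. g(x, y) = x + y - 1 = 0.
    Stationarity (2x, 2y) = lam * (1, 1) forces x = y;
    feasibility x + y = 1 then gives x = y = 1/2."""
    x = y = 0.5
    lam = 2 * x
    # Verify the two KKT conditions that apply to equality constraints:
    assert abs(2 * x - lam) < 1e-12 and abs(2 * y - lam) < 1e-12  # stationarity
    assert abs(x + y - 1) < 1e-12                                 # primal feasibility
    return x, y, lam
```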

8. Numerical Stability — PARTIALLY FILLED

Added in materials.md section 16:

- Log-sum-exp trick derivation and code
- Softmax numerical stability (stable vs unstable)
- Gradient clipping methods comparison table
- Gradient clipping PyTorch code (3 methods)
- Common numerical issues table
- FP16/BF16 mixed-precision considerations
- Interview questions (6 Q&A)

Sources: Best AI Tools Softmax Guide, GeeksforGeeks Gradient Clipping

Remaining:

- Advanced FP16/BF16 techniques (loss scaling)
- A dedicated task (ContentBlock)
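The log-sum-exp trick listed above is small enough to sketch in full: shift by the maximum so the largest exponent is exactly 0, which cannot overflow, then add the shift back:

```python
import math

def logsumexp(xs):
    """Stable log(sum(exp(x_i))): factor out max(xs) so every
    exponent is <= 0 and exp() never overflows."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def stable_softmax(xs):
    """Softmax with the same max-shift; the naive version overflows
    float64 for inputs around 1000."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]
```

For example, `logsumexp([1000, 1000])` returns 1000 + log 2 exactly where the naive formula would overflow.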

9. Random Projections — PARTIALLY FILLED

Added in materials.md section 23:

- Johnson-Lindenstrauss lemma statement
- Target dimension formula \(k \geq 8\ln n / \varepsilon^2\)
- Random projection methods (Gaussian, Rademacher, sparse/Achlioptas)
- Python implementation (from scratch + sklearn)
- JL vs PCA comparison table
- Interview questions (5 Q&A)

Sources: AI Under the Hood, NumberAnalytics, Achlioptas (2003)

Remaining:

- Subspace embeddings (advanced)
- Feature hashing (the hashing trick)
- A dedicated task (ContentBlock)
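The target-dimension formula above and the Gaussian projection method can both be sketched in a few lines; note the JL constant varies between statements of the lemma, so the 8 here just mirrors the formula quoted above:

```python
import math
import random

def jl_target_dim(n, eps):
    """JL target dimension k >= 8 ln(n) / eps^2 for n points with
    pairwise-distance distortion at most (1 +/- eps)."""
    return math.ceil(8 * math.log(n) / eps ** 2)

def random_projection(x, k, seed=0):
    """Project x in R^d to R^k with i.i.d. N(0, 1) entries scaled by
    1/sqrt(k); this preserves squared norms in expectation."""
    rng = random.Random(seed)
    return [sum(rng.gauss(0, 1) * xi for xi in x) / math.sqrt(k)
            for _ in range(k)]
```

Note that the target dimension depends on the number of points n and the distortion eps, but not on the original dimension d, which is the surprising part of the lemma.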

10. Advanced Hypothesis Testing — PARTIALLY FILLED

Added in materials.md section 19:

- Multiple testing problem (probability table)
- FWER vs FDR comparison
- Bonferroni correction formula
- Benjamini-Hochberg algorithm with an example
- Comparison table (Bonferroni/Holm/Dunnett/BH)
- Python implementation (from scratch + statsmodels)
- Decision guide flowchart
- Interview questions (5 Q&A)

Sources: Statsig Blog (2025), GeeksforGeeks BH Procedure

Remaining:

- Permutation tests
- Sequential testing (alpha-spending)
- Bayesian A/B testing
- A dedicated task (ContentBlock)
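A from-scratch sketch of the Benjamini-Hochberg step-up procedure mentioned above; the example p-values are made up:

```python
def benjamini_hochberg(pvals, alpha=0.05):
    """BH step-up: sort p-values, find the largest rank i with
    p_(i) <= (i / m) * alpha, and reject hypotheses ranked 1..i.
    Returns a reject flag per input p-value, in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= rank / m * alpha:
            k = rank
    reject = [False] * m
    for idx in order[:k]:
        reject[idx] = True
    return reject
```

For `[0.01, 0.02, 0.03, 0.5]` at alpha = 0.05, BH rejects the first three hypotheses, while Bonferroni (threshold 0.05/4 = 0.0125) would reject only the first: the FWER vs FDR trade-off from the comparison above.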


NEW TOPICS FOR 2025-2026

11. Test-Time Compute — PARTIALLY FILLED

2025 trend: spending more compute at inference for better outputs.

Added in materials.md section 21:

- Test-time vs train-time compute comparison
- Methods: CoT, Best-of-N, majority voting, "wait" tokens
- Budget forcing and process reward models
- Python code examples (budget_forcing, best_of_n_with_verifier)
- Interview questions (5 Q&A)

Sources: Sebastian Raschka Blog (2025), arXiv scaling papers

Remaining:

- Compute-optimal inference theory
- Formal models of verification scaling
- A dedicated task (ContentBlock)
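The best_of_n_with_verifier example lives in materials.md; the core selection logic is tiny, sketched here with hypothetical stand-in generator and scorer functions:

```python
def best_of_n(generate, score, n):
    """Best-of-N test-time compute: draw n candidate answers and return
    the one the verifier / process reward model scores highest. More
    inference compute (larger n) buys a better expected answer with no
    retraining."""
    candidates = [generate(i) for i in range(n)]
    return max(candidates, key=score)

# Stand-ins for illustration: candidates are the integers 0..5 and the
# "verifier" prefers values near 3.
best = best_of_n(generate=lambda i: i, score=lambda x: -(x - 3) ** 2, n=6)
```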

12. Scaling Laws — PARTIALLY FILLED

Added in materials.md:

- Kaplan et al. (2020) first scaling laws
- Chinchilla (2022) compute-optimal formula
- Beyond Chinchilla (2025): inference costs
- Densing Law, Sloth (2025)
- Interview questions and code examples

Sources: AIMultiple (Feb 2026), arXiv papers

Remaining:

- A dedicated task (ContentBlock) with worked calculations
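The Chinchilla compute-optimal rule reduces to a two-line calculator. This sketch assumes the standard C ≈ 6·N·D FLOPs approximation and the roughly-20-tokens-per-parameter rule of thumb from the scaling-laws literature referenced above:

```python
def chinchilla_optimal(compute_flops, tokens_per_param=20.0):
    """Solve C = 6 * N * D together with the Chinchilla rule of thumb
    D = tokens_per_param * N for the compute-optimal parameter count N
    and training-token count D."""
    n_params = (compute_flops / (6.0 * tokens_per_param)) ** 0.5
    return n_params, tokens_per_param * n_params

# Chinchilla itself: ~5.9e23 FLOPs -> ~70B parameters, ~1.4T tokens.
n, d = chinchilla_optimal(5.88e23)
```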

13. Spectral Methods — PARTIALLY FILLED

Added in materials.md section 20:

- Spectral clustering overview and use cases
- Graph construction methods (ε-neighborhood, k-NN, fully connected)
- Graph Laplacian variants table (L, L_sym, L_rw)
- Spectral clustering algorithm (5 steps)
- Python implementation with a make_moons comparison
- Eigenvalue interpretation
- K-Means vs spectral clustering comparison table
- Interview questions (5 Q&A)

Sources: GeeksforGeeks Spectral Clustering (2025), Von Luxburg's tutorial

Remaining:

- Advanced graph signal processing
- Spectral graph neural networks
- A dedicated task (ContentBlock)
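Of the Laplacian variants listed, the unnormalized L = D − W is the easiest to sketch, together with the identity that makes spectral clustering work: the quadratic form f·Lf sums (f_i − f_j)²·w_ij over edges, so vectors that are smooth on the graph have small eigenvalues. The 3-node path graph below is a made-up example:

```python
def unnormalized_laplacian(W):
    """L = D - W for a symmetric weight matrix W (list of lists),
    where D is the diagonal degree matrix with D[i][i] = sum(W[i])."""
    n = len(W)
    return [[(sum(W[i]) if i == j else 0.0) - W[i][j] for j in range(n)]
            for i in range(n)]

# Path graph 0 - 1 - 2 with unit edge weights.
W = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
L = unnormalized_laplacian(W)
```

Every row of L sums to zero (the constant vector is an eigenvector with eigenvalue 0), and the number of zero eigenvalues equals the number of connected components, which is what the first k eigenvectors in the 5-step algorithm exploit.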


Practical Gaps

14. Time Series Statistics — PARTIALLY FILLED

Added in materials.md section 14:

- Stationarity definition and tests (ADF, KPSS)
- ARIMA(p,d,q) model equations
- ACF/PACF interpretation for model selection
- Cointegration concept
- Common pitfalls
- Interview questions

Sources: BagelQuant Quant Interview FAQ (2025), Hyndman's FPP3

Remaining:

- GARCH models (volatility)
- Seasonal decomposition (STL)
- Prophet and LSTM extensions
- A dedicated task (ContentBlock)
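A tiny pure-Python illustration of why differencing (the d in ARIMA) matters, standing in for a full ADF test: a random walk is non-stationary with lag-1 autocorrelation near 1, while its first difference is white noise with autocorrelation near 0. The data is simulated with a fixed seed:

```python
import random

def lag1_autocorr(xs):
    """Sample lag-1 autocorrelation of a series."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs)
    cov = sum((xs[t] - mean) * (xs[t + 1] - mean) for t in range(n - 1))
    return cov / var

def difference(xs):
    """First difference: the 'd' step in ARIMA(p, d, q)."""
    return [b - a for a, b in zip(xs, xs[1:])]

rng = random.Random(0)
walk = [0.0]
for _ in range(2000):
    walk.append(walk[-1] + rng.gauss(0, 1))
```

This is exactly the spurious-regression pitfall in the list above: a persistent series looks highly "predictable" at lag 1 until you difference it.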

15. Advanced Dimensionality Reduction — PARTIALLY FILLED

Added in materials.md section 17:

- PCA vs t-SNE vs UMAP comparison table
- t-SNE algorithm explanation (perplexity, KL divergence)
- UMAP algorithm explanation (fuzzy simplicial sets)
- Python implementation (all 3 methods)
- Decision guide flowchart
- Interview questions (5 Q&A)
- Common pitfalls

Sources: Medium "PCA vs t-SNE vs UMAP" (2025), Distill.pub t-SNE guide, UMAP paper

Remaining:

- Isomap, LLE (older methods)
- Autoencoders for dimensionality reduction
- A dedicated task (ContentBlock)

16. Causal Inference Basics — PARTIALLY FILLED

Added in materials.md section 12:

- Association vs causation
- Potential outcomes framework
- ATE, ATT, ATC definitions
- Confounding & DAGs
- Methods: PSM, RDD, IV, DiD
- Key assumptions (SUTVA, ignorability, positivity)
- Interview questions (8 Q&A)

Sources: InterviewGemini (2025), Causal Inference: The Mixtape

Remaining:

- Advanced do-calculus
- A dedicated task (ContentBlock) with code
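The ATE definition above, plus the confounding pitfall, in a few lines of hypothetical simulated data where both potential outcomes are known:

```python
def average_treatment_effect(y1, y0):
    """ATE = E[Y(1) - Y(0)] over the same units' potential outcomes;
    observable only in simulation or under randomization."""
    return sum(a - b for a, b in zip(y1, y0)) / len(y1)

def naive_difference(y_obs, treated):
    """Observed treated-minus-control mean difference; equals the ATE
    only under ignorability (e.g. a randomized trial), otherwise
    confounding biases it."""
    yt = [y for y, t in zip(y_obs, treated) if t]
    yc = [y for y, t in zip(y_obs, treated) if not t]
    return sum(yt) / len(yt) - sum(yc) / len(yc)

# Four units with baselines 0, 0, 10, 10; the true effect is +2 for all.
y0 = [0.0, 0.0, 10.0, 10.0]
y1 = [y + 2.0 for y in y0]
# Confounded assignment: only the high-baseline units get treated.
treated = [False, False, True, True]
y_obs = [y1[i] if treated[i] else y0[i] for i in range(4)]
```

The naive observed difference here is 12 while the true ATE is 2: the baseline confounds treatment, which is what the ignorability assumption and methods like PSM/IV/DiD are there to address.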


Recommendations for Filling the Gaps

Priority 1 (add ASAP)

| Gap | Difficulty | Task to create |
|---|---|---|
| Matrix Calculus | Medium | calc_006_matrix_derivatives |
| Log-sum-exp | Easy | linalg_008_logsumexp |
| Multiple Testing | Medium | stat_016_multiple_testing |
| Constrained Optimization | Hard | calc_007_lagrange |

Priority 2 (useful for Senior+)

| Gap | Difficulty | Task to create |
|---|---|---|
| VC Dimension | Hard | theory_001_vc_dimension |
| Sampling Methods | Hard | prob_004_mcmc |
| Scaling Laws | Medium | theory_002_scaling_laws |

Priority 3 (nice to have)

| Gap | Difficulty | Task to create |
|---|---|---|
| Time Series | Medium | stat_017_time_series |
| Dimensionality Reduction | Medium | linalg_009_tsne_umap |
| Causal Inference | Medium | stat_018_causal |

Cross-References Missing

Links worth adding between existing tasks:

  1. info_002_cross_entropy → link to dl_005_loss_functions
  2. stat_010_bias_variance → link to ensemble_001_random_forest
  3. calc_004_activation_derivatives → link to nn_001_backprop
  4. linalg_007_pca_svd → link to stat_011_kfold_cv (dimensionality)

Final Coverage Assessment

Current ML Math coverage: ~88% for ML Engineer roles, ~75% for Senior+ positions.

Main remaining gaps (after iterations 5-17):

1. Measure Theory (LOW priority, research positions only)
2. Convergence proofs (advanced theory)

Recommendation: ML Math coverage is essentially complete. Add 0-1 new tasks to reach ~90% coverage.