ML Cross-Topic Map (Layer 5)¶
~3 min read
Prerequisites, connections, and leads-to relationships. A mental map of all 93 tasks. Updated: 2026-02-11
Dependency Graph Overview¶
ML MATH (Foundation)
│
┌────────────────────┼────────────────────┐
│ │ │
Calculus Linear Alg Statistics
│ │ │
└────────────────────┼────────────────────┘
│
▼
CLASSICAL ML
│
┌────────────────────┼────────────────────┐
│ │ │
Supervised Unsupervised Ensembles
│ │ │
└────────────────────┼────────────────────┘
│
▼
DEEP LEARNING
│
┌────────────────────┼────────────────────┐
│ │ │
Foundations CNN/CV NLP/RNN
│ │ │
└────────────────────┼────────────────────┘
│
▼
LLM ENGINEERING
│
┌────────────────────┼────────────────────┐
│ │ │
Tokenization RAG Fine-tuning
│ │ │
└────────────────────┼────────────────────┘
│
▼
ML SYSTEM DESIGN
│
┌────────────────────┼────────────────────┐
│ │ │
Serving Monitoring RecSys
│ │ │
└────────────────────┼────────────────────┘
│
▼
AI AGENTS
Topic Connections Matrix¶
Calculus Dependencies¶
| Topic | Requires Calculus | Why |
|---|---|---|
| Backpropagation | YES | Chain rule, gradients |
| Optimizers | YES | Gradient descent variants |
| Loss Functions | YES | Derivatives for optimization |
| Regularization | PARTIAL | L1/L2 gradient understanding |
Linear Algebra Dependencies¶
| Topic | Requires LinAlg | Why |
|---|---|---|
| PCA/SVD | YES | Eigendecomposition |
| Attention | YES | Matrix operations, Q/K/V |
| Softmax | YES | Vector normalization |
| Batch Normalization | PARTIAL | Mean/variance computation |
Statistics Dependencies¶
| Topic | Requires Stats | Why |
|---|---|---|
| A/B Testing | YES | Hypothesis testing, p-values |
| Drift Detection | YES | PSI, KS-test |
| Classification Metrics | YES | Probability, distributions |
| Calibration | YES | Probability calibration |
Prerequisites Map¶
ML Math → Classical ML¶
Linear Regression:
← Derivatives (calculus)
← Matrix operations (linalg)
← Mean/variance (stats)
Logistic Regression:
← Sigmoid derivative (calculus)
← Cross-entropy (info theory)
← Probability (stats)
Decision Trees:
← Entropy/Gini (info theory)
← Information Gain
K-Means:
← Euclidean distance (linalg)
← Mean computation (stats)
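A minimal NumPy sketch (synthetic data) of how these prerequisites meet in Linear Regression: setting the MSE derivative to zero (calculus) yields the normal equation, which is solved with matrix operations (linear algebra):

```python
import numpy as np

# Synthetic data: y = 3x + 2 + noise
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 1))
y = 3 * X[:, 0] + 2 + rng.normal(scale=0.1, size=100)

# Append a bias column, then solve the normal equation
# w = (X^T X)^{-1} X^T y, obtained by setting the MSE gradient to zero.
Xb = np.hstack([X, np.ones((100, 1))])
w = np.linalg.solve(Xb.T @ Xb, Xb.T @ y)
print(w)  # ≈ [3.0, 2.0]
```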
Classical ML → Deep Learning¶
Neural Networks:
← Logistic Regression (as single neuron)
← Gradient descent
← Regularization concepts
Backpropagation:
← Chain rule (calculus)
← Matrix multiplication (linalg)
Optimizers:
← Gradient descent (classical ML)
← Momentum concepts (physics/calculus)
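A minimal sketch (plain NumPy, synthetic data) of the "Logistic Regression as a single neuron" link: one chain-rule gradient, one gradient-descent update — the same loop that backpropagation generalizes to deeper networks:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)  # linearly separable labels

w = np.zeros(2)
for _ in range(500):
    p = sigmoid(X @ w)
    # Chain rule for sigmoid + cross-entropy collapses to (p - y) * x:
    grad = X.T @ (p - y) / len(y)
    w -= 0.5 * grad  # plain gradient descent step

print(((sigmoid(X @ w) > 0.5) == y).mean())  # training accuracy ≈ 1.0
```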
Deep Learning → LLM Engineering¶
Tokenization:
← Text Processing (vocabulary construction)
← Embeddings (DL)
Attention:
← Softmax (DL)
← Matrix multiplication (linalg)
← Backpropagation (DL)
LoRA:
← Matrix factorization (linalg)
← Fine-tuning concepts (DL)
RAG:
← Embeddings (DL)
← Similarity metrics (linalg)
← Dense retrieval
LLM Engineering → AI Agents¶
ReAct Pattern:
← Prompt Engineering (LLM)
← Tool use concepts
← Chain-of-thought (LLM)
Multi-Agent:
← ReAct foundation
← Orchestration patterns
← Communication protocols
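A schematic sketch of the ReAct loop these dependencies feed into. Everything here is a placeholder: call_llm, the tools dict, and the "Action: tool[arg]" format are assumptions for illustration, not a real framework API.

```python
def parse_action(step):
    # Assumed format: "Action: tool_name[argument]"
    body = step[len("Action: "):]
    name, arg = body.split("[", 1)
    return name, arg.rstrip("]")

def react_loop(question, call_llm, tools, max_steps=5):
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = call_llm(transcript)            # model emits Thought + Action
        transcript += step + "\n"
        if step.startswith("Final Answer:"):
            return step                        # model decided it is done
        if step.startswith("Action:"):
            name, arg = parse_action(step)
            observation = tools[name](arg)     # execute the chosen tool
            transcript += f"Observation: {observation}\n"
    return "Final Answer: (step limit reached)"
```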
Cross-Topic Questions & Answers¶
Q: "How is Batch Normalization related to Gradient Descent?"¶
A: BatchNorm normalizes activations, which:
1. Smooths the loss landscape → larger learning rates possible
2. Reduces internal covariate shift → faster convergence
3. Adds noise (from batch statistics) → regularization effect
Connection: Both improve training stability.
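A minimal NumPy sketch of the training-time BatchNorm forward pass (inference-time running statistics are omitted):

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    # Normalize each feature across the batch, then rescale and shift.
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta

x = np.random.default_rng(0).normal(loc=5.0, scale=3.0, size=(32, 4))
out = batch_norm(x, gamma=np.ones(4), beta=np.zeros(4))
print(out.mean(axis=0).round(3), out.std(axis=0).round(3))  # ≈ 0 and ≈ 1
```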
Q: "How is Attention related to Softmax?"¶
A: Attention computes similarity scores between Q and K, then uses Softmax to:
1. Convert scores to probabilities (sum to 1)
2. Determine how much each value contributes
3. Enable differentiable attention weights
Connection: Softmax is the normalization layer in attention.
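A minimal NumPy sketch of scaled dot-product attention, with softmax as the normalization step:

```python
import numpy as np

def attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # query-key similarity
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax: rows sum to 1
    return weights @ V                              # weighted mix of values

rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, 4, 8))  # 4 tokens, dimension 8
print(attention(Q, K, V).shape)       # (4, 8): one output per query token
```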
Q: "How is LoRA related to Matrix Factorization?"¶
A: LoRA decomposes the weight update W' = W + BA where:
- B: d × r matrix
- A: r × d matrix
- r << d (low rank)
Connection: Same concept as SVD/PCA — representing large matrix with low-rank factors.
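A minimal NumPy sketch of the parameter saving: instead of a full d × d update, only B and A are trained (B is typically initialized to zero so the update starts as a no-op):

```python
import numpy as np

d, r = 512, 8                    # hidden dimension and low rank, r << d
rng = np.random.default_rng(0)
W = rng.normal(size=(d, d))      # frozen pretrained weight
B = np.zeros((d, r))             # zero init: update starts as a no-op
A = rng.normal(scale=0.01, size=(r, d))

W_eff = W + B @ A                # effective weight W' = W + BA
print(W.size, B.size + A.size)   # 262144 frozen vs 8192 trainable params
```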
Q: "How is A/B Testing related to Hypothesis Testing?"¶
A: A/B testing is applied hypothesis testing:
1. H0: no difference between A and B
2. Collect data from both variants
3. Compute a p-value using statistical tests
4. Reject H0 if p < alpha (usually 0.05)
Connection: A/B testing = real-world application of statistical hypothesis testing.
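A minimal sketch of the same recipe as a two-proportion z-test; the conversion counts are made up for illustration:

```python
import numpy as np
from scipy.stats import norm

conv_a, n_a = 200, 10_000        # control: 2.0% conversion
conv_b, n_b = 250, 10_000        # variant: 2.5% conversion

p_a, p_b = conv_a / n_a, conv_b / n_b
p_pool = (conv_a + conv_b) / (n_a + n_b)           # pooled rate under H0
se = np.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
z = (p_b - p_a) / se
p_value = 2 * (1 - norm.cdf(abs(z)))               # two-sided test
print(f"z={z:.2f}, p={p_value:.4f}")               # reject H0 if p < 0.05
```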
Q: "How is Drift Detection related to KS-test?"¶
A: The KS-test compares two distributions:
1. Compute the empirical CDFs of the baseline and production data
2. KS statistic = maximum difference between the CDFs
3. Large KS statistic (small p-value) → significant distribution shift
Connection: KS-test is one method for detecting data drift.
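A minimal sketch using scipy's two-sample KS test on synthetic data where the production distribution has shifted:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
baseline = rng.normal(loc=0.0, size=5_000)      # feature at training time
production = rng.normal(loc=0.3, size=5_000)    # shifted in production

stat, p_value = ks_2samp(baseline, production)  # max CDF gap + its p-value
print(f"KS={stat:.3f}, p={p_value:.2e}")        # tiny p-value → drift
```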
Q: "How is RAG related to Information Retrieval?"¶
A: RAG = neural information retrieval + generation:
1. Traditional IR: TF-IDF, BM25 (keyword-based)
2. Neural IR: dense embeddings, semantic similarity
3. RAG: retrieve relevant docs → LLM generates the answer
Connection: RAG extends IR with neural retrieval and generation.
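A toy sketch of the retrieval step: cosine similarity over precomputed embeddings (random stand-ins here for a real encoder's output):

```python
import numpy as np

rng = np.random.default_rng(0)
docs = ["doc about LoRA", "doc about KS-test", "doc about attention"]
doc_emb = rng.normal(size=(3, 64))
doc_emb /= np.linalg.norm(doc_emb, axis=1, keepdims=True)

query_emb = doc_emb[2] + 0.1 * rng.normal(size=64)  # near the attention doc
query_emb /= np.linalg.norm(query_emb)

scores = doc_emb @ query_emb                 # cosine similarity per doc
top = np.argsort(scores)[::-1][:2]           # retrieve top-2 documents
print([docs[i] for i in top])                # context passed to the LLM
```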
Q: "How is Calibration related to Probability?"¶
A: Calibration ensures predicted probabilities match empirical frequencies:
- If the model predicts 70% confidence → it should be correct 70% of the time
- Platt scaling and isotonic regression adjust raw scores
Connection: Calibration makes probability outputs "honest."
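A minimal sketch using sklearn's calibration_curve on a synthetic, deliberately miscalibrated model:

```python
import numpy as np
from sklearn.calibration import calibration_curve

rng = np.random.default_rng(0)
p_true = rng.uniform(size=10_000)                  # true event probability
y = (rng.uniform(size=10_000) < p_true).astype(int)
p_pred = np.clip(1.4 * p_true - 0.2, 0.01, 0.99)   # distorted predictions

frac_pos, mean_pred = calibration_curve(y, p_pred, n_bins=10)
for mp, fp in zip(mean_pred, frac_pos):
    print(f"predicted {mp:.2f} → empirical {fp:.2f}")
# A calibrated model shows predicted ≈ empirical in every bin.
```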
Concept Clusters¶
Cluster 1: Gradient-Based Learning¶
┌─────────────────────────────────────────┐
│ GRADIENT ECOSYSTEM │
│ │
│ Calculus ──────► Gradient Descent │
│ │ │ │
│ │ ▼ │
│ │ Backpropagation │
│ │ │ │
│ │ ┌────────┴────────┐ │
│ │ ▼ ▼ │
│ │ Optimizers Loss Funcs │
│ │ (Adam, SGD) (CE, MSE) │
│ │ │
│ └──► Chain Rule (connects all) │
└─────────────────────────────────────────┘
Cluster 2: Probability & Uncertainty¶
┌─────────────────────────────────────────┐
│ PROBABILITY CLUSTER │
│ │
│ Softmax ──────► Classification │
│ │ │ │
│ │ ▼ │
│ │ Calibration │
│ │ │ │
│ │ ┌─────┴─────┐ │
│ │ ▼ ▼ │
│ │ Platt Isotonic │
│ │ Scaling Regression │
│ │ │
│ └──► Brier Score (evaluation) │
└─────────────────────────────────────────┘
Cluster 3: Attention & Transformers¶
┌─────────────────────────────────────────┐
│ ATTENTION ECOSYSTEM │
│ │
│ Q/K/V ────────► Self-Attention │
│ │ │ │
│ │ ▼ │
│ │ Multi-Head │
│ │ │ │
│ │ ┌─────┴─────┐ │
│ │ ▼ ▼ │
│ │ Positional Cross-Att │
│ │ Encoding (Decoder) │
│ │ │
│ └──► RoPE (Rotary Position) │
└─────────────────────────────────────────┘
Cluster 4: Production ML¶
┌─────────────────────────────────────────┐
│ PRODUCTION CLUSTER │
│ │
│ Serving ───────► Latency Opt │
│ │ │ │
│ │ ▼ │
│ │ Monitoring │
│ │ │ │
│ │ ┌─────┴─────┐ │
│ │ ▼ ▼ │
│ │ Drift A/B Testing │
│ │ Detection │
│ │ │
│ └──► Feature Stores (connects all) │
└─────────────────────────────────────────┘
Learning Path Recommendations¶
Path 1: ML Engineer Foundation¶
1. Linear Algebra + Calculus
↓
2. Statistics + Probability
↓
3. Classical ML (Linear/Logistic Reg, Trees)
↓
4. Deep Learning Basics
↓
5. ML System Design
↓
6. MLOps + Production
Path 2: LLM Engineer¶
1. Deep Learning (Backprop, Attention)
↓
2. NLP Basics (Tokenization, Embeddings)
↓
3. LLM Architecture (Transformers)
↓
4. RAG + Fine-tuning (LoRA)
↓
5. LLM Production (Serving, Guardrails)
↓
6. AI Agents (ReAct, Multi-Agent)
Path 3: ML System Design¶
1. Statistics (A/B Testing)
↓
2. ML Fundamentals
↓
3. Model Serving + Optimization
↓
4. Monitoring + Drift
↓
5. RecSys / Ranking
↓
6. LLM Production Systems
Topic Difficulty Rating¶
| Topic | Math | Coding | System Design |
|---|---|---|---|
| Linear Regression | ★☆☆ | ★★☆ | ★☆☆ |
| Neural Networks | ★★★ | ★★★ | ★★☆ |
| Attention | ★★★ | ★★★ | ★★☆ |
| RAG | ★★☆ | ★★★ | ★★★ |
| LoRA | ★★★ | ★★☆ | ★★☆ |
| A/B Testing | ★★★ | ★★☆ | ★★☆ |
| Drift Detection | ★★☆ | ★★☆ | ★★★ |
| Model Serving | ★☆☆ | ★★★ | ★★★ |
| AI Agents | ★★☆ | ★★★ | ★★★ |
Use this map to understand topic dependencies and plan learning paths.