ML Cross-Topic Map (Layer 5)¶
~3 min read
Prerequisites, connections, and leads-to relationships. A mental map of all 93 tasks. Updated: 2026-02-11
Dependency Graph Overview¶
ML MATH (Foundation)
│
┌────────────────────┼────────────────────┐
│ │ │
Calculus Linear Alg Statistics
│ │ │
└────────────────────┼────────────────────┘
│
▼
CLASSICAL ML
│
┌────────────────────┼────────────────────┐
│ │ │
Supervised Unsupervised Ensembles
│ │ │
└────────────────────┼────────────────────┘
│
▼
DEEP LEARNING
│
┌────────────────────┼────────────────────┐
│ │ │
Foundations CNN/CV NLP/RNN
│ │ │
└────────────────────┼────────────────────┘
│
▼
LLM ENGINEERING
│
┌────────────────────┼────────────────────┐
│ │ │
Tokenization RAG Fine-tuning
│ │ │
└────────────────────┼────────────────────┘
│
▼
ML SYSTEM DESIGN
│
┌────────────────────┼────────────────────┐
│ │ │
Serving Monitoring RecSys
│ │ │
└────────────────────┼────────────────────┘
│
▼
AI AGENTS
Topic Connections Matrix¶
Calculus Dependencies¶
| Topic | Requires Calculus | Why |
|---|---|---|
| Backpropagation | YES | Chain rule, gradients |
| Optimizers | YES | Gradient descent variants |
| Loss Functions | YES | Derivatives for optimization |
| Regularization | PARTIAL | L1/L2 gradient understanding |
Linear Algebra Dependencies¶
| Topic | Requires LinAlg | Why |
|---|---|---|
| PCA/SVD | YES | Eigendecomposition |
| Attention | YES | Matrix operations, Q/K/V |
| Softmax | YES | Vector normalization |
| Batch Normalization | PARTIAL | Mean/variance computation |
Statistics Dependencies¶
| Topic | Requires Stats | Why |
|---|---|---|
| A/B Testing | YES | Hypothesis testing, p-values |
| Drift Detection | YES | PSI, KS-test |
| Classification Metrics | YES | Probability, distributions |
| Calibration | YES | Probability calibration |
Prerequisites Map¶
ML Math → Classical ML¶
Linear Regression:
← Derivatives (calculus)
← Matrix operations (linalg)
← Mean/variance (stats)
Logistic Regression:
← Sigmoid derivative (calculus)
← Cross-entropy (info theory)
← Probability (stats)
Decision Trees:
← Entropy/Gini (info theory)
← Information Gain
K-Means:
← Euclidean distance (linalg)
← Mean computation (stats)
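A minimal NumPy sketch (synthetic data) of how these prerequisites meet in Linear Regression: setting the MSE derivative to zero (calculus) yields the normal equation, which is solved with matrix operations (linear algebra):

```python
import numpy as np

# Synthetic data: y = 3x + 2 + noise
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 1))
y = 3 * X[:, 0] + 2 + rng.normal(scale=0.1, size=100)

# Append a bias column, then solve the normal equation
# w = (X^T X)^{-1} X^T y, obtained by setting the MSE gradient to zero.
Xb = np.hstack([X, np.ones((100, 1))])
w = np.linalg.solve(Xb.T @ Xb, Xb.T @ y)
print(w)  # ≈ [3.0, 2.0]
```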
Classical ML → Deep Learning¶
Neural Networks:
← Logistic Regression (as single neuron)
← Gradient descent
← Regularization concepts
Backpropagation:
← Chain rule (calculus)
← Matrix multiplication (linalg)
Optimizers:
← Gradient descent (classical ML)
← Momentum concepts (physics/calculus)
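A minimal sketch (plain NumPy, synthetic data) of the "Logistic Regression as a single neuron" link: one chain-rule gradient, one gradient-descent update — the same loop that backpropagation generalizes to deeper networks:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)  # linearly separable labels

w = np.zeros(2)
for _ in range(500):
    p = sigmoid(X @ w)
    # Chain rule for sigmoid + cross-entropy collapses to (p - y) * x:
    grad = X.T @ (p - y) / len(y)
    w -= 0.5 * grad  # plain gradient descent step

print(((sigmoid(X @ w) > 0.5) == y).mean())  # training accuracy ≈ 1.0
```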
Deep Learning → LLM Engineering¶
Tokenization:
← Text Processing (vocabulary construction)
← Embeddings (DL)
Attention:
← Softmax (DL)
← Matrix multiplication (linalg)
← Backpropagation (DL)
LoRA:
← Matrix factorization (linalg)
← Fine-tuning concepts (DL)
RAG:
← Embeddings (DL)
← Similarity metrics (linalg)
← Dense retrieval
LLM Engineering → AI Agents¶
ReAct Pattern:
← Prompt Engineering (LLM)
← Tool use concepts
← Chain-of-thought (LLM)
Multi-Agent:
← ReAct foundation
← Orchestration patterns
← Communication protocols
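A schematic sketch of the ReAct loop these dependencies feed into. Everything here is a placeholder: call_llm, the tools dict, and the "Action: tool[arg]" format are assumptions for illustration, not a real framework API.

```python
def parse_action(step):
    # Assumed format: "Action: tool_name[argument]"
    body = step[len("Action: "):]
    name, arg = body.split("[", 1)
    return name, arg.rstrip("]")

def react_loop(question, call_llm, tools, max_steps=5):
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = call_llm(transcript)            # model emits Thought + Action
        transcript += step + "\n"
        if step.startswith("Final Answer:"):
            return step                        # model decided it is done
        if step.startswith("Action:"):
            name, arg = parse_action(step)
            observation = tools[name](arg)     # execute the chosen tool
            transcript += f"Observation: {observation}\n"
    return "Final Answer: (step limit reached)"
```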
Cross-Topic Questions & Answers¶
Q: "How is Batch Normalization related to Gradient Descent?"¶
A: BatchNorm normalizes activations, which:
1. Smooths the loss landscape → larger learning rates possible
2. Reduces internal covariate shift → faster convergence
3. Adds noise (from batch statistics) → regularization effect
Connection: Both improve training stability.
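A minimal NumPy sketch of the training-time BatchNorm forward pass (inference-time running statistics are omitted):

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    # Normalize each feature across the batch, then rescale and shift.
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta

x = np.random.default_rng(0).normal(loc=5.0, scale=3.0, size=(32, 4))
out = batch_norm(x, gamma=np.ones(4), beta=np.zeros(4))
print(out.mean(axis=0).round(3), out.std(axis=0).round(3))  # ≈ 0 and ≈ 1
```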
Q: "How is Attention related to Softmax?"¶
A: Attention computes similarity scores between Q and K, then uses Softmax to:
1. Convert scores to probabilities (sum to 1)
2. Determine how much each value contributes
3. Enable differentiable attention weights
Connection: Softmax is the normalization layer in attention.
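A minimal NumPy sketch of scaled dot-product attention, with softmax as the normalization step:

```python
import numpy as np

def attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # query-key similarity
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax: rows sum to 1
    return weights @ V                              # weighted mix of values

rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, 4, 8))  # 4 tokens, dimension 8
print(attention(Q, K, V).shape)       # (4, 8): one output per query token
```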
Q: "How is LoRA related to Matrix Factorization?"¶
A: LoRA decomposes the weight update W' = W + BA where:
- B: d × r matrix
- A: r × d matrix
- r << d (low rank)
Connection: Same concept as SVD/PCA — representing large matrix with low-rank factors.
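A minimal NumPy sketch of the parameter saving: instead of a full d × d update, only B and A are trained (B is typically initialized to zero so the update starts as a no-op):

```python
import numpy as np

d, r = 512, 8                    # hidden dimension and low rank, r << d
rng = np.random.default_rng(0)
W = rng.normal(size=(d, d))      # frozen pretrained weight
B = np.zeros((d, r))             # zero init: update starts as a no-op
A = rng.normal(scale=0.01, size=(r, d))

W_eff = W + B @ A                # effective weight W' = W + BA
print(W.size, B.size + A.size)   # 262144 frozen vs 8192 trainable params
```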
Q: "How is A/B Testing related to Hypothesis Testing?"¶
A: A/B testing is applied hypothesis testing:
1. H0: no difference between A and B
2. Collect data from both variants
3. Compute a p-value using statistical tests
4. Reject H0 if p < alpha (usually 0.05)
Connection: A/B testing = real-world application of statistical hypothesis testing.
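A minimal sketch of the same recipe as a two-proportion z-test; the conversion counts are made up for illustration:

```python
import numpy as np
from scipy.stats import norm

conv_a, n_a = 200, 10_000        # control: 2.0% conversion
conv_b, n_b = 250, 10_000        # variant: 2.5% conversion

p_a, p_b = conv_a / n_a, conv_b / n_b
p_pool = (conv_a + conv_b) / (n_a + n_b)           # pooled rate under H0
se = np.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
z = (p_b - p_a) / se
p_value = 2 * (1 - norm.cdf(abs(z)))               # two-sided test
print(f"z={z:.2f}, p={p_value:.4f}")               # reject H0 if p < 0.05
```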
Q: "How is Drift Detection related to KS-test?"¶
A: The KS-test compares two distributions:
1. Compute the empirical CDFs of the baseline and production data
2. KS statistic = maximum difference between the CDFs
3. Large KS statistic (small p-value) → significant distribution shift
Connection: KS-test is one method for detecting data drift.
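A minimal sketch using scipy's two-sample KS test on synthetic data where the production distribution has shifted:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
baseline = rng.normal(loc=0.0, size=5_000)      # feature at training time
production = rng.normal(loc=0.3, size=5_000)    # shifted in production

stat, p_value = ks_2samp(baseline, production)  # max CDF gap + its p-value
print(f"KS={stat:.3f}, p={p_value:.2e}")        # tiny p-value → drift
```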
Q: "How is RAG related to Information Retrieval?"¶
A: RAG = neural information retrieval + generation:
1. Traditional IR: TF-IDF, BM25 (keyword-based)
2. Neural IR: dense embeddings, semantic similarity
3. RAG: retrieve relevant docs → LLM generates the answer
Connection: RAG extends IR with neural retrieval and generation.
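A toy sketch of the retrieval step: cosine similarity over precomputed embeddings (random stand-ins here for a real encoder's output):

```python
import numpy as np

rng = np.random.default_rng(0)
docs = ["doc about LoRA", "doc about KS-test", "doc about attention"]
doc_emb = rng.normal(size=(3, 64))
doc_emb /= np.linalg.norm(doc_emb, axis=1, keepdims=True)

query_emb = doc_emb[2] + 0.1 * rng.normal(size=64)  # near the attention doc
query_emb /= np.linalg.norm(query_emb)

scores = doc_emb @ query_emb                 # cosine similarity per doc
top = np.argsort(scores)[::-1][:2]           # retrieve top-2 documents
print([docs[i] for i in top])                # context passed to the LLM
```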
Q: "How is Calibration related to Probability?"¶
A: Calibration ensures predicted probabilities match empirical frequencies:
- If the model predicts 70% confidence → it should be correct 70% of the time
- Platt scaling and isotonic regression adjust raw scores
Connection: Calibration makes probability outputs "honest."
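A minimal sketch using sklearn's calibration_curve on a synthetic, deliberately miscalibrated model:

```python
import numpy as np
from sklearn.calibration import calibration_curve

rng = np.random.default_rng(0)
p_true = rng.uniform(size=10_000)                  # true event probability
y = (rng.uniform(size=10_000) < p_true).astype(int)
p_pred = np.clip(1.4 * p_true - 0.2, 0.01, 0.99)   # distorted predictions

frac_pos, mean_pred = calibration_curve(y, p_pred, n_bins=10)
for mp, fp in zip(mean_pred, frac_pos):
    print(f"predicted {mp:.2f} → empirical {fp:.2f}")
# A calibrated model shows predicted ≈ empirical in every bin.
```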
Concept Clusters¶
Cluster 1: Gradient-Based Learning¶
┌─────────────────────────────────────────┐
│ GRADIENT ECOSYSTEM │
│ │
│ Calculus ──────► Gradient Descent │
│ │ │ │
│ │ ▼ │
│ │ Backpropagation │
│ │ │ │
│ │ ┌────────┴────────┐ │
│ │ ▼ ▼ │
│ │ Optimizers Loss Funcs │
│ │ (Adam, SGD) (CE, MSE) │
│ │ │
│ └──► Chain Rule (connects all) │
└─────────────────────────────────────────┘
Cluster 2: Probability & Uncertainty¶
┌─────────────────────────────────────────┐
│ PROBABILITY CLUSTER │
│ │
│ Softmax ──────► Classification │
│ │ │ │
│ │ ▼ │
│ │ Calibration │
│ │ │ │
│ │ ┌─────┴─────┐ │
│ │ ▼ ▼ │
│ │ Platt Isotonic │
│ │ Scaling Regression │
│ │ │
│ └──► Brier Score (evaluation) │
└─────────────────────────────────────────┘
Cluster 3: Attention & Transformers¶
┌─────────────────────────────────────────┐
│ ATTENTION ECOSYSTEM │
│ │
│ Q/K/V ────────► Self-Attention │
│ │ │ │
│ │ ▼ │
│ │ Multi-Head │
│ │ │ │
│ │ ┌─────┴─────┐ │
│ │ ▼ ▼ │
│ │ Positional Cross-Att │
│ │ Encoding (Decoder) │
│ │ │
│ └──► RoPE (Rotary Position) │
└─────────────────────────────────────────┘
Cluster 4: Production ML¶
┌─────────────────────────────────────────┐
│ PRODUCTION CLUSTER │
│ │
│ Serving ───────► Latency Opt │
│ │ │ │
│ │ ▼ │
│ │ Monitoring │
│ │ │ │
│ │ ┌─────┴─────┐ │
│ │ ▼ ▼ │
│ │ Drift A/B Testing │
│ │ Detection │
│ │ │
│ └──► Feature Stores (connects all) │
└─────────────────────────────────────────┘
Learning Path Recommendations¶
Path 1: ML Engineer Foundation¶
1. Linear Algebra + Calculus
↓
2. Statistics + Probability
↓
3. Classical ML (Linear/Logistic Reg, Trees)
↓
4. Deep Learning Basics
↓
5. ML System Design
↓
6. MLOps + Production
Path 2: LLM Engineer¶
1. Deep Learning (Backprop, Attention)
↓
2. NLP Basics (Tokenization, Embeddings)
↓
3. LLM Architecture (Transformers)
↓
4. RAG + Fine-tuning (LoRA)
↓
5. LLM Production (Serving, Guardrails)
↓
6. AI Agents (ReAct, Multi-Agent)
Path 3: ML System Design¶
1. Statistics (A/B Testing)
↓
2. ML Fundamentals
↓
3. Model Serving + Optimization
↓
4. Monitoring + Drift
↓
5. RecSys / Ranking
↓
6. LLM Production Systems
Topic Difficulty Rating¶
| Topic | Math | Coding | System Design |
|---|---|---|---|
| Linear Regression | ★☆☆ | ★★☆ | ★☆☆ |
| Neural Networks | ★★★ | ★★★ | ★★☆ |
| Attention | ★★★ | ★★★ | ★★☆ |
| RAG | ★★☆ | ★★★ | ★★★ |
| LoRA | ★★★ | ★★☆ | ★★☆ |
| A/B Testing | ★★★ | ★★☆ | ★★☆ |
| Drift Detection | ★★☆ | ★★☆ | ★★★ |
| Model Serving | ★☆☆ | ★★★ | ★★★ |
| AI Agents | ★★☆ | ★★★ | ★★★ |
Use this map to understand topic dependencies and plan learning paths.