
ML Cross-Topic Map (Layer 5)


Prerequisites, connections, and leads-to relationships. A mental map of all 93 tasks. Updated: 2026-02-11.


Dependency Graph Overview

                        ML MATH (Foundation)
         ┌────────────────────┼────────────────────┐
         │                    │                    │
    Calculus             Linear Alg            Statistics
         │                    │                    │
         └────────────────────┼────────────────────┘
                     CLASSICAL ML
         ┌────────────────────┼────────────────────┐
         │                    │                    │
   Supervised           Unsupervised           Ensembles
         │                    │                    │
         └────────────────────┼────────────────────┘
                     DEEP LEARNING
         ┌────────────────────┼────────────────────┐
         │                    │                    │
    Foundations           CNN/CV               NLP/RNN
         │                    │                    │
         └────────────────────┼────────────────────┘
                   LLM ENGINEERING
         ┌────────────────────┼────────────────────┐
         │                    │                    │
   Tokenization             RAG               Fine-tuning
         │                    │                    │
         └────────────────────┼────────────────────┘
                   ML SYSTEM DESIGN
         ┌────────────────────┼────────────────────┐
         │                    │                    │
     Serving              Monitoring              RecSys
         │                    │                    │
         └────────────────────┼────────────────────┘
                     AI AGENTS

Topic Connections Matrix

Calculus Dependencies

Topic              Requires Calculus   Why
Backpropagation    YES                 Chain rule, gradients
Optimizers         YES                 Gradient descent variants
Loss Functions     YES                 Derivatives for optimization
Regularization     PARTIAL             L1/L2 gradient understanding

Linear Algebra Dependencies

Topic                Requires LinAlg   Why
PCA/SVD              YES               Eigendecomposition
Attention            YES               Matrix operations, Q/K/V
Softmax              YES               Vector normalization
Batch Normalization  PARTIAL           Mean/variance computation

Statistics Dependencies

Topic                   Requires Stats   Why
A/B Testing             YES              Hypothesis testing, p-values
Drift Detection         YES              PSI, KS-test
Classification Metrics  YES              Probability, distributions
Calibration             YES              Probability calibration

Prerequisites Map

ML Math → Classical ML

Linear Regression:
  ← Derivatives (calculus)
  ← Matrix operations (linalg)
  ← Mean/variance (stats)

Logistic Regression:
  ← Sigmoid derivative (calculus)
  ← Cross-entropy (info theory)
  ← Probability (stats)

Decision Trees:
  ← Entropy/Gini (info theory)
  ← Information Gain

K-Means:
  ← Euclidean distance (linalg)
  ← Mean computation (stats)

Classical ML → Deep Learning

Neural Networks:
  ← Logistic Regression (as single neuron)
  ← Gradient descent
  ← Regularization concepts

Backpropagation:
  ← Chain rule (calculus)
  ← Matrix multiplication (linalg)

Optimizers:
  ← Gradient descent (classical ML)
  ← Momentum concepts (physics/calculus)

Deep Learning → LLM Engineering

Tokenization:
  ← Text Processing (vocabulary construction)
  ← Embeddings (DL)

Attention:
  ← Softmax (DL)
  ← Matrix multiplication (linalg)
  ← Backpropagation (DL)

LoRA:
  ← Matrix factorization (linalg)
  ← Fine-tuning concepts (DL)

RAG:
  ← Embeddings (DL)
  ← Similarity metrics (linalg)
  ← Dense retrieval

LLM Engineering → AI Agents

ReAct Pattern:
  ← Prompt Engineering (LLM)
  ← Tool use concepts
  ← Chain-of-thought (LLM)

Multi-Agent:
  ← ReAct foundation
  ← Orchestration patterns
  ← Communication protocols

Cross-Topic Questions & Answers

Q: How does Batch Normalization connect to regularization?

A: BatchNorm normalizes activations, which:
  1. Smooths the loss landscape → larger learning rates become possible
  2. Reduces internal covariate shift → faster convergence
  3. Adds noise (from batch statistics) → regularization effect

Connection: Both improve training stability.
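A minimal NumPy sketch of the BatchNorm forward pass (the shapes, names, and data here are illustrative, not from the original):

    import numpy as np

    def batchnorm_forward(x, gamma, beta, eps=1e-5):
        """Normalize (batch, features) activations, then scale and shift."""
        mu = x.mean(axis=0)                    # per-feature batch mean
        var = x.var(axis=0)                    # per-feature batch variance
        x_hat = (x - mu) / np.sqrt(var + eps)  # normalized activations
        return gamma * x_hat + beta            # learnable scale/shift

    x = np.random.randn(32, 4)
    out = batchnorm_forward(x, gamma=np.ones(4), beta=np.zeros(4))
    print(out.mean(axis=0).round(6), out.std(axis=0).round(3))  # ≈ 0 and ≈ 1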

Q: What role does Softmax play in attention?

A: Attention computes similarity scores between Q and K, then uses Softmax to:
  1. Convert scores to probabilities (summing to 1)
  2. Determine how much each value contributes
  3. Keep the attention weights differentiable

Connection: Softmax is the normalization layer in attention.
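A minimal sketch of scaled dot-product attention with Softmax as the normalization step (random matrices stand in for learned projections):

    import numpy as np

    def softmax(scores, axis=-1):
        scores = scores - scores.max(axis=axis, keepdims=True)  # numerical stability
        e = np.exp(scores)
        return e / e.sum(axis=axis, keepdims=True)

    def attention(Q, K, V):
        """softmax(QK^T / sqrt(d_k)) V — each row of weights sums to 1."""
        d_k = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d_k)     # similarity between queries and keys
        weights = softmax(scores, axis=-1)  # how much each value contributes
        return weights @ V

    Q, K, V = np.random.randn(3, 8), np.random.randn(5, 8), np.random.randn(5, 8)
    print(attention(Q, K, V).shape)  # (3, 8): one weighted value per query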

Q: How does LoRA connect to linear algebra?

A: LoRA decomposes the weight update as W' = W + BA, where:
  - B: d × r matrix
  - A: r × d matrix
  - r << d (low rank)

Connection: Same concept as SVD/PCA — representing large matrix with low-rank factors.
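A back-of-the-envelope sketch of the low-rank update and its parameter savings (d and r are illustrative):

    import numpy as np

    d, r = 1024, 8                    # hidden size and LoRA rank, r << d
    W = np.random.randn(d, d)         # frozen pretrained weight
    B = np.zeros((d, r))              # d × r, zero-initialized
    A = np.random.randn(r, d) * 0.01  # r × d

    W_eff = W + B @ A                 # effective weight W' = W + BA

    full, lora = d * d, d * r + r * d  # full update vs. trained LoRA params
    print(f"trainable fraction: {lora / full:.2%}")  # 2r/d ≈ 1.56% here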

Q: How does A/B testing connect to statistics?

A: A/B testing is applied hypothesis testing:
  1. H0: no difference between A and B
  2. Collect data from both variants
  3. Compute a p-value using statistical tests
  4. Reject H0 if p < alpha (usually 0.05)

Connection: A/B testing = real-world application of statistical hypothesis testing.
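A sketch of a two-sided two-proportion z-test on hypothetical conversion counts (SciPy is used only for the normal tail probability):

    import numpy as np
    from scipy import stats

    conv_a, n_a = 480, 10_000  # made-up conversions / users for variant A
    conv_b, n_b = 560, 10_000  # and for variant B

    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)  # pooled rate under H0
    se = np.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * stats.norm.sf(abs(z))       # two-sided p-value

    print(f"z = {z:.2f}, p = {p_value:.4f}")
    if p_value < 0.05:
        print("Reject H0: the variants differ")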

Q: How is the KS-test used for drift detection?

A: The KS-test compares two distributions:
  1. Compute the CDFs of the baseline and production data
  2. KS statistic = maximum difference between the CDFs
  3. A high KS statistic indicates a significant distribution shift

Connection: KS-test is one method for detecting data drift.
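A minimal drift check with scipy.stats.ks_2samp on synthetic samples (the alert threshold is an assumption, not a universal rule):

    import numpy as np
    from scipy.stats import ks_2samp

    baseline = np.random.normal(0.0, 1.0, size=5_000)    # training-time feature
    production = np.random.normal(0.3, 1.1, size=5_000)  # shifted live traffic

    stat, p_value = ks_2samp(baseline, production)  # max gap between the CDFs
    print(f"KS statistic = {stat:.3f}, p = {p_value:.2e}")
    if p_value < 0.01:
        print("Drift alert: distributions differ")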

Q: How does RAG relate to information retrieval?

A: RAG = neural information retrieval plus generation:
  1. Traditional IR: TF-IDF, BM25 (keyword-based)
  2. Neural IR: dense embeddings, semantic similarity
  3. RAG: retrieve relevant docs → LLM generates the answer

Connection: RAG extends IR with neural retrieval and generation.
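A toy sketch of the dense-retrieval step: cosine similarity over embeddings, with random vectors standing in for a real embedding model:

    import numpy as np

    def cosine_sim(a, b):
        a = a / np.linalg.norm(a, axis=-1, keepdims=True)
        b = b / np.linalg.norm(b, axis=-1, keepdims=True)
        return a @ b.T

    rng = np.random.default_rng(0)
    doc_embeddings = rng.normal(size=(100, 384))  # 100 docs, 384-dim vectors
    query_embedding = rng.normal(size=(1, 384))   # embedded user query

    scores = cosine_sim(query_embedding, doc_embeddings)[0]
    top_k = np.argsort(scores)[::-1][:3]  # ids of the 3 most similar docs
    print("retrieved doc ids:", top_k)
    # These docs would then go into the LLM prompt as grounding context.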

Q: What does calibration mean for a model's probabilities?

A: Calibration ensures predicted probabilities match empirical frequencies:
  - If the model predicts 70% confidence → it should be correct 70% of the time
  - Platt scaling and isotonic regression adjust the raw scores

Connection: Calibration makes probability outputs "honest."
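A minimal Platt-scaling sketch: fit a logistic regression from raw scores to labels on held-out data (the scores and labels here are made up):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    raw_scores = np.array([0.2, 0.4, 0.6, 0.8, 0.9, 0.3, 0.7, 0.95]).reshape(-1, 1)
    labels = np.array([0, 0, 1, 1, 1, 0, 0, 1])  # true outcomes, held-out set

    platt = LogisticRegression().fit(raw_scores, labels)  # score → probability
    calibrated = platt.predict_proba(raw_scores)[:, 1]
    print(np.round(calibrated, 2))  # adjusted, more "honest" probabilities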


Concept Clusters

Cluster 1: Gradient-Based Learning

┌─────────────────────────────────────────┐
│           GRADIENT ECOSYSTEM            │
│                                         │
│  Calculus ──────► Gradient Descent      │
│     │                   │               │
│     │                   ▼               │
│     │            Backpropagation        │
│     │                   │               │
│     │         ┌────────┴────────┐       │
│     │         ▼                 ▼       │
│     │    Optimizers        Loss Funcs   │
│     │    (Adam, SGD)       (CE, MSE)    │
│     │                                   │
│     └──► Chain Rule (connects all)      │
└─────────────────────────────────────────┘
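Every box in this cluster shows up in even the smallest trainable model; a NumPy sketch of gradient descent on a linear fit, with the chain rule written out by hand (data and hyperparameters are illustrative):

    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.normal(size=100)
    y = 3.0 * x + 1.0 + rng.normal(scale=0.1, size=100)  # true w=3, b=1

    w, b, lr = 0.0, 0.0, 0.1
    for _ in range(200):
        y_hat = w * x + b
        dL_dyhat = 2 * (y_hat - y) / len(x)  # MSE derivative w.r.t. prediction
        w -= lr * (dL_dyhat * x).sum()       # chain rule: through y_hat into w
        b -= lr * dL_dyhat.sum()             # and into b
    print(round(w, 2), round(b, 2))          # ≈ 3.0 and ≈ 1.0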

Cluster 2: Probability & Uncertainty

┌─────────────────────────────────────────┐
│         PROBABILITY CLUSTER             │
│                                         │
│  Softmax ──────► Classification         │
│     │               │                   │
│     │               ▼                   │
│     │         Calibration               │
│     │               │                   │
│     │         ┌─────┴─────┐             │
│     │         ▼           ▼             │
│     │    Platt        Isotonic          │
│     │   Scaling      Regression         │
│     │                                   │
│     └──► Brier Score (evaluation)       │
└─────────────────────────────────────────┘
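The Brier score in this cluster is just the mean squared error between predicted probabilities and binary outcomes; a sketch with made-up numbers:

    import numpy as np

    probs = np.array([0.9, 0.1, 0.8, 0.3])  # predicted P(class = 1)
    outcomes = np.array([1, 0, 1, 1])       # what actually happened
    brier = np.mean((probs - outcomes) ** 2)
    print(brier)  # ≈ 0.14; lower is better, 0 is a perfect confident model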

Cluster 3: Attention & Transformers

┌─────────────────────────────────────────┐
│        ATTENTION ECOSYSTEM              │
│                                         │
│  Q/K/V ────────► Self-Attention         │
│     │               │                   │
│     │               ▼                   │
│     │         Multi-Head                │
│     │               │                   │
│     │         ┌─────┴─────┐             │
│     │         ▼           ▼             │
│     │   Positional    Cross-Att         │
│     │   Encoding     (Decoder)          │
│     │                                   │
│     └──► RoPE (Rotary Position)         │
└─────────────────────────────────────────┘

Cluster 4: Production ML

┌─────────────────────────────────────────┐
│         PRODUCTION CLUSTER              │
│                                         │
│  Serving ───────► Latency Opt           │
│     │               │                   │
│     │               ▼                   │
│     │         Monitoring                │
│     │               │                   │
│     │         ┌─────┴─────┐             │
│     │         ▼           ▼             │
│     │    Drift        A/B Testing      │
│     │   Detection                       │
│     │                                   │
│     └──► Feature Stores (connects all)  │
└─────────────────────────────────────────┘

Learning Path Recommendations

Path 1: ML Engineer Foundation

1. Linear Algebra + Calculus
2. Statistics + Probability
3. Classical ML (Linear/Logistic Reg, Trees)
4. Deep Learning Basics
5. ML System Design
6. MLOps + Production

Path 2: LLM Engineer

1. Deep Learning (Backprop, Attention)
2. NLP Basics (Tokenization, Embeddings)
3. LLM Architecture (Transformers)
4. RAG + Fine-tuning (LoRA)
5. LLM Production (Serving, Guardrails)
6. AI Agents (ReAct, Multi-Agent)

Path 3: ML System Design

1. Statistics (A/B Testing)
2. ML Fundamentals
3. Model Serving + Optimization
4. Monitoring + Drift
5. RecSys / Ranking
6. LLM Production Systems

Topic Difficulty Rating

Topic              Math   Coding   System Design
Linear Regression  ★☆☆    ★★☆      ★☆☆
Neural Networks    ★★★    ★★★      ★★☆
Attention          ★★★    ★★★      ★★☆
RAG                ★★☆    ★★★      ★★★
LoRA               ★★★    ★★☆      ★★☆
A/B Testing        ★★★    ★★☆      ★★☆
Drift Detection    ★★☆    ★★☆      ★★★
Model Serving      ★☆☆    ★★★      ★★★
AI Agents          ★★☆    ★★★      ★★★

Use this map to understand topic dependencies and plan learning paths.