
MLOps: Interview Preparation

~4 minute read

Prerequisites: MLOps study materials | LLMOps vs MLOps

MLOps engineer is one of the most in-demand roles of 2026: average salary $165K (Levels.fyi), 47% year-over-year growth in job postings. Interviews test: model serving (FastAPI + A/B), CI/CD for ML (GitHub Actions + MLflow), monitoring (drift detection, PSI, Evidently AI), model registry, compression (quantization, pruning, distillation). The sections below cover all the key topics with questions at three levels (Basic, Medium, Killer).

Updated: 2026-02-12


1. Model Serving

Basic

Q: Why FastAPI for ML?

A: Async-first, high performance, auto-generated docs (Swagger), Pydantic validation, WebSocket support.
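
A minimal sketch of such an endpoint, assuming a scikit-learn model saved as model.joblib and an invented two-feature schema:

from contextlib import asynccontextmanager

import joblib
from fastapi import FastAPI
from pydantic import BaseModel

models = {}

@asynccontextmanager
async def lifespan(app: FastAPI):
    models["clf"] = joblib.load("model.joblib")  # load once at startup, not per request
    yield
    models.clear()

app = FastAPI(lifespan=lifespan)

class Features(BaseModel):
    amount: float          # Pydantic validates types and required fields
    n_transactions: int

@app.post("/predict")
def predict(features: Features):
    pred = models["clf"].predict([[features.amount, features.n_transactions]])
    return {"prediction": int(pred[0])}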

Q: What is model serialization?

A: Saving a model to a file so it can be loaded in production. Formats: pickle, joblib, ONNX, TorchScript.
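
A minimal joblib round trip as an illustration (the toy model and file name are arbitrary):

import joblib
from sklearn.linear_model import LogisticRegression

model = LogisticRegression().fit([[0.0], [1.0]], [0, 1])
joblib.dump(model, "model.joblib")       # persist to disk / artifact store
restored = joblib.load("model.joblib")   # load inside the serving process
print(restored.predict([[0.7]]))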

Medium

Q: How do you set up A/B testing for models?

A: (1) Deterministic routing by user_id, (2) Feature flag for the traffic split, (3) Logging of model_version with every prediction, (4) Post-hoc analysis.
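
A sketch of deterministic routing (the split ratio and variant names are illustrative); the same user always lands in the same bucket:

import hashlib

def assign_variant(user_id: str, treatment_share: float = 0.1) -> str:
    bucket = int(hashlib.md5(user_id.encode()).hexdigest(), 16) % 1000
    return "model_v2" if bucket < treatment_share * 1000 else "model_v1"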

Q: Model versioning strategies?

A: (1) MLflow Model Registry, (2) Semantic versioning, (3) Git tags + artifact storage, (4) Container tags.

Killer

Q: Design a high-availability model serving system.

A: (1) Multiple replicas, (2) Load balancer, (3) Health checks, (4) Circuit breaker, (5) Graceful degradation, (6) Model caching, (7) Async inference queue.


CI/CD & Deployment

Basic

Q: What is CI/CD for ML?

A: Continuous Integration (automated tests for code and data) + Continuous Delivery (automated model deployment). The difference from software CI/CD: you have to test not only the code but also the data (schema validation, distribution checks) and the models (performance regression, bias testing). Tools: GitHub Actions, GitLab CI, Jenkins + MLflow/DVC.

Q: How do you version ML models?

A: (1) MLflow Model Registry -- the standard, with stage transitions (staging/production), (2) DVC -- version control for data and models, (3) Git tags + artifact storage (S3/GCS), (4) Container image tags (model baked in). Best practice: semantic versioning + data hash + code hash.
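
One way to implement that best practice (file path and version string are purely illustrative): attach the data and code hashes as build metadata on the semantic version.

import hashlib
import subprocess

def dataset_hash(path: str) -> str:
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()[:12]

code_hash = subprocess.check_output(["git", "rev-parse", "--short", "HEAD"]).decode().strip()
model_version = f"1.4.0+{dataset_hash('data/train.parquet')}.{code_hash}"
# -> e.g. "1.4.0+3fa9c1d2b7aa.a1b2c3d"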

Medium

Q: How do you set up blue-green deployment for ML?

A: Two identical environments: blue (current) and green (new). (1) Deploy the new model to green, (2) Run smoke tests, (3) Switch traffic (DNS/LB), (4) Roll back if metrics degrade. Difference from canary: all traffic switches at once, and rollback is simpler. For ML, additionally check prediction distribution, latency, and feature pipeline consistency.

Q: What is shadow mode deployment?

A: The new model receives production traffic, but its predictions are not served -- they are only logged for comparison against the current model. Advantages: (1) real traffic, (2) no risk to users, (3) latency, accuracy, and distributions can be compared. Downside: doubled compute cost. Used before canary/A/B for high-risk models.
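
A sketch of the request path in shadow mode (names are illustrative; in practice the shadow call is usually made asynchronously so it does not add latency):

import logging
import time

logger = logging.getLogger("shadow")

def predict_with_shadow(primary_model, shadow_model, features):
    start = time.perf_counter()
    primary_pred = primary_model.predict(features)
    primary_ms = (time.perf_counter() - start) * 1000

    start = time.perf_counter()
    shadow_pred = shadow_model.predict(features)     # never returned to the user
    shadow_ms = (time.perf_counter() - start) * 1000

    logger.info({"primary": primary_pred, "shadow": shadow_pred,
                 "primary_ms": primary_ms, "shadow_ms": shadow_ms})
    return primary_pred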

Q: How do you test ML models in CI?

A: (1) Unit tests -- preprocessing functions, feature engineering, (2) Integration tests -- end-to-end pipeline on sample data, (3) Model tests -- performance above a threshold on a holdout set, (4) Data tests -- schema validation, null checks, distribution drift, (5) Smoke tests -- inference on golden examples. Fail CI if any regression exceeds the tolerance.


Monitoring & Observability

Basic

Q: Which metrics should be monitored for ML in production?

A: (1) System: latency (p50/p95/p99), throughput (RPS), error rate, CPU/RAM, (2) Model: prediction distribution, feature distribution, confidence scores, (3) Business: conversion, engagement, revenue impact. Data drift monitoring: PSI, KS-test on feature distributions.
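
A sketch of exposing the system-level metrics with prometheus_client (metric names and the port are illustrative):

from prometheus_client import Counter, Histogram, start_http_server

REQUESTS = Counter("predictions_total", "Prediction requests", ["model_version"])
LATENCY = Histogram("prediction_latency_seconds", "Inference latency")

def instrumented_predict(model, features, version="v1"):
    REQUESTS.labels(model_version=version).inc()
    with LATENCY.time():                # records latency into histogram buckets
        return model.predict(features)

start_http_server(8001)                 # exposes /metrics for Prometheus to scrape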

Medium

Q: How do you detect model degradation in production?

A: (1) Delayed ground truth -- compare predictions with delayed labels (fraud labels arrive ~30 days later), (2) Prediction drift -- if P(y=1) shifts, the model is degrading, (3) Feature drift -- PSI/KS-test on input features, (4) Proxy metrics -- the online metric (CTR) drops while traffic stays stable. Tools: Evidently AI, WhyLabs, custom Grafana dashboards.

Q: What are data drift and concept drift?

A: Data drift -- a change in P(X): the feature distribution has shifted (new demographics, seasonality). Concept drift -- a change in P(Y|X): the relationship between features and the target has changed (e.g. the pandemic changed behavior). Data drift is detected with PSI/KS-tests on features; concept drift is detected by monitoring model accuracy on fresh data. Remedies: retraining on new data, adaptive models.
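
A minimal PSI sketch for a single numeric feature (the bin count and the usual 0.1/0.25 thresholds are conventions, not hard rules):

import numpy as np

def psi(reference: np.ndarray, current: np.ndarray, n_bins: int = 10) -> float:
    # bin edges come from the reference sample (e.g. the training data)
    edges = np.quantile(reference, np.linspace(0, 1, n_bins + 1))
    ref_idx = np.clip(np.searchsorted(edges, reference, side="right") - 1, 0, n_bins - 1)
    cur_idx = np.clip(np.searchsorted(edges, current, side="right") - 1, 0, n_bins - 1)
    ref_frac = np.bincount(ref_idx, minlength=n_bins) / len(reference)
    cur_frac = np.bincount(cur_idx, minlength=n_bins) / len(current)
    ref_frac = np.clip(ref_frac, 1e-6, None)   # avoid log(0)
    cur_frac = np.clip(cur_frac, 1e-6, None)
    return float(np.sum((cur_frac - ref_frac) * np.log(cur_frac / ref_frac)))

# Rule of thumb: PSI < 0.1 stable, 0.1-0.25 moderate shift, > 0.25 significant drift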

Killer

Q: Design a monitoring system for 50 ML models in production.

A: Architecture: (1) Centralized logging: all predictions -> Kafka -> ClickHouse, (2) Feature store monitoring: freshness, null rate, cardinality per feature, (3) Model performance: scheduled evaluation on delayed ground truth, (4) Alerting tiers: P0 (model not responding), P1 (accuracy drop > 5%), P2 (feature drift detected), (5) Dashboards: Grafana with per-model panels, (6) Auto-retraining trigger: if the drift score exceeds the threshold, launch the retraining pipeline, (7) SLA: p99 latency < 100ms, availability > 99.9%.


Experiment Tracking

Medium

Q: MLflow vs Weights & Biases -- when to use which?

A: MLflow: open-source, self-hosted, model registry, more MLOps-oriented. W&B: SaaS, better visualization, collaboration, sweeps for HPO. Choosing: if self-hosting is a requirement -> MLflow. If team collaboration matters most -> W&B. Many teams use both: W&B for experiments, MLflow for the model registry and deployment.

Q: How do you make an ML experiment reproducible?

A: (1) Pin the code (git commit hash), (2) Pin the data (DVC, dataset hash), (3) Log hyperparameters (MLflow/W&B), (4) Pin the environment (Docker, requirements.txt with pinned versions), (5) Fix the random seed, (6) Log the hardware (GPU type). All of this can be automated with MLflow autolog() or W&B wandb.init().
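
A sketch of the MLflow variant (the dataset tag value is illustrative; add torch/tf seeds if those frameworks are used):

import random
import subprocess

import mlflow
import numpy as np

def fix_seeds(seed: int = 42):
    random.seed(seed)
    np.random.seed(seed)

mlflow.autolog()                                  # params, metrics, model for supported libraries
with mlflow.start_run():
    fix_seeds(42)
    commit = subprocess.check_output(["git", "rev-parse", "HEAD"]).decode().strip()
    mlflow.set_tag("git_commit", commit)
    mlflow.log_param("dataset_version", "v3")     # e.g. a DVC tag or data hash
    # ... training code ...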


Data Quality & Validation

Basic

Q: What are data contracts?

A: A formal agreement between a data producer and a data consumer about schema, constraints, and SLA. The contract defines: (1) Schema -- types, nullability, uniqueness, (2) Constraints -- ranges, patterns, referential integrity, (3) Freshness -- how often the data is refreshed, (4) Quality SLA -- the acceptable share of nulls and duplicates. Implementations: JSON Schema, Protobuf, Great Expectations suites.
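
A toy contract expressed as a Pydantic model (field names and limits are invented for illustration); the same rules would normally live in a JSON Schema or a Great Expectations suite shared by producer and consumer:

from pydantic import BaseModel, Field

class UserEvent(BaseModel):
    user_id: str = Field(min_length=1)
    age: int = Field(ge=0, le=120)                     # business constraint
    country: str = Field(min_length=2, max_length=2)   # ISO 3166-1 alpha-2 code

# Consumer-side check: raises ValidationError if the contract is violated
UserEvent(user_id="u1", age=34, country="DE")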

Q: Why Great Expectations?

A: A data validation framework for automated data quality checks. It lets you: (1) Define expectations -- validation rules (expect_column_values_to_be_unique, expect_column_values_to_match_regex), (2) Generate docs -- data docs with check results, (3) Integrate with pipelines -- Airflow, Spark, Pandas, (4) Alert -- when expectations are violated.
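
A minimal example using the classic pandas API of Great Expectations (pre-1.0 releases; column names and rules are illustrative):

import great_expectations as ge
import pandas as pd

df = ge.from_pandas(pd.DataFrame({"user_id": ["u1", "u2"], "age": [25, 41]}))
df.expect_column_values_to_not_be_null("user_id")
df.expect_column_values_to_be_unique("user_id")
df.expect_column_values_to_be_between("age", min_value=0, max_value=120)
print(df.validate().success)   # False if any expectation fails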

Medium

Q: How do you build a data validation pipeline?

A: (1) Schema validation -- types, nullability, cardinality, (2) Statistical validation -- distribution checks, null rate, outlier detection, (3) Business rules -- domain-specific constraints (age > 0, email format), (4) Cross-dataset validation -- referential integrity, consistency checks. Tools: Great Expectations for batch, Deequ for Spark, custom SQL tests. Validate at ingestion, preprocessing, and pre-training.

Q: What is the Medallion Architecture?

A: Layered data architecture (Databricks/Lakehouse):
- Bronze: Raw data as-is, append-only, full history
- Silver: Cleaned, deduplicated, validated, enriched
- Gold: Aggregated, business-ready, ML-optimized

Validation happens at the Bronze→Silver transition. Great Expectations checks schema, quality, and completeness.

Killer

Q: How do you handle a data quality incident in production?

A: (1) Detect -- automated alerts on quality metric degradation, (2) Triage -- identify the upstream source, (3) Quarantine -- isolate the bad data without breaking the pipeline, (4) Notify -- alert data consumers, (5) Root cause -- a bug in the producer, a schema change, an upstream outage, (6) Fix -- patch the producer or update the contract, (7) Backfill -- recompute affected downstream data, (8) Post-mortem -- update the validation rules. Prevention: data contracts, schema registry, integration tests.


Experiment Tracking & Model Registry

Basic

Q: What should be logged for an ML experiment?

A:
- Code: git commit, branch, diff
- Data: dataset version, hash, split sizes
- Hyperparameters: learning_rate, batch_size, architecture
- Metrics: loss, accuracy, per-epoch, per-step
- Artifacts: model weights, configs, sample predictions
- Environment: Python version, GPU type, library versions
- Random seed: for reproducibility

Medium

Q: MLflow vs W&B vs Neptune vs DVC — comparison?

A:

| Feature | MLflow | W&B | Neptune | DVC |
|---|---|---|---|---|
| Type | Self-hosted/SaaS | SaaS | SaaS | Open source |
| Model Registry | ✅ Best | Basic | Basic | Via Git |
| Collaboration | Limited | ✅ Strong | Good | Git-based |
| Visualization | Good | ✅ Best | Good | Limited |
| Data Versioning | No | No | No | ✅ Core |
| Cost | Free/Enterprise | Free tier | Free tier | Free |

Use cases:
- Solo/Small team: MLflow (free, self-hosted)
- Team collaboration: W&B (best UX, collaboration)
- Data-heavy: MLflow + DVC (model + data versioning)
- Enterprise: Neptune/MLflow Enterprise (compliance, scale)

Q: What is a model registry and why is it needed?

A: A central repository for versioned models with metadata. Functions:
- Versioning: each model version gets a unique ID
- Staging: stages -- None → Staging → Production → Archived
- Metadata: metrics, parameters, lineage, tags
- Access control: who is allowed to promote models
- Lineage: which experiment, data, and code produced the model

MLflow Model Registry workflow:

# Register model (sklearn flavor as an example; other flavors: pyfunc, pytorch, xgboost, ...)
mlflow.sklearn.log_model(model, "model", registered_model_name="fraud_detector")

# Transition stage
client = mlflow.tracking.MlflowClient()
client.transition_model_version_stage(
    name="fraud_detector",
    version=3,
    stage="Production"
)

Killer

Q: Design ML experiment tracking for a team of 20 data scientists.

A: Requirements: 20 DS, 100+ experiments/day, 10+ models in production, compliance.

Architecture:
1. Centralized MLflow: self-hosted on Kubernetes, PostgreSQL backend, S3 artifact store
2. Experiment naming convention: {team}/{project}/{experiment_name}
3. Auto-logging: a wrapper around training code for automatic metric/param logging
4. Model registry: all production models go through the registry, staged deployment
5. CI integration: experiments logged from CI runs, comparison against the baseline
6. Data versioning: DVC for datasets, linked in MLflow params
7. Dashboards: Grafana dashboards for model performance and experiment counts
8. Alerting: failed experiments, metric regression vs baseline
9. Cost management: artifact TTL, auto-cleanup of old experiments

Governance:
- Model promotion requires 2 reviews
- All experiments tagged with business_unit, cost_center
- Monthly experiment audit


6. CI/CD Deep Dive for ML

Basic

Q: What should an ML CI pipeline include?

A:
- Code linting: ruff, black, mypy
- Unit tests: preprocessing, feature engineering functions
- Data validation: schema checks, null rate, distribution
- Model training on a sample: a fast training run as a sanity check
- Model evaluation: metrics vs baseline
- Artifact upload: model to the registry, data to DVC

Q: GitHub Actions vs GitLab CI vs Jenkins for ML?

A:

| Feature | GitHub Actions | GitLab CI | Jenkins |
|---|---|---|---|
| Setup | Easiest | Easy | Complex |
| GPU runners | Via self-hosted | Via self-hosted | Yes |
| Matrix builds | Limited | | |
| Cost | Free tier + minutes | Free tier | Free |
| ML ecosystem | Best (actions) | Good | Plugins |

Recommendation: GitHub Actions for ML teams (best actions ecosystem, easy self-hosted GPU runners).

Medium

Q: How do you set up automated model testing?

A: 3 levels:

Level 1: Behavioral Tests

def test_model_invariance():
    # Model should give the same prediction for semantically equivalent inputs
    text1 = "Customer wants refund"
    text2 = "The customer is requesting a refund"
    assert model.predict(text1) == model.predict(text2)

def test_model_directional():
    # Increasing income should not decrease the credit_score prediction
    low_income = {"income": 30000}
    high_income = {"income": 100000}
    assert model.predict(high_income) >= model.predict(low_income)

Level 2: Performance Tests

def test_accuracy_threshold():
    metrics = evaluate_model(model, test_data)
    assert metrics["accuracy"] > 0.85, f"Accuracy {metrics['accuracy']} below threshold"
    assert metrics["f1"] > 0.80

Level 3: Comparison Tests

def test_against_baseline():
    baseline = load_model("production/v1")
    new_model = train_model()
    assert evaluate(new_model) >= evaluate(baseline) * 0.98  # Allow 2% regression

Q: What is canary deployment for ML?

A: Gradually shifting traffic to the new model:
- 1% traffic → monitor metrics for 1 hour
- 5% traffic → monitor metrics for 4 hours
- 25% traffic → monitor metrics for 24 hours
- 100% traffic → full rollout

Rollback criteria:
- Error rate > baseline + 0.1%
- Latency p99 > baseline * 1.5
- Business metric (CTR, conversion) drop > 2%

Implementation: Feature flag (LaunchDarkly), Istio traffic split, Kubernetes Service weights.
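
A sketch of the rollback check run at each canary stage, using the thresholds above (metric names are illustrative):

def should_rollback(canary: dict, baseline: dict) -> bool:
    return (
        canary["error_rate"] > baseline["error_rate"] + 0.001       # +0.1% absolute
        or canary["latency_p99"] > baseline["latency_p99"] * 1.5
        or canary["conversion"] < baseline["conversion"] * 0.98     # > 2% drop
    )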

Killer

Q: Design an ML CI/CD pipeline for a team of 10.

A:

# .github/workflows/ml-pipeline.yml
name: ML Pipeline
on:
  push:
    branches: [main]
  pull_request:

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: ruff check .
      - run: mypy .
      - run: black --check .

  test:
    needs: lint
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pytest tests/unit -m "not gpu"
      - run: pytest tests/integration --sample-data
      - run: great_expectations checkpoint run ml_suite

  train:
    needs: test
    runs-on: [self-hosted, gpu]
    steps:
      - uses: actions/checkout@v4
      - run: dvc pull data
      - run: python train.py --config configs/experiment.yaml
      # the model is logged to the MLflow registry from inside train.py

  evaluate:
    needs: train
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pytest tests/model_tests
      - run: python evaluate.py --compare-baseline

  deploy-staging:
    needs: evaluate
    if: github.ref == 'refs/heads/main'
    environment: staging
    runs-on: ubuntu-latest
    steps:
      - run: echo "Deploy to staging + shadow mode tests"

  deploy-production:
    needs: deploy-staging
    environment:
      name: production
    runs-on: ubuntu-latest
    steps:
      - run: echo "Canary: 1% -> 5% -> 25% -> 100%"
      # Auto-rollback on metric degradation

7. Model Registry Deep Dive

Basic

Q: Why is model lineage needed?

A: Model lineage answers the questions:
- Which data was used for training?
- Which code/commit produced the model?
- Which hyperparameters were used?
- Who trained it and when?
- What was the baseline metric?

Implementation: MLflow automatically links run_id -> model_version -> artifacts -> params/metrics.

Q: How do you roll back a model?

A:

# Option 1: Promote previous version
client.transition_model_version_stage(
    name="fraud_detector",
    version=previous_version,
    stage="Production",
    archive_existing_versions=True
)

# Option 2: Load by stage or by a specific version
model = mlflow.pyfunc.load_model("models:/fraud_detector/Production")
# or pin a version: mlflow.pyfunc.load_model("models:/fraud_detector/3")

Medium

Q: How do you run A/B tests through the model registry?

A:

# Router based on user segment; in practice both models are loaded once
# at startup and cached, not on every request
def predict(user_id, features):
    if is_in_experiment(user_id, "model_v2"):
        model_version = 5   # treatment
    else:
        model_version = 4   # control
    model = mlflow.pyfunc.load_model(f"models:/fraud_detector/{model_version}")

    prediction = model.predict(features)
    log_prediction(user_id, prediction, model_version)
    return prediction

Registry integration:
- Version 4 = "Production" (control)
- Version 5 = "Staging" (treatment)
- Traffic split via feature flag

Killer

Q: Design model governance for a regulated industry (finance/healthcare).

A:

Governance requirements:
1. Audit trail: who promoted the model, when, and why
2. Approval workflow: at least 2 approvals for production
3. Documentation: model card, data lineage, fairness report
4. Retention: keep all versions + training data for 7 years
5. Explainability: SHAP values for all predictions

Implementation:

# Model promotion workflow
def promote_model(model_name, version, approvers):
    # 1. Check all requirements
    assert len(approvers) >= 2, "Need 2 approvals"
    assert model_has_documentation(model_name, version)
    assert fairness_check_passed(model_name, version)
    assert explainability_enabled(model_name, version)

    # 2. Log audit event
    audit_log.record({
        "action": "promote",
        "model": model_name,
        "version": version,
        "approvers": approvers,
        "timestamp": now(),
        "justification": get_justification(),
    })

    # 3. Transition
    client.transition_model_version_stage(
        name=model_name, version=version, stage="Production"
    )


8. Model Compression for Production

Basic

Q: Which model compression techniques are used in production?

A:

| Technique | Speedup | Size Reduction | Quality Loss | Best For |
|---|---|---|---|---|
| Quantization | 2-4x | 2-4x | Low (1-3%) | Most production models |
| Pruning | 2-5x | 3-10x | Medium (2-5%) | Overparameterized models |
| Distillation | 2-10x | 2-50x | Low if well-tuned | Teacher-student scenarios |
| Low-rank | 1.5-2x | 2-4x | Low | Linear layers |

Typical pipeline: Pruning → QAT (Quantization-Aware Training) → Fine-tune

Q: What is quantization and when should it be used?

A:

Quantization reduces the precision of weights: FP32 → FP16/INT8/INT4.

Types:
- Post-Training Quantization (PTQ): fast, no retraining, ~2-3% accuracy loss
- Quantization-Aware Training (QAT): slower, better accuracy, ~1% loss

FP16 vs INT8 vs INT4:

| Precision | Memory | Speedup | Quality | Use Case |
|-----------|--------|---------|---------|----------|
| FP32 | 4 bytes | 1x | 100% | Training |
| FP16 | 2 bytes | 2x | 99.5% | Training/Inference |
| INT8 | 1 byte | 4x | 97-99% | Production inference |
| INT4 | 0.5 byte | 8x | 90-95% | Edge/Extreme constraints |

When to use which:
- Edge deployment → INT8/INT4
- Cloud inference → FP16/INT8
- Training → FP32/FP16 (mixed precision)

Medium

Q: How do you implement PTQ for a PyTorch model?

A:

import os

import torch
from torch.quantization import quantize_dynamic  # torch.ao.quantization in newer PyTorch

# Dynamic quantization (easiest, works for most models)
model_quantized = quantize_dynamic(
    model,
    {torch.nn.Linear, torch.nn.LSTM, torch.nn.GRU},
    dtype=torch.qint8
)

# Static quantization (better performance, needs calibration)
model.eval()
model.qconfig = torch.quantization.get_default_qconfig('fbgemm')
model_prepared = torch.quantization.prepare(model)

# Calibration with representative data (batches of model inputs)
for batch in calibration_dataloader:
    model_prepared(batch)

model_quantized = torch.quantization.convert(model_prepared)

# Compare on-disk sizes (counting parameters would miss the packed INT8 weights)
torch.save(model.state_dict(), "fp32.pt")
torch.save(model_quantized.state_dict(), "int8.pt")
print(f"Size reduction: {os.path.getsize('fp32.pt') / os.path.getsize('int8.pt'):.2f}x")

Q: What is Knowledge Distillation?

A:

Knowledge Distillation transfers knowledge from a large teacher model to a small student model.

Temperature-scaled softmax: \(p_i = \frac{\exp(z_i/T)}{\sum_j \exp(z_j/T)}\)

Distillation loss: \(\mathcal{L} = \alpha \cdot \mathcal{L}_{CE}(y_{true}, y_{student}) + (1-\alpha) \cdot \mathcal{L}_{KL}(y_{teacher}^{T}, y_{student}^{T})\)

where \(T\) is the temperature (typically 3-5) and \(\alpha\) is the weight on the hard-label loss (typically 0.1-0.5).

import torch.nn as nn
import torch.nn.functional as F

class DistillationLoss(nn.Module):
    def __init__(self, temperature=4.0, alpha=0.5):
        super().__init__()
        self.temperature = temperature
        self.alpha = alpha
        self.ce_loss = nn.CrossEntropyLoss()
        self.kl_loss = nn.KLDivLoss(reduction='batchmean')

    def forward(self, student_logits, teacher_logits, labels):
        # Hard target loss
        loss_ce = self.ce_loss(student_logits, labels)

        # Soft target loss (distillation)
        student_soft = F.log_softmax(student_logits / self.temperature, dim=-1)
        teacher_soft = F.softmax(teacher_logits / self.temperature, dim=-1)
        loss_kl = self.kl_loss(student_soft, teacher_soft) * (self.temperature ** 2)

        return self.alpha * loss_ce + (1 - self.alpha) * loss_kl

Q: Pruning -- how and when should it be applied?

A:

Pruning types:
- Unstructured: Remove individual weights (harder to accelerate)
- Structured: Remove entire channels/heads (better speedup)

Magnitude pruning:

import torch

def prune_model(model, sparsity=0.5):
    # zero out the smallest-magnitude weights; without a persistent mask,
    # pruned weights can grow back during later fine-tuning
    for name, param in model.named_parameters():
        if 'weight' in name:
            threshold = torch.quantile(torch.abs(param.data), sparsity)
            mask = torch.abs(param.data) > threshold
            param.data *= mask.float()
    return model

Iterative pruning (better results, sketched below):
1. Train to convergence
2. Prune 20% of weights
3. Fine-tune for N epochs
4. Repeat until target sparsity
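
A sketch of that loop, where prune_model is the magnitude-pruning function above and fine_tune is a hypothetical training helper (schedule values are illustrative):

def iterative_prune(model, train_loader, target_sparsity=0.8, step=0.2, epochs=2):
    sparsity = 0.0
    while sparsity < target_sparsity:
        sparsity = min(sparsity + step, target_sparsity)
        model = prune_model(model, sparsity=sparsity)   # magnitude pruning (above)
        fine_tune(model, train_loader, epochs=epochs)   # hypothetical helper
    return model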

When pruning is effective:
- Overparameterized models (e.g. ResNet-50 on a small dataset)
- Edge deployment with strict latency requirements
- Models with redundant features

Killer

Q: Design a compression pipeline for a 1B-parameter model on an edge device.

A:

Target: 1B param model → <500MB, <50ms latency on mobile

Pipeline:

Original (1B, FP32) = 4GB
  ↓ Structured Pruning (50% sparsity)
Pruned (500M effective, FP32) = 2GB
  ↓ Knowledge Distillation (300M student)
Distilled (300M, FP32) = 1.2GB
  ↓ QAT (INT8)
Final (300M, INT8) = 300MB ✅

Implementation:

class CompressionPipeline:
    # helper methods (iterative_prune, create_student, distill,
    # quantize_aware_training, measure_latency, evaluate) are assumed
    # to be implemented elsewhere; only the orchestration is shown
    def __init__(self, teacher_model, target_size_mb=300):
        self.teacher = teacher_model
        self.target_size = target_size_mb

    def compress(self, train_loader, val_loader):
        # Step 1: Structured Pruning
        print("Step 1: Pruning...")
        model = self.iterative_prune(
            self.teacher,
            train_loader,
            target_sparsity=0.5
        )

        # Step 2: Knowledge Distillation
        print("Step 2: Distilling...")
        student = self.create_student(model, ratio=0.6)  # 60% of pruned
        student = self.distill(model, student, train_loader, epochs=10)

        # Step 3: Quantization-Aware Training
        print("Step 3: Quantizing...")
        student = self.quantize_aware_training(student, train_loader)

        # Verify
        size_mb = self.get_model_size_mb(student)
        latency = self.measure_latency(student)
        accuracy = self.evaluate(student, val_loader)

        print(f"Final: {size_mb:.1f}MB, {latency:.1f}ms, {accuracy:.2f}%")
        return student

    def get_model_size_mb(self, model):
        # For INT8 quantized model
        param_size = sum(p.numel() * 1 for p in model.parameters())  # 1 byte
        return param_size / (1024 ** 2)

Trade-offs:

| Constraint | Solution | Quality Impact |
|------------|----------|----------------|
| Size < 300MB | INT8 + Pruning | -2-3% accuracy |
| Latency < 50ms | Structured pruning | -1-2% accuracy |
| Quick deployment | PTQ only | -3-5% accuracy |
| Best quality | QAT + Distill | -1-2% accuracy |

Q: How do you choose between quantization and pruning?

A:

| Criterion | Quantization | Pruning |
|---|---|---|
| Ease of implementation | Easy (PTQ) | Harder |
| Hardware support | Universal | Requires sparse ops |
| Speedup guarantee | Yes (2-4x) | Depends on sparsity |
| Accuracy preservation | Good (QAT) | Variable |
| Best for | General deployment | Overparameterized models |

Practical recommendation:
1. Start with PTQ: fast, minimal effort, good results
2. If insufficient: add QAT
3. If still insufficient: consider distillation
4. Pruning as a last resort: complex, but best for extreme constraints

Combined approach (2025-2026):

Pruning (structured) → QAT → Distillation → PTQ fine-tune
= 10-15x size reduction with <5% accuracy loss


Misconception: canary deployment = A/B testing of models

Canary is a gradual traffic shift (1% -> 5% -> 25% -> 100%) with auto-rollback on degradation. A/B is a statistical experiment with two groups to measure an effect. Canary = safety mechanism, A/B = measurement mechanism. They are often used together: canary for deployment safety, A/B for the business decision.

Misconception: PTQ (Post-Training Quantization) always loses 5%+ accuracy

INT8 PTQ loses 1-3% accuracy for most models. QAT (Quantization-Aware Training) -- about 1% loss. For inference on edge: INT8 gives a 4x speedup with minimal quality loss. Knowledge Distillation + QAT yield a 10-15x size reduction with <5% accuracy loss.

Misconception: a model registry is only needed by large teams

Even a solo DS needs: version tracking (which model is in production?), rollback (restore the previous version), lineage (which data/code produced the model). MLflow Model Registry is free and takes about 10 minutes to set up.


Interview: answer format

Model Serving

❌ Красный флаг: "Для serving модели достаточно Flask + pickle"

✅ Сильный ответ: "FastAPI (async, auto-docs, Pydantic validation), lifespan для model loading, health/readiness endpoints. A/B routing через deterministic hashing user_id. Prometheus metrics (latency p50/p95/p99, RPS, error rate). Docker + HPA для auto-scaling. Circuit breaker для graceful degradation."

Monitoring & Drift

❌ Красный флаг: "Мониторить accuracy на production достаточно"

✅ Strong answer: "4 types of drift: data (P(X) changes, PSI/KS-test), concept (P(Y|X), performance drop), label (P(Y), seasonality), upstream (schema changes). Monitoring: Evidently AI for drift reports, proxy metrics (CTR) when ground truth is delayed (fraud -- 30 days). Alerting tiers: P0 (model not responding), P1 (accuracy -5%), P2 (feature drift)."

CI/CD for ML

❌ Red flag: "ML CI/CD is just pytest + git push"

✅ Strong answer: "5 levels of tests: unit (preprocessing), data validation (Great Expectations), model tests (accuracy > threshold, latency < 50ms), behavioral tests (invariance, directional), comparison vs baseline. GitHub Actions + self-hosted GPU runners. Deployment: shadow mode -> canary (1%->5%->25%->100%) with auto-rollback."


See Also