MLOps: подготовка к интервью¶
~4 минуты чтения
Предварительно: Учебные материалы MLOps | LLMOps vs MLOps
MLOps-инженер -- одна из самых востребованных позиций 2026: средняя зарплата $165K (Levels.fyi), 47% рост вакансий за год. На интервью проверяют: model serving (FastAPI + A/B), CI/CD для ML (GitHub Actions + MLflow), мониторинг (drift detection, PSI, Evidently AI), model registry, compression (quantization, pruning, distillation). Разделы ниже покрывают все ключевые темы с вопросами трёх уровней: Basic, Medium, Killer.
Обновлено: 2026-02-12
1. Model Serving¶
Basic¶
Q: Зачем FastAPI для ML?
A: Async-first, high performance, auto-generated docs (Swagger), Pydantic validation, WebSocket support.
Q: Что такое model serialization?
A: Сохранение модели в файл для loading в production. Форматы: pickle, joblib, ONNX, TorchScript.
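Минимальный набросок serving: модель сериализована через joblib и загружается один раз при старте FastAPI-приложения (путь model.joblib и схема запроса -- условные):

from contextlib import asynccontextmanager
import joblib
from fastapi import FastAPI
from pydantic import BaseModel

# После обучения: joblib.dump(model, "model.joblib")

@asynccontextmanager
async def lifespan(app: FastAPI):
    # Загружаем модель один раз при старте процесса, а не на каждый запрос
    app.state.model = joblib.load("model.joblib")  # условный путь к артефакту
    yield

app = FastAPI(lifespan=lifespan)

class PredictRequest(BaseModel):
    features: list[float]

@app.post("/predict")
def predict(req: PredictRequest):
    pred = app.state.model.predict([req.features])
    return {"prediction": pred.tolist()}

@app.get("/health")
def health():
    return {"status": "ok"}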
Medium¶
Q: Как организовать A/B тестирование моделей?
A: (1) Deterministic routing по user_id, (2) Feature flag для traffic split, (3) Logging model_version, (4) Post-hoc analysis.
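Набросок deterministic routing по user_id (имя эксперимента и доля трафика -- условные):

import hashlib

def assign_variant(user_id: str, experiment: str, treatment_share: float = 0.1) -> str:
    # Один и тот же user_id всегда попадает в одну и ту же группу
    bucket = int(hashlib.md5(f"{experiment}:{user_id}".encode()).hexdigest(), 16) % 1000
    return "treatment" if bucket < treatment_share * 1000 else "control"

# assign_variant("user_42", "fraud_model_v2", 0.05) -> "control" или "treatment";
# model_version логируем вместе с prediction для post-hoc анализа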
Q: Model versioning strategies?
A: (1) MLflow Model Registry, (2) Semantic versioning, (3) Git tags + artifact storage, (4) Container tags.
Killer¶
Q: Спроектируйте high-availability model serving.
A: (1) Multiple replicas, (2) Load balancer, (3) Health checks, (4) Circuit breaker, (5) Graceful degradation, (6) Model caching, (7) Async inference queue.
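Набросок одного из пунктов -- circuit breaker поверх вызова модели (пороги и таймауты условные):

import time

class CircuitBreaker:
    def __init__(self, max_failures=5, reset_timeout_s=30):
        self.max_failures = max_failures
        self.reset_timeout_s = reset_timeout_s
        self.failures = 0
        self.opened_at = None

    def call(self, fn, *args, fallback=None):
        # Пока breaker "открыт", не дергаем недоступную зависимость и сразу отдаём fallback
        if self.opened_at is not None:
            if time.time() - self.opened_at < self.reset_timeout_s:
                return fallback
            self.opened_at = None  # half-open: пробуем снова
            self.failures = 0
        try:
            result = fn(*args)
            self.failures = 0
            return result
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.time()
            return fallback

# breaker.call(model.predict, features, fallback=cached_prediction) -- graceful degradation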
CI/CD и Deployment¶
Basic¶
Q: Что такое CI/CD для ML?
A: Continuous Integration (автоматические тесты кода и данных) + Continuous Delivery (автоматический deployment моделей). Отличие от software CI/CD: нужно тестировать не только код, но и данные (schema validation, distribution checks) и модели (performance regression, bias testing). Инструменты: GitHub Actions, GitLab CI, Jenkins + MLflow/DVC.
Q: Как версионировать ML модели?
A: (1) MLflow Model Registry — стандарт, stage transitions (staging/production), (2) DVC — version control для данных и моделей, (3) Git tags + artifact storage (S3/GCS), (4) Container image tags (model baked in). Best practice: семантическое версионирование + хэш данных + хэш кода.
Medium¶
Q: Как организовать blue-green deployment для ML?
A: Две идентичные среды: blue (current) и green (new). (1) Deploy новую модель на green, (2) Run smoke tests, (3) Switch traffic (DNS/LB), (4) Rollback если metrics degraded. Отличие от canary: весь трафик переключается сразу, проще rollback. Для ML дополнительно: проверить prediction distribution, latency, и feature pipeline consistency.
Q: Что такое shadow mode deployment?
A: Новая модель получает production traffic, но predictions не используются — только логируются для сравнения с текущей моделью. Преимущества: (1) реальный трафик, (2) нет risk для пользователей, (3) можно сравнить latency, accuracy, distribution. Минус: удвоенная compute cost. Используют перед canary/A/B для high-risk моделей.
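Набросок shadow-вызова (модели и логгер передаются параметрами, имена условные):

def predict_with_shadow(features, primary_model, shadow_model, log_comparison):
    # Пользователю отвечает только primary; shadow логируется для офлайн-сравнения
    primary_pred = primary_model.predict(features)
    try:
        shadow_pred = shadow_model.predict(features)
        log_comparison({"primary": primary_pred, "shadow": shadow_pred})
    except Exception:
        pass  # сбой shadow-модели не должен влиять на пользователя
    return primary_pred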
Q: Как тестировать ML модели в CI?
A: (1) Unit tests — preprocessing functions, feature engineering, (2) Integration tests — end-to-end pipeline на sample data, (3) Model tests — performance > threshold on holdout, (4) Data tests — schema validation, null checks, distribution drift, (5) Smoke tests — inference на golden examples. Fail CI если любой regression > tolerance.
Monitoring & Observability¶
Basic¶
Q: Какие метрики мониторить для ML в production?
A: (1) System: latency (p50/p95/p99), throughput (RPS), error rate, CPU/RAM, (2) Model: prediction distribution, feature distribution, confidence scores, (3) Business: conversion, engagement, revenue impact. Data drift мониторинг: PSI, KS-test на feature distributions.
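Минимальный набросок instrumentation системных метрик через prometheus_client (имена метрик и лейблов условные):

from prometheus_client import Counter, Histogram, start_http_server

PREDICTION_LATENCY = Histogram("prediction_latency_seconds", "Latency инференса")
PREDICTIONS_TOTAL = Counter("predictions_total", "Количество предсказаний", ["model_version"])

@PREDICTION_LATENCY.time()
def predict(model, features, model_version="v1"):
    PREDICTIONS_TOTAL.labels(model_version=model_version).inc()
    return model.predict(features)

start_http_server(9090)  # endpoint /metrics для Prometheus scrape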
Medium¶
Q: Как обнаружить model degradation в production?
A: (1) Delayed ground truth — сравнить predictions с delayed labels (fraud через 30 дней), (2) Prediction drift — если P(y=1) сдвигается, модель деградирует, (3) Feature drift — PSI/KS-test на input features, (4) Proxy metrics — если online metric (CTR) падает при стабильном трафике. Инструменты: Evidently AI, WhyLabs, custom Grafana dashboards.
Q: Что такое data drift и concept drift?
A: Data drift — изменение P(X), распределение фичей сдвинулось (новая демография, сезонность). Concept drift — изменение P(Y|X), связь фичей с таргетом изменилась (пандемия изменила поведение). Data drift детектим через PSI/KS-test на фичах. Concept drift детектим через мониторинг model accuracy на свежих данных. Решение: retrain с новыми данными, адаптивные модели.
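Набросок расчёта PSI по квантильным бинам (пороги -- общепринятая эвристика, не стандарт):

import numpy as np

def psi(reference: np.ndarray, current: np.ndarray, n_bins: int = 10) -> float:
    # Population Stability Index: сравниваем распределение фичи сейчас и на reference-периоде
    edges = np.quantile(reference, np.linspace(0, 1, n_bins + 1))
    ref_counts, _ = np.histogram(np.clip(reference, edges[0], edges[-1]), bins=edges)
    cur_counts, _ = np.histogram(np.clip(current, edges[0], edges[-1]), bins=edges)
    ref_pct = np.clip(ref_counts / len(reference), 1e-6, None)
    cur_pct = np.clip(cur_counts / len(current), 1e-6, None)
    return float(np.sum((cur_pct - ref_pct) * np.log(cur_pct / ref_pct)))

# Эвристика: PSI < 0.1 -- стабильно, 0.1-0.25 -- умеренный drift, > 0.25 -- сильный drift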
Killer¶
Q: Спроектируйте систему мониторинга для 50 ML моделей в production.
A: Architecture: (1) Centralized logging: все predictions -> Kafka -> ClickHouse, (2) Feature store monitoring: freshness, null rate, cardinality per feature, (3) Model performance: scheduled evaluation на delayed ground truth, (4) Alerting tiers: P0 (модель не отвечает), P1 (accuracy drop > 5%), P2 (feature drift detected), (5) Dashboard: Grafana с per-model panels, (6) Auto-retraining trigger: если drift score > threshold, запускаем retraining pipeline, (7) SLA: p99 latency < 100ms, availability > 99.9%.
Experiment Tracking¶
Medium¶
Q: MLflow vs Weights & Biases — когда что использовать?
A: MLflow: open-source, self-hosted, model registry, сильнее ориентирован на MLOps-workflow. W&B: SaaS, лучше визуализация и collaboration, Sweeps для HPO. Выбор: если есть требование self-hosted -> MLflow. Если важны команда и collaboration -> W&B. Многие используют оба: W&B для экспериментов, MLflow для model registry и deployment.
Q: Как организовать reproducibility ML эксперимента?
A: (1) Зафиксировать код (git commit hash), (2) Зафиксировать данные (DVC, hash dataset), (3) Залогировать hyperparameters (MLflow/W&B), (4) Зафиксировать environment (Docker, requirements.txt с pinned versions), (5) Зафиксировать random seed, (6) Залогировать hardware (GPU type). Большая часть логируется автоматически через mlflow.autolog() или W&B wandb.init().
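Минимальный набросок reproducible-запуска с mlflow.autolog() (имя эксперимента и хэш датасета условные):

import mlflow
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

mlflow.set_experiment("repro-demo")  # условное имя эксперимента
mlflow.autolog()  # params, metrics и модель для sklearn логируются автоматически

X, y = make_classification(n_samples=1000, random_state=42)
with mlflow.start_run():
    mlflow.log_param("random_seed", 42)
    mlflow.log_param("data_hash", "d41d8cd9")  # условный хэш датасета (например, из DVC)
    LogisticRegression(max_iter=200).fit(X, y)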
Data Quality & Validation¶
Basic¶
Q: Что такое data contracts?
A: Формальное соглашение между producer и consumer данных о schema, constraints, и SLA. Контракт определяет: (1) Schema — типы, nullable, uniqueness, (2) Constraints — диапазоны, patterns, referential integrity, (3) Freshness — как часто обновляется, (4) Quality SLA — допустимый % null, duplicates. Реализация: JSON schemas, Protobuf, Great Expectations suites.
Q: Зачем нужен Great Expectations?
A: Data validation framework для автоматической проверки качества данных. Позволяет: (1) Define expectations — правила валидации (expect_column_values_to_be_unique, expect_column_values_to_match_regex), (2) Generate docs — data docs с результатами проверок, (3) Integrate с pipelines — Airflow, Spark, Pandas, (4) Alerting — при нарушении expectations.
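Набросок на legacy Pandas API Great Expectations (в новых версиях 1.x API fluent-style и выглядит иначе):

import pandas as pd
import great_expectations as ge

df = ge.from_pandas(pd.DataFrame({
    "user_id": [1, 2, 3],
    "email": ["a@example.com", "b@example.com", "c@example.com"],
    "age": [25, 31, 47],
}))
df.expect_column_values_to_be_unique("user_id")
df.expect_column_values_to_not_be_null("email")
df.expect_column_values_to_be_between("age", min_value=0, max_value=120)
result = df.validate()
print(result["success"])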
Medium¶
Q: Как построить data validation pipeline?
A: (1) Schema validation — типы, nullable, cardinality, (2) Statistical validation — distribution checks, null rate, outlier detection, (3) Business rules — domain-specific constraints (age > 0, email format), (4) Cross-dataset validation — referential integrity, consistency checks. Инструменты: Great Expectations для batch, Deequ для Spark, custom SQL tests. Валидация на: ingestion, preprocessing, pre-training.
Q: Medallion Architecture — что это?
A: Layered data architecture (Databricks/Lakehouse):
- Bronze: Raw data as-is, append-only, full history
- Silver: Cleaned, deduplicated, validated, enriched
- Gold: Aggregated, business-ready, ML-optimized
Валидация происходит на Bronze→Silver transition. Great Expectations проверяет schema, quality, completeness.
Killer¶
Q: Как обработать data quality incident в production?
A: (1) Detect — automated alerts на quality metrics degradation, (2) Triage — определить upstream source, (3) Quarantine — изолировать bad data, не ломать pipeline, (4) Notify — alert data consumers, (5) Root cause — bug в producer, schema change, upstream outage, (6) Fix — patch producer или update contract, (7) Backfill — пересчитать affected downstream, (8) Post-mortem — update validation rules. Prevention: data contracts, schema registry, integration tests.
Experiment Tracking & Model Registry¶
Basic¶
Q: Что логировать в ML эксперименте?
A:
- Code: git commit, branch, diff
- Data: dataset version, hash, split sizes
- Hyperparameters: learning_rate, batch_size, architecture
- Metrics: loss, accuracy, per-epoch, per-step
- Artifacts: model weights, configs, sample predictions
- Environment: Python version, GPU type, library versions
- Random seed: для reproducibility
Medium¶
Q: MLflow vs W&B vs Neptune vs DVC — comparison?
| Feature | MLflow | W&B | Neptune | DVC |
|---|---|---|---|---|
| Type | Self-hosted/SaaS | SaaS | SaaS | Open source |
| Model Registry | ✅ Best | Basic | Basic | Via Git |
| Collaboration | Limited | ✅ Strong | Good | Git-based |
| Visualization | Good | ✅ Best | Good | Limited |
| Data Versioning | No | No | No | ✅ Core |
| Cost | Free/Enterprise | Free tier | Free tier | Free |
Use cases:
- Solo/Small team: MLflow (free, self-hosted)
- Team collaboration: W&B (best UX, collaboration)
- Data-heavy: MLflow + DVC (model + data versioning)
- Enterprise: Neptune/MLflow Enterprise (compliance, scale)
Q: Что такое model registry и зачем нужен?
A: Central repository для versioned models с metadata. Функции:
- Versioning: Каждая model version с уникальным ID
- Staging: Stages — None → Staging → Production → Archived
- Metadata: Metrics, parameters, lineage, tags
- Access control: Кто может promote модели
- Lineage: Какой experiment, data, code породил модель
MLflow Model Registry workflow:
# Register model
mlflow.log_model(model, "model", registered_model_name="fraud_detector")
# Transition stage
client = mlflow.tracking.MlflowClient()
client.transition_model_version_stage(
name="fraud_detector",
version=3,
stage="Production"
)
Killer¶
Q: Спроектируйте ML experiment tracking для команды из 20 DS.
A: Requirements: 20 DS, 100+ experiments/day, 10+ models in production, compliance.
Architecture:
1. Centralized MLflow: Self-hosted on Kubernetes, PostgreSQL backend, S3 artifact store
2. Experiment naming convention: {team}/{project}/{experiment_name}
3. Auto-logging: Wrapper around training code для automatic metric/param logging
4. Model registry: All production models через registry, staged deployment
5. CI integration: Experiments logged from CI runs, сравнение с baseline
6. Data versioning: DVC для datasets, linked в MLflow params
7. Dashboards: Grafana dashboards для model performance, experiment counts
8. Alerting: Failed experiments, metric regression vs baseline
9. Cost management: Artifact TTL, auto-cleanup old experiments
Governance:
- Model promotion requires 2 reviews
- All experiments tagged with business_unit, cost_center
- Monthly experiment audit
6. CI/CD Deep Dive for ML¶
Basic¶
Q: Что должно быть в ML CI pipeline?
A:
- Code linting: ruff, black, mypy
- Unit tests: preprocessing, feature engineering functions
- Data validation: schema checks, null rate, distribution
- Model training on sample: fast training run для sanity check
- Model evaluation: metrics vs baseline
- Artifact upload: model to registry, data to DVC
Q: GitHub Actions vs GitLab CI vs Jenkins для ML?
A:
| Feature | GitHub Actions | GitLab CI | Jenkins |
|---|---|---|---|
| Setup | Easiest | Easy | Complex |
| GPU runners | Via self-hosted | Via self-hosted | Yes |
| Matrix builds | ✅ | ✅ | Limited |
| Cost | Free tier + minutes | Free tier | Free |
| ML ecosystem | Best (actions) | Good | Plugins |

Recommendation: GitHub Actions для ML teams (best actions ecosystem, easy self-hosted runners на GPU).
Medium¶
Q: Как организовать automated model testing?
A: 3 levels:
Level 1: Behavioral Tests
def test_model_invariance():
    # Model should give same prediction for semantically equivalent inputs
    text1 = "Customer wants refund"
    text2 = "The customer is requesting a refund"
    assert model.predict(text1) == model.predict(text2)

def test_model_directional():
    # Increasing income should not decrease credit_score prediction
    low_income = {"income": 30000}
    high_income = {"income": 100000}
    assert model.predict(high_income) >= model.predict(low_income)

Level 2: Performance Tests

def test_accuracy_threshold():
    metrics = evaluate_model(model, test_data)
    assert metrics["accuracy"] > 0.85, f"Accuracy {metrics['accuracy']} below threshold"
    assert metrics["f1"] > 0.80

Level 3: Comparison Tests
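Пример comparison-теста для Level 3 (набросок: evaluate_model -- как в Level 2, production_model и candidate_model -- условные фикстуры):

def test_candidate_not_worse_than_production():
    # Обе модели оцениваются на одном и том же holdout
    baseline = evaluate_model(production_model, test_data)
    candidate = evaluate_model(candidate_model, test_data)
    # Небольшой tolerance, чтобы не блокировать релиз из-за шума
    assert candidate["f1"] >= baseline["f1"] - 0.01
    assert candidate["accuracy"] >= baseline["accuracy"] - 0.01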
Q: Что такое canary deployment для ML?
A: Постепенное переключение трафика на новую модель:
- 1% traffic → monitor metrics 1 hour
- 5% traffic → monitor metrics 4 hours
- 25% traffic → monitor metrics 24 hours
- 100% traffic → full rollout

Rollback criteria:
- Error rate > baseline + 0.1%
- Latency p99 > baseline * 1.5
- Business metric (CTR, conversion) drop > 2%
Implementation: Feature flag (LaunchDarkly), Istio traffic split, Kubernetes Service weights.
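Набросок auto-rollback проверки по критериям выше (метрики приходят из мониторинга, структура словарей условная):

def should_rollback(canary: dict, baseline: dict) -> bool:
    return (
        canary["error_rate"] > baseline["error_rate"] + 0.001      # error rate +0.1%
        or canary["latency_p99"] > baseline["latency_p99"] * 1.5   # p99 latency x1.5
        or canary["conversion"] < baseline["conversion"] * 0.98    # business metric -2%
    )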
Killer¶
Q: Спроектируйте ML CI/CD pipeline для team of 10.
A:
# .github/workflows/ml-pipeline.yml
name: ML Pipeline
on:
  push:
    branches: [main]
  pull_request:
jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: ruff check .
      - run: mypy .
      - run: black --check .
  test:
    needs: lint
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pytest tests/unit -m "not gpu"
      - run: pytest tests/integration --sample-data
      - run: great_expectations checkpoint run ml_suite
  train:
    needs: test
    runs-on: [self-hosted, gpu]
    steps:
      - uses: actions/checkout@v4
      - run: dvc pull data
      - run: python train.py --config configs/experiment.yaml
      - run: mlflow models log-model
  evaluate:
    needs: train
    runs-on: ubuntu-latest
    steps:
      - run: pytest tests/model_tests
      - run: python evaluate.py --compare-baseline
  deploy-staging:
    needs: evaluate
    if: github.ref == 'refs/heads/main'
    environment: staging
    runs-on: ubuntu-latest
    steps:
      - run: echo "Deploy to staging + shadow mode tests"
  deploy-production:
    needs: deploy-staging
    environment:
      name: production
    runs-on: ubuntu-latest
    steps:
      - run: echo "Canary: 1% -> 5% -> 25% -> 100%"
        # Auto-rollback on metric degradation
7. Model Registry Deep Dive¶
Basic¶
Q: Зачем нужен model lineage?
A: Model lineage отвечает на вопросы:
- Какие данные использованы для обучения?
- Какой код/commit породил модель?
- Какие hyperparameters?
- Кто и когда обучал?
- Какая была baseline метрика?
Implementation: MLflow автоматически links run_id -> model_version -> artifacts -> params/metrics.
Q: Как организовать model rollback?
A:
# Option 1: Promote previous version
client.transition_model_version_stage(
    name="fraud_detector",
    version=previous_version,
    stage="Production",
    archive_existing_versions=True
)

# Option 2: Load specific version
model = mlflow.pyfunc.load_model("models:/fraud_detector/production")
# или models:/fraud_detector/versions/3
Medium¶
Q: Как организовать A/B тестирование через model registry?
A:
# Router based on user segment
def predict(user_id, features):
    if is_in_experiment(user_id, "model_v2"):
        model_version = 5
    else:
        model_version = 4
    model = mlflow.pyfunc.load_model(f"models:/fraud_detector/versions/{model_version}")
    prediction = model.predict(features)
    log_prediction(user_id, prediction, model_version)
    return prediction

Registry integration:
- Version 4 = "Production" (control)
- Version 5 = "Staging" (treatment)
- Traffic split via feature flag
Killer¶
Q: Спроектируйте model governance для regulated industry (finance/healthcare).
A:
Governance requirements:
1. Audit trail: Кто, когда, почему promoted модель
2. Approval workflow: Min 2 approvals для production
3. Documentation: Model card, data lineage, fairness report
4. Retention: Keep all versions + training data for 7 years
5. Explainability: SHAP values для all predictions
Implementation:
# Model promotion workflow
def promote_model(model_name, version, approvers):
    # 1. Check all requirements
    assert len(approvers) >= 2, "Need 2 approvals"
    assert model_has_documentation(model_name, version)
    assert fairness_check_passed(model_name, version)
    assert explainability_enabled(model_name, version)

    # 2. Log audit event
    audit_log.record({
        "action": "promote",
        "model": model_name,
        "version": version,
        "approvers": approvers,
        "timestamp": now(),
        "justification": get_justification(),
    })

    # 3. Transition
    client.transition_model_version_stage(
        name=model_name,
        version=version,
        stage="Production"
    )
8. Model Compression for Production¶
Basic¶
Q: Какие техники model compression используются в production?
A:
| Technique | Speedup | Size Reduction | Quality Loss | Best For |
|---|---|---|---|---|
| Quantization | 2-4x | 2-4x | Low (1-3%) | Most production models |
| Pruning | 2-5x | 3-10x | Medium (2-5%) | Overparameterized models |
| Distillation | 2-10x | 2-50x | Low if well-tuned | Teacher-student scenarios |
| Low-rank | 1.5-2x | 2-4x | Low | Linear layers |

Типичный pipeline: Pruning → QAT (Quantization-Aware Training) → Fine-tune
Q: Что такое Quantization и когда использовать?
A:
Quantization — уменьшение precision весов: FP32 → FP16/INT8/INT4.
Типы:
- Post-Training Quantization (PTQ): Fast, no retraining, ~2-3% accuracy loss
- Quantization-Aware Training (QAT): Slower, better accuracy, ~1% loss
FP16 vs INT8 vs INT4:

| Precision | Memory | Speedup | Quality | Use Case |
|-----------|--------|---------|---------|----------|
| FP32 | 4 bytes | 1x | 100% | Training |
| FP16 | 2 bytes | 2x | 99.5% | Training/Inference |
| INT8 | 1 byte | 4x | 97-99% | Production inference |
| INT4 | 0.5 byte | 8x | 90-95% | Edge/Extreme constraints |
Когда использовать:
- Edge deployment → INT8/INT4
- Cloud inference → FP16/INT8
- Training → FP32/FP16 (mixed precision)
Medium¶
Q: Как реализовать PTQ для PyTorch модели?
A:
import torch
from torch.quantization import quantize_dynamic

# Dynamic quantization (easiest, works for most models)
model_quantized = quantize_dynamic(
    model,
    {torch.nn.Linear, torch.nn.LSTM, torch.nn.GRU},
    dtype=torch.qint8
)

# Static quantization (better performance, needs calibration)
model.qconfig = torch.quantization.get_default_qconfig('fbgemm')
model_prepared = torch.quantization.prepare(model)

# Calibration with representative data
for batch in calibration_dataloader:
    model_prepared(batch)

model_quantized = torch.quantization.convert(model_prepared)

# Compare sizes
original_size = sum(p.numel() * 4 for p in model.parameters())  # FP32
quantized_size = sum(p.numel() * 1 for p in model_quantized.parameters())  # INT8
print(f"Size reduction: {original_size / quantized_size:.2f}x")
Q: Что такое Knowledge Distillation?
A:
Knowledge Distillation — transfer knowledge от teacher (large) к student (small) модели.
Temperature-scaled softmax:

$$p_i = \frac{\exp(z_i/T)}{\sum_j \exp(z_j/T)}$$

Distillation loss:

$$\mathcal{L} = \alpha \cdot \mathcal{L}_{CE}(y_{true}, y_{student}) + (1-\alpha) \cdot \mathcal{L}_{KL}(y_{teacher}^{T}, y_{student}^{T})$$

где \(T\) — temperature (обычно 3-5), \(\alpha\) — вес hard-target loss (обычно 0.1-0.5).
import torch.nn as nn
import torch.nn.functional as F

class DistillationLoss(nn.Module):
    def __init__(self, temperature=4.0, alpha=0.5):
        super().__init__()
        self.temperature = temperature
        self.alpha = alpha
        self.ce_loss = nn.CrossEntropyLoss()
        self.kl_loss = nn.KLDivLoss(reduction='batchmean')

    def forward(self, student_logits, teacher_logits, labels):
        # Hard target loss
        loss_ce = self.ce_loss(student_logits, labels)
        # Soft target loss (distillation)
        student_soft = F.log_softmax(student_logits / self.temperature, dim=-1)
        teacher_soft = F.softmax(teacher_logits / self.temperature, dim=-1)
        loss_kl = self.kl_loss(student_soft, teacher_soft) * (self.temperature ** 2)
        return self.alpha * loss_ce + (1 - self.alpha) * loss_kl
Q: Pruning — как и когда применять?
A:
Pruning types:
- Unstructured: Remove individual weights (harder to accelerate)
- Structured: Remove entire channels/heads (better speedup)
Magnitude pruning:
import torch

def prune_model(model, sparsity=0.5):
    for name, param in model.named_parameters():
        if 'weight' in name:
            threshold = torch.quantile(torch.abs(param.data), sparsity)
            mask = torch.abs(param.data) > threshold
            param.data *= mask.float()
    return model

Iterative pruning (better results):
1. Train to convergence
2. Prune 20% of weights
3. Fine-tune for N epochs
4. Repeat until target sparsity
Когда pruning эффективен:
- Overparameterized models (ResNet-50 on small dataset)
- Edge deployment с strict latency requirements
- Models с redundant features
Killer¶
Q: Спроектируйте compression pipeline для 1B модели на edge device.
A:
Target: 1B param model → <500MB, <50ms latency on mobile
Pipeline:
Original (1B, FP32) = 4GB
  ↓ Structured Pruning (50% sparsity)
Pruned (500M effective, FP32) = 2GB
  ↓ Knowledge Distillation (300M student)
Distilled (300M, FP32) = 1.2GB
  ↓ QAT (INT8)
Final (300M, INT8) = 300MB ✅

Implementation:

class CompressionPipeline:
    def __init__(self, teacher_model, target_size_mb=300):
        self.teacher = teacher_model
        self.target_size = target_size_mb

    def compress(self, train_loader, val_loader):
        # Step 1: Structured Pruning
        print("Step 1: Pruning...")
        model = self.iterative_prune(
            self.teacher, train_loader, target_sparsity=0.5
        )

        # Step 2: Knowledge Distillation
        print("Step 2: Distilling...")
        student = self.create_student(model, ratio=0.6)  # 60% of pruned
        student = self.distill(model, student, train_loader, epochs=10)

        # Step 3: Quantization-Aware Training
        print("Step 3: Quantizing...")
        student = self.quantize_aware_training(student, train_loader)

        # Verify
        size_mb = self.get_model_size_mb(student)
        latency = self.measure_latency(student)
        accuracy = self.evaluate(student, val_loader)
        print(f"Final: {size_mb:.1f}MB, {latency:.1f}ms, {accuracy:.2f}%")
        return student

    def get_model_size_mb(self, model):
        # For INT8 quantized model
        param_size = sum(p.numel() * 1 for p in model.parameters())  # 1 byte per param
        return param_size / (1024 ** 2)

Trade-offs:

| Constraint | Solution | Quality Impact |
|------------|----------|----------------|
| Size < 300MB | INT8 + Pruning | -2-3% accuracy |
| Latency < 50ms | Structured pruning | -1-2% accuracy |
| Quick deployment | PTQ only | -3-5% accuracy |
| Best quality | QAT + Distill | -1-2% accuracy |
Q: Как выбрать между Quantization и Pruning?
A:
| Criterion | Quantization | Pruning |
|---|---|---|
| Ease of implementation | Easy (PTQ) | Harder |
| Hardware support | Universal | Requires sparse ops |
| Speedup guarantee | Yes (2-4x) | Depends on sparsity |
| Accuracy preservation | Good (QAT) | Variable |
| Best for | General deployment | Overparameterized models |

Practical recommendation:
1. Start with PTQ: Fast, minimal effort, good results
2. If insufficient: Add QAT
3. If still insufficient: Consider distillation
4. Pruning last resort: Complex, but best for extreme constraints
Combined approach (2025-2026): structured pruning → knowledge distillation → QAT (тот же pipeline, что в Killer-вопросе выше) даёт максимальное сжатие при умеренной потере качества.
Частые заблуждения¶
Заблуждение: canary deployment = A/B тестирование моделей
Canary -- постепенное переключение трафика (1% -> 5% -> 25% -> 100%) с auto-rollback при деградации. A/B -- статистический эксперимент с двумя группами для измерения эффекта. Canary = safety mechanism, A/B = measurement mechanism. Часто используются вместе: canary для deployment safety, A/B для бизнес-решения.
Заблуждение: PTQ (Post-Training Quantization) всегда теряет 5%+ accuracy
INT8 PTQ теряет 1-3% accuracy для большинства моделей. QAT (Quantization-Aware Training) -- ~1% loss. Для inference на edge: INT8 = 4x speedup при минимальных потерях. Knowledge Distillation + QAT дают 10-15x size reduction при <5% accuracy loss.
Заблуждение: model registry нужен только для больших команд
Даже solo DS нуждается в: version tracking (какая модель в production?), rollback (вернуть предыдущую версию), lineage (какие данные/код породили модель). MLflow Model Registry бесплатный и ставится за 10 минут.
Интервью: формат ответов¶
Model Serving¶
Красный флаг: "Для serving модели достаточно Flask + pickle"
Сильный ответ: "FastAPI (async, auto-docs, Pydantic validation), lifespan для model loading, health/readiness endpoints. A/B routing через deterministic hashing user_id. Prometheus metrics (latency p50/p95/p99, RPS, error rate). Docker + HPA для auto-scaling. Circuit breaker для graceful degradation."
Monitoring & Drift¶
Красный флаг: "Мониторить accuracy на production достаточно"
Сильный ответ: "4 типа drift: data (P(X) меняется, PSI/KS-test), concept (P(Y|X), performance drop), label (P(Y), сезонность), upstream (schema changes). Мониторинг: Evidently AI для drift reports, proxy metrics (CTR) когда ground truth задержан (fraud -- 30 дней). Alerting tiers: P0 (модель не отвечает), P1 (accuracy -5%), P2 (feature drift)."
CI/CD для ML¶
Красный флаг: "ML CI/CD -- это pytest + git push"
Сильный ответ: "5 уровней тестов: unit (preprocessing), data validation (Great Expectations), model tests (accuracy > threshold, latency < 50ms), behavioral tests (invariance, directional), comparison vs baseline. GitHub Actions + self-hosted GPU runners. Deployment: shadow mode -> canary (1%->5%->25%->100%) с auto-rollback."
See Also¶
- Data Engineering Interview Q&A -- data leakage, Spark, feature stores
- AI Agents Interview Q&A -- ReAct, multi-agent, memory systems
- MLOps Materials -- ресурсы для подготовки по MLOps
- Experiment Tracking Comparison -- MLflow vs W&B vs Neptune
- Feature Stores Comparison -- Feast vs Hopsworks vs Redis
- LLMOps vs MLOps -- ключевые различия для интервью