ML Production Patterns & War Stories (Layer 6)¶
~4 minute read
Real-world lessons from Netflix, Uber, Airbnb, Google, and Meta. Updated: 2026-02-11
Netflix ML Lessons¶
Recommendation System at Scale¶
Architecture:
- Multi-stage: Retrieval (ANN) → Ranking → Re-ranking (sketched below)
- Real-time + batch features
- A/B testing infrastructure
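A minimal sketch of that multi-stage shape (all names are illustrative, not Netflix's actual stack; retrieval would typically hit an ANN index such as FAISS or ScaNN):

```python
# Sketch of a multi-stage recommender: cheap retrieval narrows millions
# of candidates, an expensive ranker scores the survivors, and a
# re-ranker applies business logic such as diversity.
# All names here are hypothetical, not Netflix's actual API.

def recommend(user, ann_index, ranker, top_k=10):
    # Stage 1: approximate nearest-neighbor retrieval (~1000 candidates)
    candidates = ann_index.search(user.embedding, n=1000)

    # Stage 2: heavier model scores each (user, item) pair
    scored = [(item, ranker.score(user, item)) for item in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)

    # Stage 3: re-rank for diversity -- e.g., at most 3 items per genre
    result, genre_counts = [], {}
    for item, score in scored:
        if genre_counts.get(item.genre, 0) < 3:
            result.append(item)
            genre_counts[item.genre] = genre_counts.get(item.genre, 0) + 1
        if len(result) == top_k:
            break
    return result
```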
War Story: Cold Start Problem
"We realized 20% of users are new every month. We built a separate onboarding model that uses only session data."
Key Lessons:
1. Personalization requires diversity (don't show the same genre repeatedly)
2. Freshness matters (users want new content)
3. Explainability improves trust ("Because you watched...")
Sources¶
- Netflix Tech Blog: Foundation Model for Personalized Recommendation
- URL: https://netflixtechblog.com/foundation-model-for-personalized-recommendation-1a0bd8e02d39
Uber ML Lessons¶
Real-Time Pricing (Surge)¶
Challenge: Predict demand in <100ms globally
Solution:
- Pre-compute features in streaming (Kafka + Flink)
- Model inference at the edge (regional)
- Fallback to rules if the model fails (see the sketch below)
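The fallback-to-rules idea can be sketched as a wrapper that enforces a latency budget; the 100 ms budget, the rule formula, and the model interface are assumptions for illustration:

```python
import concurrent.futures

# Sketch of graceful degradation: serve the model prediction if it
# returns within the latency budget, otherwise fall back to rules.
# The budget, the rule formula, and the model interface are illustrative.

_executor = concurrent.futures.ThreadPoolExecutor(max_workers=8)

def rule_based_surge(demand, supply):
    # Deterministic fallback: ratio-based multiplier, capped at 3x.
    return min(3.0, max(1.0, demand / max(supply, 1)))

def predict_surge(model, features, demand, supply, budget_s=0.1):
    future = _executor.submit(model.predict, features)
    try:
        return future.result(timeout=budget_s)
    except Exception:  # timeout or model failure -> degrade gracefully
        return rule_based_surge(demand, supply)
```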
War Story: New Year's Eve
"Our model saw 100x traffic spike. Pre-scaling based on historical patterns saved us."
Key Lessons:
1. Capacity planning for predictable spikes
2. Graceful degradation is essential
3. Real-time features need streaming infrastructure
Fraud Detection¶
Challenge: Detect fraud before transaction completes
Solution:
- Feature freshness < 1 second
- Model in Redis (fast lookup; see the sketch below)
- Multi-model ensemble (speed vs. accuracy)
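A hedged sketch of the fast-lookup path, assuming features are kept hot in Redis under a per-user hash key; the key schema, field names, and model interface are invented for illustration:

```python
import redis  # pip install redis

# Sketch: fraud features kept hot in Redis so the online scorer can
# read them in ~1 ms. Key schema and field names are hypothetical.
r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def fetch_features(user_id):
    raw = r.hgetall(f"fraud:features:{user_id}")
    # Missing user -> neutral defaults so scoring never hard-fails.
    return {
        "txn_count_1h": float(raw.get("txn_count_1h", 0)),
        "avg_amount_24h": float(raw.get("avg_amount_24h", 0.0)),
        "new_device": float(raw.get("new_device", 0)),
    }

def score(fast_model, user_id, amount):
    feats = fetch_features(user_id)
    feats["amount"] = amount
    # Fast ensemble member answers inline; heavier models can run async.
    return fast_model.predict(feats)
```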
Sources¶
- Uber Engineering Blog
- Patrick Koss: Why Your ML Platform Will Fail at 3 AM
Airbnb ML Lessons¶
Search Ranking¶
Evolution: Rules-based → GBDT → Neural Networks → LLM-enhanced
War Story: Position Bias
"Users click top results regardless of quality. We added position as feature and trained with IPS (Inverse Propensity Scoring)."
Key Lessons:
1. Position bias correction is essential for ranking
2. Offline metrics don't always correlate with online results
3. A/B test everything
Pricing Model¶
Challenge: Dynamic pricing for 6M+ listings
Solution:
- Separate models per market
- Seasonal features
- Competitor pricing signals
Sources¶
- Airbnb Deep Learning Journey
- URL: https://zayunsna.github.io/ds/2025-05-02-airbnb_model/
Google ML Lessons¶
Hidden Technical Debt¶
From the famous "Hidden Technical Debt in Machine Learning Systems" paper (Sculley et al., 2015):
"Only a small fraction of real-world ML systems are actual ML code. The rest is infrastructure."
Key Points:
1. Data dependencies are hidden
2. Configuration changes break models
3. Monitoring is an afterthought
4. Model updates affect other systems
Production Patterns¶
1. Canary Deployment
- 1% traffic → new model
- Monitor metrics
- Gradual rollout
2. Shadow Mode (see the sketch after this list)
- Run the new model in parallel
- Compare predictions (no user impact)
- Validate before switching
3. Feature Stores
- Single source of truth
- Consistent train/serve
- Version control for features
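A minimal sketch of the Shadow Mode pattern, assuming a generic `predict` interface (all names are illustrative):

```python
import logging
import random

log = logging.getLogger("shadow")

# Sketch of Shadow Mode: the candidate model runs on the same request
# as production, its output is logged for offline comparison, and only
# the production result is ever returned to the user.

def serve(request, prod_model, shadow_model, sample_rate=1.0):
    prod_pred = prod_model.predict(request.features)
    if random.random() < sample_rate:
        try:
            shadow_pred = shadow_model.predict(request.features)
            log.info("shadow_diff request_id=%s prod=%.4f shadow=%.4f",
                     request.id, prod_pred, shadow_pred)
        except Exception:
            log.exception("shadow model failed; user unaffected")
    return prod_pred  # the user only ever sees production output
```

In practice the shadow call usually runs asynchronously, or on a sampled slice of traffic as above, so it cannot add user-facing latency.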
Sources¶
- Google: Hidden Technical Debt in Machine Learning Systems (Sculley et al., 2015)
- URL: https://research.google/pubs/pub43146/
Meta (Facebook) ML Lessons¶
News Feed Ranking¶
Scale: Billions of predictions per second
Architecture:
- Multi-stage ranking
- Click prediction + engagement + long-term value
- Real-time personalization
War Story: Filter Bubbles
"Users only saw content they agreed with. We added diversity constraints to ensure exposure to different viewpoints."
Key Lessons:
1. Optimization metrics affect user behavior
2. Long-term value > short-term engagement
3. Diversity controls are needed
Common Production Failures¶
1. Data Drift Not Detected¶
Symptom: Model accuracy drops slowly
Cause: Feature distribution changed, no alerts
Fix: PSI monitoring on all features, automated retraining
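The standard PSI computation is small enough to show in full; this sketch bins live data against quantiles of the training distribution (the 0.1/0.25 thresholds match the alerting strategy later in this page):

```python
import numpy as np

# Population Stability Index: compares the live ("actual") feature
# distribution to the training ("expected") one. Rule of thumb:
# PSI < 0.1 stable, 0.1-0.25 moderate shift, > 0.25 significant drift.

def psi(expected, actual, bins=10):
    cuts = np.quantile(expected, np.linspace(0, 1, bins + 1))
    cuts[0], cuts[-1] = -np.inf, np.inf   # cover the full live range
    cuts = np.unique(cuts)                # collapse duplicate edges from ties
    e = np.histogram(expected, cuts)[0] / len(expected)
    a = np.histogram(actual, cuts)[0] / len(actual)
    e, a = np.clip(e, 1e-6, None), np.clip(a, 1e-6, None)  # avoid log(0)
    return float(np.sum((a - e) * np.log(a / e)))
```

Run per feature on a sliding window; a result above 0.25 would open a P1 per the alerting strategy below.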
2. Cold Start Cascade¶
Symptom: New model fails for subset of users
Cause: Missing features for new users
Fix: Fallback features, imputation, separate cold-start model
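A minimal sketch of that fix, with invented feature names and defaults: impute what's missing and route genuinely new users to a session-only model:

```python
# Sketch of cold-start handling: fill missing features with training-set
# defaults and send very new users to a simpler model trained only on
# features available from the first session. All names are illustrative.

DEFAULTS = {"avg_session_len": 8.5, "purchases_30d": 0.0, "tenure_days": 0.0}

def predict_with_fallback(user_features, main_model, cold_start_model,
                          min_history_days=7):
    filled = {k: user_features.get(k, v) for k, v in DEFAULTS.items()}
    if user_features.get("tenure_days", 0) < min_history_days:
        return cold_start_model.predict(filled)  # session-only model
    return main_model.predict(filled)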
3. Latency Spike¶
Symptom: P99 > 500ms suddenly
Cause: Model size increased, no batching
Fix: Model quantization, dynamic batching, caching
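Of those three fixes, caching is the simplest to sketch; this assumes feature vectors are hashable tuples and a scikit-learn-style `predict`:

```python
from functools import lru_cache

# Sketch of prediction caching: repeated requests with identical feature
# vectors skip inference entirely. Only pays off when features are
# hashable and traffic is reasonably repetitive.

def make_cached_predictor(model, maxsize=100_000):
    @lru_cache(maxsize=maxsize)
    def predict(feature_tuple):
        return model.predict([list(feature_tuple)])[0]
    return predict

# Usage: predictor = make_cached_predictor(model)
#        predictor((1.0, 0.0, 42.5))  # second identical call is a cache hit
```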
4. Memory Leak¶
Symptom: OOM after hours/days
Cause: Accumulating predictions in memory
Fix: Batch processing, garbage collection, memory profiling
5. Feature Store Inconsistency¶
Symptom: Training-serving skew
Cause: Different feature computation in batch vs online
Fix: Single feature definition, centralized feature store
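A minimal sketch of the fix: define each feature exactly once and call that definition from both paths (the feature name and formula are illustrative):

```python
# Sketch of "single feature definition": one function computes the
# feature, and both the batch (training) and online (serving) paths
# call it, removing one whole class of training-serving skew.

def days_since_last_order(now_ts: float, last_order_ts: float) -> float:
    return max(0.0, (now_ts - last_order_ts) / 86_400)

# Batch path (training):
#   df["days_since_last_order"] = [
#       days_since_last_order(t, o) for t, o in zip(df.ts, df.last_order_ts)]
# Online path (serving) calls the exact same function per request.
```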
Production Checklist¶
Before Deployment¶
[ ] Model metrics meet threshold (accuracy, latency)
[ ] Feature dependencies documented
[ ] Fallback strategy defined
[ ] Monitoring dashboards ready
[ ] Rollback plan tested
[ ] A/B test framework configured
[ ] Capacity planning done
[ ] Security review passed
During Deployment¶
[ ] Shadow mode validated
[ ] Canary at 1% traffic
[ ] Metrics within bounds
[ ] Gradual rollout (1% → 10% → 50% → 100%)
[ ] Real-time alerts active
[ ] On-call engineer assigned
After Deployment¶
[ ] A/B test significance achieved
[ ] Business metrics improved
[ ] Model performance stable
[ ] Documentation updated
[ ] Retrospective scheduled
Monitoring Patterns¶
Key Metrics¶
Service Health:
- Request latency (P50, P99)
- Error rate
- Throughput (RPS)
Model Health:
- Prediction distribution
- Feature distribution (PSI)
- Model confidence
Business Impact:
- Conversion rate
- Revenue impact
- User engagement
Alerting Strategy¶
P0 (Wake on-call):
- Error rate > 1%
- Latency P99 > 1s
- Model returning errors
P1 (Next business day):
- PSI > 0.25
- Accuracy drop > 2%
- Stale features (freshness lag)
P2 (Weekly review):
- Drift trend analysis
- Cost optimization
- Capacity trends
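One way to keep these thresholds honest is to encode them as data rather than scattered dashboard config; a sketch with invented metric names mirroring the tiers above:

```python
# Sketch of the alerting tiers as data: thresholds live in one
# reviewable place. Metric names and routing are illustrative.

ALERT_RULES = [
    ("error_rate",    lambda v: v > 0.01, "P0"),  # wake on-call
    ("latency_p99_s", lambda v: v > 1.0,  "P0"),
    ("psi",           lambda v: v > 0.25, "P1"),  # next business day
    ("accuracy_drop", lambda v: v > 0.02, "P1"),
]

def evaluate(metrics: dict) -> list:
    return [(name, sev) for name, breached, sev in ALERT_RULES
            if name in metrics and breached(metrics[name])]

# evaluate({"error_rate": 0.03, "psi": 0.3})
# -> [("error_rate", "P0"), ("psi", "P1")]
```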
Cost Optimization Patterns¶
Compute¶
1. Spot Instances — ~70% cheaper, but you must handle preemption
2. Right-sizing — Match instance to workload
3. Auto-scaling — Scale down during low traffic
4. Batch inference — Group predictions, reduce overhead
Model¶
1. Quantization — INT8 is ~4x smaller than FP32, ~2x faster
2. Distillation — Smaller model, similar accuracy
3. Pruning — Remove unnecessary weights
4. Caching — Cache frequent predictions
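A sketch of the arithmetic behind the quantization line above, showing where the ~4x figure comes from; real deployments use framework tooling with calibration and per-channel scales, this is just the core idea:

```python
import numpy as np

# INT8 quantization in miniature: map FP32 weights to int8 with a
# per-tensor scale. Storage drops from 4 bytes to 1 byte per weight,
# hence the "~4x smaller" claim.

def quantize_int8(w: np.ndarray):
    scale = np.abs(w).max() / 127.0 or 1.0  # guard against all-zero tensors
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(1024, 1024).astype(np.float32)
q, s = quantize_int8(w)
print(w.nbytes / q.nbytes)                 # 4.0 -> the "4x smaller"
print(np.abs(w - dequantize(q, s)).max())  # small reconstruction error
```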
Anti-Patterns to Avoid¶
| Anti-Pattern | Problem | Solution |
|---|---|---|
| Manual deployment | Error-prone, slow | CI/CD pipeline |
| No monitoring | Silent failures | Comprehensive dashboards |
| Single model | No fallback | Model ensemble / fallback |
| Hardcoded features | Inflexible | Feature store |
| Training-serving skew | Inconsistent predictions | Same pipeline |
| No versioning | Can't rollback | Model registry |
| Overfitting to metrics | Bad user experience | Multi-metric optimization |
Sources: Netflix Tech Blog, Uber Engineering, Airbnb Tech, Google Research, Meta Engineering