Scaling a Feed Ranking System¶
~2 minutes read
Prerequisites: Components, Metrics
Facebook ranks the feed for 2B DAU, picking ~50 posts out of 10,000+ candidates in < 200ms. At 500K QPS that means 500K x 10,000 = 5B candidate evaluations per second. Scoring every candidate directly with a heavy model is infeasible -- a multi-stage funnel is required: candidate generation (10K -> 1K) in 20ms, light ranker (1K -> 200) in 50ms, heavy ranker (200 -> 50) in 80ms.
Traffic Patterns¶
Volume¶
| Metric | Value |
|---|---|
| DAU | 2B+ |
| Feed requests per second | 500K+ |
| Candidates per request | 10,000+ |
| Final feed size | ~50 posts |
| Feature vector | 500+ features per candidate |
| Models in ensemble | 10+ (per-action predictors) |
Latency Budget¶
Total feed generation: 200ms (p99)
- Candidate generation: 20ms
  - Friend posts retrieval: 10ms (graph DB)
  - Group/page posts: 5ms (inverted index)
  - Ads injection: 5ms (ad server)
- Light ranker (1K -> 200): 50ms
  - Feature assembly: 20ms
  - Lightweight model: 30ms (logistic regression)
- Heavy ranker (200 -> 50): 80ms
  - Full feature assembly: 30ms
  - Deep model ensemble: 40ms (multi-tower NN)
  - Re-ranking rules: 10ms (diversity, freshness)
- Post-processing: 30ms
  - Ad insertion: 10ms
  - Story blending: 10ms
  - Final diversity filter: 10ms
- Buffer: 20ms
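A quick sanity check that the per-stage budgets above add up to the 200ms p99 target. The dict is an illustrative grouping of the table rows, not a real config format:

```python
# Hypothetical budget check: stage names and values come from the latency
# table above; the dict structure itself is an illustrative assumption.
STAGE_BUDGETS_MS = {
    "candidate_generation": 20,   # graph DB + indexes + ad server, parallel I/O
    "light_ranker": 50,           # feature assembly (20) + LR scoring (30)
    "heavy_ranker": 80,           # full features (30) + NN ensemble (40) + rules (10)
    "post_processing": 30,        # ad insertion + story blending + diversity filter
    "buffer": 20,                 # slack for the p99 tail
}

def total_budget_ms(budgets: dict) -> int:
    """Sum per-stage budgets; must not exceed the end-to-end p99 target."""
    return sum(budgets.values())

assert total_budget_ms(STAGE_BUDGETS_MS) == 200
```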
Multi-Stage Funnel¶
Stage 1: Candidate Generation (10K -> 1K)¶
Sources:
- Friends' posts: Graph DB lookup, O(friend_count)
- Groups: Pre-computed index, top posts
- Pages followed: Pre-computed index
- Explore/suggested: ANN retrieval (FAISS/ScaNN)
- Ads: From ad auction service
Merge strategy:
- Dedup by post_id
- Freshness filter (< 48h for most, < 7d for popular)
- Author diversity (max 3 posts per author)
Latency: 20ms, mostly parallel I/O
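The merge rules above can be sketched as a single pass over the concatenated sources. The candidate shape (`post_id`, `author_id`, `created_at`, `popular`) is an assumed representation, not the real schema:

```python
import time
from collections import Counter

def merge_candidates(sources, now=None, max_per_author=3):
    """Dedup by post_id, apply the freshness filter, cap posts per author."""
    now = now or time.time()
    seen_ids = set()
    per_author = Counter()
    merged = []
    for post in sources:                       # sources already concatenated
        if post["post_id"] in seen_ids:        # dedup by post_id
            continue
        age_h = (now - post["created_at"]) / 3600
        limit_h = 7 * 24 if post.get("popular") else 48
        if age_h > limit_h:                    # freshness: <48h, <7d if popular
            continue
        if per_author[post["author_id"]] >= max_per_author:
            continue                           # author diversity cap
        seen_ids.add(post["post_id"])
        per_author[post["author_id"]] += 1
        merged.append(post)
    return merged
```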
Stage 2: Light Ranker (1K -> 100-200)¶
Model: Logistic regression / small GBDT
Features: ~50 (cheap to compute)
- Author-viewer affinity (pre-computed)
- Post age, type, engagement count
- User's historical engagement with similar content
Throughput: 100K candidates/sec per pod
Latency: 50ms for 1K candidates
Purpose: Remove obviously irrelevant candidates cheaply
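A minimal sketch of this stage: logistic regression over a few cheap features, keeping only the top candidates for the heavy ranker. Weights, bias, and feature names are made-up placeholders; in production they come from the daily-retrained model:

```python
import math

# Illustrative, hand-set parameters -- not real model weights.
WEIGHTS = {"affinity": 2.0, "log_engagement": 0.5, "age_hours": -0.05}
BIAS = -1.0

def light_score(features: dict) -> float:
    """Cheap P(engage) estimate: sigmoid over a small linear model."""
    z = BIAS + sum(WEIGHTS[k] * features.get(k, 0.0) for k in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))

def light_rank(candidates: list, keep: int = 200) -> list:
    """Score all candidates cheaply and keep the top `keep` for the heavy ranker."""
    return sorted(candidates, key=light_score, reverse=True)[:keep]
```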
Stage 3: Heavy Ranker (200 -> 50)¶
Model: Multi-tower deep NN (per-action heads)
Features: 500+ (expensive features included)
- Content embeddings (pre-computed)
- User-post interaction features (real-time)
- Social context (friends who liked)
- Cross-features (user x content type)
Per-action predictions:
P(like), P(comment), P(share), P(click),
P(watch_30s), P(hide), P(report)
Combined score:
score = w1*P(like) + w2*P(comment) + w3*P(share) + ...
- w_neg*P(hide) - w_report*P(report)
Latency: 80ms for 200 candidates
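The combined-score formula above can be written directly as a weighted sum, with negative actions subtracting. The weight values below are illustrative stand-ins for the product-defined weights:

```python
# Illustrative weights -- in practice these are product-defined and tuned.
ACTION_WEIGHTS = {
    "like": 1.0, "comment": 4.0, "share": 8.0, "click": 0.5,
    "watch_30s": 2.0, "hide": -20.0, "report": -100.0,
}

def combined_score(p: dict) -> float:
    """Weighted sum of per-action probabilities; hide/report carry negative weights."""
    return sum(ACTION_WEIGHTS[a] * p.get(a, 0.0) for a in ACTION_WEIGHTS)
```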
Stage 4: Re-ranking & Blending¶
Rules (post heavy ranker):
- Diversity: max 2 consecutive posts from same type
- Freshness boost: recent posts get position bump
- Ad insertion: every 4-5 organic posts
- Story blending: Stories at top, Reels mixed in
- Anti-echo-chamber: inject 10-15% cross-viewpoint content
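Two of the rules above (type diversity and ad insertion) sketched as post-processing passes, assuming posts are dicts with a `type` field; the greedy reorder is one possible implementation, not the real one:

```python
def apply_diversity(posts, max_run=2):
    """Greedy reorder: skip a post that would extend a same-type run past max_run."""
    remaining = list(posts)
    out = []
    while remaining:
        for i, post in enumerate(remaining):
            run = 0
            for prev in reversed(out):         # length of current same-type run
                if prev["type"] != post["type"]:
                    break
                run += 1
            if run < max_run:
                out.append(remaining.pop(i))
                break
        else:
            out.append(remaining.pop(0))       # no post avoids a run; accept it
    return out

def insert_ads(organic, ads, every=4):
    """Place one ad after every `every` organic posts, while ads remain."""
    out, ad_iter = [], iter(ads)
    for i, post in enumerate(organic, 1):
        out.append(post)
        if i % every == 0:
            ad = next(ad_iter, None)
            if ad is not None:
                out.append(ad)
    return out
```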
Horizontal Scaling¶
Service Architecture¶
| Component | Instances | Capacity |
|---|---|---|
| Feed orchestrator | 200 pods | 2.5K QPS each |
| Candidate generator | 100 pods | 5K QPS each |
| Light ranker | 300 pods | 100K candidates/sec each |
| Heavy ranker | 500 pods (GPU) | 20K candidates/sec each |
| Feature store (Redis) | 200 shards | 500GB, 10M ops/sec |
| Social graph (TAO) | 100 shards | 10M lookups/sec |
| User embedding store | 50 shards | 2B embeddings |
Feature Store Tiering¶
| Feature type | Storage | Freshness | Latency |
|---|---|---|---|
| User embeddings | Redis | 15 min (Flink) | 1ms |
| Post embeddings | Redis | On publish | 1ms |
| Social affinity | Redis | Hourly | 1ms |
| Post engagement counts | Redis | Real-time (Kafka Streams) | 0.5ms |
| User history (last 100) | Redis | Real-time | 2ms |
| Long-term aggregates | Memcached | Daily batch | 3ms |
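The tiering above amounts to an ordered freshest-first lookup with fallback to staler stores. A toy version, with plain dicts standing in for the Redis/Memcached clients:

```python
class TieredFeatureStore:
    """Try tiers in order (freshest first); report which tier served the key."""

    def __init__(self, tiers):
        self.tiers = tiers                    # list of (name, store), freshest first

    def get(self, key, default=None):
        for name, store in self.tiers:
            if key in store:
                return store[key], name       # value plus serving tier
        return default, None                  # full miss
```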
Training Pipeline¶
Continuous Learning¶
Data pipeline (streaming):
User actions -> Kafka -> Flink -> Training data store
Training schedule:
Light ranker: Daily retrain (CPU, 1 hour)
Heavy ranker: Daily retrain (GPU, 4 hours)
Embeddings: Continuous update (Flink, 15 min windows)
Exploration: 10% traffic gets random diversification
Deployment:
Model validation -> Shadow mode (1%) -> Canary (5%) -> Full rollout
Rollback: automatic if NDCG drops > 1% or hide rate increases > 5%
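The rollback condition can be expressed as a simple guard. The 1% / 5% thresholds come from the text; the metric dict shape is an assumption:

```python
def should_rollback(baseline: dict, canary: dict) -> bool:
    """Roll back on relative NDCG regression > 1% or hide-rate growth > 5%."""
    ndcg_drop = (baseline["ndcg"] - canary["ndcg"]) / baseline["ndcg"]
    hide_rise = (canary["hide_rate"] - baseline["hide_rate"]) / baseline["hide_rate"]
    return ndcg_drop > 0.01 or hide_rise > 0.05
```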
High Availability¶
Degradation Strategy¶
| Failure | Fallback | Impact |
|---|---|---|
| Heavy ranker down | Light ranker only (top 50) | -15% engagement |
| Light ranker down | Chronological + popularity | -30% engagement |
| Feature store down | Cached features (15 min stale) | -5% engagement |
| Social graph down | Skip social features | -10% engagement |
| Everything down | Chronological feed | -40% engagement |
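The degradation strategy is essentially a fallback chain. A sketch where the ranker callables stand in for RPC calls to the real services:

```python
def rank_with_fallback(candidates, heavy_ranker, light_ranker, k=50):
    """Try heavy, then light; last resort is a chronological feed."""
    for ranker in (heavy_ranker, light_ranker):
        try:
            return ranker(candidates)[:k]
        except Exception:          # service down / timeout
            continue
    # everything down: fall back to reverse-chronological order
    return sorted(candidates, key=lambda p: p["created_at"], reverse=True)[:k]
```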
Caching¶
Multi-level cache:
L1: Pod-local LRU (top 10K users' feeds, 2min TTL)
L2: Redis (pre-computed feeds, 5min TTL)
L3: Full recomputation (on cache miss)
Cache hit rates:
L1: ~30% (power users refresh often)
L2: ~50% (most users)
L3: ~20% (cold start, cache expired)
Feed invalidation: on new post from close friend or viral post
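A toy version of the multi-level lookup with the TTLs from the description (L1 2 min, L2 5 min, recompute on full miss). The `clock` parameter exists only to make the sketch deterministic:

```python
import time
from collections import OrderedDict

class FeedCache:
    """L1: pod-local LRU (2 min TTL). L2: shared store (5 min TTL). L3: recompute."""

    def __init__(self, recompute, l1_ttl=120, l2_ttl=300, l1_size=10_000,
                 clock=time.time):
        self.recompute, self.clock = recompute, clock
        self.l1_ttl, self.l2_ttl, self.l1_size = l1_ttl, l2_ttl, l1_size
        self.l1 = OrderedDict()   # user_id -> (feed, stored_at)
        self.l2 = {}              # stands in for Redis

    def _fresh(self, entry, ttl):
        return entry is not None and self.clock() - entry[1] < ttl

    def get_feed(self, user_id):
        entry = self.l1.get(user_id)
        if self._fresh(entry, self.l1_ttl):
            self.l1.move_to_end(user_id)            # LRU touch
            return entry[0]
        entry = self.l2.get(user_id)
        if self._fresh(entry, self.l2_ttl):
            feed = entry[0]
        else:
            feed = self.recompute(user_id)          # L3: full recomputation
            self.l2[user_id] = (feed, self.clock())
        self.l1[user_id] = (feed, self.clock())
        self.l1.move_to_end(user_id)
        if len(self.l1) > self.l1_size:
            self.l1.popitem(last=False)             # evict least-recently-used
        return feed
```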
Cost¶
| Component | Monthly cost |
|---|---|
| Heavy ranker (500 GPU pods) | $1.2M |
| Light ranker (300 CPU pods) | $200K |
| Feature store (Redis 200 shards) | $500K |
| Social graph (100 shards) | $400K |
| Training (daily, 100 GPUs) | $300K |
| Kafka + streaming | $150K |
| Total | ~$2.75M |
Misconception: you can rank all 10K candidates with the heavy model
Heavy model on 10K candidates = 10K x 0.4ms = 4 seconds. With a 200ms budget you need the multi-stage funnel: candidate generation cuts 10K to 1K, the light ranker (LR/GBDT, 0.05ms/candidate) cuts 1K to ~200, and the heavy ranker (deep NN, 0.4ms/candidate) only sees those 200. Without the funnel you would need 20x more GPUs.
Misconception: a chronological feed is simpler and better
In 2023 Twitter open-sourced its algorithm and offered a chronological feed option. Result: 80%+ of users returned to the algorithmic feed within a week. Reason: with 500+ follows, a chronological feed fills up with irrelevant content. Algorithmic ranking lifts engagement by 40-60% vs chronological. But: users need transparency (Instagram shows "why you're seeing this").
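The arithmetic behind this misconception, using the per-candidate costs from the text (illustrative numbers):

```python
# Per-candidate scoring costs from the text (illustrative).
HEAVY_MS, LIGHT_MS = 0.4, 0.05

naive_ms = 10_000 * HEAVY_MS                       # heavy model on everything
funnel_ms = 1_000 * LIGHT_MS + 200 * HEAVY_MS      # light on 1K, heavy on 200

assert abs(naive_ms - 4000) < 1e-6    # 4 seconds: 20x over the 200ms budget
assert abs(funnel_ms - 130) < 1e-6    # fits comfortably inside 200ms
```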
Interview section¶
Question: "How do you generate a feed for 2B users in 200ms?"
Weak answer: "Cache the feed for every user."
Strong answer: "Multi-stage funnel. (1) Candidate generation (20ms): 10K candidates from social graph + groups + explore. (2) Light ranker (50ms): LR/GBDT on ~50 cheap features, prunes ~80% (1K -> 200). (3) Heavy ranker (80ms): multi-tower deep NN with per-action heads (P(like), P(comment), P(share), P(hide)), combined score with product-defined weights. (4) Re-ranking (30ms): diversity rules, ad insertion, freshness boost. Infrastructure: 500 GPU pods for the heavy ranker, Redis feature store (200 shards, 10M ops/sec) with tiered freshness (real-time user actions, 15-min embeddings, daily aggregates). Caching: multi-level (pod-local 30%, Redis 50%, full recompute 20%). Cost: $2.75M/mo, but a 40-60% engagement lift vs chronological."