Scaling a Feed Ranking System¶
~2 minutes read
Prerequisites: Components, Metrics
Facebook ranks the feed for 2B DAU, picking ~50 posts out of 10,000+ candidates in < 200ms. At 500K QPS that means 500K x 10,000 = 5B candidate evaluations per second. Scoring every candidate directly with a heavy model is infeasible -- a multi-stage funnel is required: candidate generation (10K -> 1K) in 20ms, light ranker (1K -> 200) in 50ms, heavy ranker (200 -> 50) in 80ms.
Traffic Patterns¶
Volume¶
| Metric | Value |
|---|---|
| DAU | 2B+ |
| Feed requests per second | 500K+ |
| Candidates per request | 10,000+ |
| Final feed size | ~50 posts |
| Feature vector | 500+ features per candidate |
| Models in ensemble | 10+ (per-action predictors) |
Latency Budget¶
Total feed generation: 200ms (p99)
- Candidate generation: 20ms
  - Friend posts retrieval: 10ms (graph DB)
  - Group/page posts: 5ms (inverted index)
  - Ads injection: 5ms (ad server)
- Light ranker (1K -> 200): 50ms
  - Feature assembly: 20ms
  - Lightweight model: 30ms (logistic regression)
- Heavy ranker (200 -> 50): 80ms
  - Full feature assembly: 30ms
  - Deep model ensemble: 40ms (multi-tower NN)
  - Re-ranking rules: 10ms (diversity, freshness)
- Post-processing: 30ms
  - Ad insertion: 10ms
  - Story blending: 10ms
  - Final diversity filter: 10ms
- Buffer: 20ms
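A quick sanity check that the per-stage budgets above add up to the 200ms p99 target. The dict is an illustrative grouping of the table rows, not a real config format:

```python
# Hypothetical budget check: stage names and values come from the latency
# table above; the dict structure itself is an illustrative assumption.
STAGE_BUDGETS_MS = {
    "candidate_generation": 20,   # graph DB + indexes + ad server, parallel I/O
    "light_ranker": 50,           # feature assembly (20) + LR scoring (30)
    "heavy_ranker": 80,           # full features (30) + NN ensemble (40) + rules (10)
    "post_processing": 30,        # ad insertion + story blending + diversity filter
    "buffer": 20,                 # slack for the p99 tail
}

def total_budget_ms(budgets: dict) -> int:
    """Sum per-stage budgets; must not exceed the end-to-end p99 target."""
    return sum(budgets.values())

assert total_budget_ms(STAGE_BUDGETS_MS) == 200
```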
Multi-Stage Funnel¶
Stage 1: Candidate Generation (10K -> 1K)¶
Sources:
- Friends' posts: Graph DB lookup, O(friend_count)
- Groups: Pre-computed index, top posts
- Pages followed: Pre-computed index
- Explore/suggested: ANN retrieval (FAISS/ScaNN)
- Ads: From ad auction service
Merge strategy:
- Dedup by post_id
- Freshness filter (< 48h for most, < 7d for popular)
- Author diversity (max 3 posts per author)
Latency: 20ms, mostly parallel I/O
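The merge rules above can be sketched as a single pass over the concatenated sources. The candidate shape (`post_id`, `author_id`, `created_at`, `popular`) is an assumed representation, not the real schema:

```python
import time
from collections import Counter

def merge_candidates(sources, now=None, max_per_author=3):
    """Dedup by post_id, apply the freshness filter, cap posts per author."""
    now = now or time.time()
    seen_ids = set()
    per_author = Counter()
    merged = []
    for post in sources:                       # sources already concatenated
        if post["post_id"] in seen_ids:        # dedup by post_id
            continue
        age_h = (now - post["created_at"]) / 3600
        limit_h = 7 * 24 if post.get("popular") else 48
        if age_h > limit_h:                    # freshness: <48h, <7d if popular
            continue
        if per_author[post["author_id"]] >= max_per_author:
            continue                           # author diversity cap
        seen_ids.add(post["post_id"])
        per_author[post["author_id"]] += 1
        merged.append(post)
    return merged
```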
Stage 2: Light Ranker (1K -> 100-200)¶
Model: Logistic regression / small GBDT
Features: ~50 (cheap to compute)
- Author-viewer affinity (pre-computed)
- Post age, type, engagement count
- User's historical engagement with similar content
Throughput: 100K candidates/sec per pod
Latency: 50ms for 1K candidates
Purpose: Remove obviously irrelevant candidates cheaply
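A minimal sketch of this stage: logistic regression over a few cheap features, keeping only the top candidates for the heavy ranker. Weights, bias, and feature names are made-up placeholders; in production they come from the daily-retrained model:

```python
import math

# Illustrative, hand-set parameters -- not real model weights.
WEIGHTS = {"affinity": 2.0, "log_engagement": 0.5, "age_hours": -0.05}
BIAS = -1.0

def light_score(features: dict) -> float:
    """Cheap P(engage) estimate: sigmoid over a small linear model."""
    z = BIAS + sum(WEIGHTS[k] * features.get(k, 0.0) for k in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))

def light_rank(candidates: list, keep: int = 200) -> list:
    """Score all candidates cheaply and keep the top `keep` for the heavy ranker."""
    return sorted(candidates, key=light_score, reverse=True)[:keep]
```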
Stage 3: Heavy Ranker (200 -> 50)¶
Model: Multi-tower deep NN (per-action heads)
Features: 500+ (expensive features included)
- Content embeddings (pre-computed)
- User-post interaction features (real-time)
- Social context (friends who liked)
- Cross-features (user x content type)
Per-action predictions:
P(like), P(comment), P(share), P(click),
P(watch_30s), P(hide), P(report)
Combined score:
score = w1*P(like) + w2*P(comment) + w3*P(share) + ...
- w_neg*P(hide) - w_report*P(report)
Latency: 80ms for 200 candidates
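The combined-score formula above can be written directly as a weighted sum, with negative actions subtracting. The weight values below are illustrative stand-ins for the product-defined weights:

```python
# Illustrative weights -- in practice these are product-defined and tuned.
ACTION_WEIGHTS = {
    "like": 1.0, "comment": 4.0, "share": 8.0, "click": 0.5,
    "watch_30s": 2.0, "hide": -20.0, "report": -100.0,
}

def combined_score(p: dict) -> float:
    """Weighted sum of per-action probabilities; hide/report carry negative weights."""
    return sum(ACTION_WEIGHTS[a] * p.get(a, 0.0) for a in ACTION_WEIGHTS)
```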
Stage 4: Re-ranking & Blending¶
Rules (post heavy ranker):
- Diversity: max 2 consecutive posts from same type
- Freshness boost: recent posts get position bump
- Ad insertion: every 4-5 organic posts
- Story blending: Stories at top, Reels mixed in
- Anti-echo-chamber: inject 10-15% cross-viewpoint content
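Two of the rules above (type diversity and ad insertion) sketched as post-processing passes, assuming posts are dicts with a `type` field; the greedy reorder is one possible implementation, not the real one:

```python
def apply_diversity(posts, max_run=2):
    """Greedy reorder: skip a post that would extend a same-type run past max_run."""
    remaining = list(posts)
    out = []
    while remaining:
        for i, post in enumerate(remaining):
            run = 0
            for prev in reversed(out):         # length of current same-type run
                if prev["type"] != post["type"]:
                    break
                run += 1
            if run < max_run:
                out.append(remaining.pop(i))
                break
        else:
            out.append(remaining.pop(0))       # no post avoids a run; accept it
    return out

def insert_ads(organic, ads, every=4):
    """Place one ad after every `every` organic posts, while ads remain."""
    out, ad_iter = [], iter(ads)
    for i, post in enumerate(organic, 1):
        out.append(post)
        if i % every == 0:
            ad = next(ad_iter, None)
            if ad is not None:
                out.append(ad)
    return out
```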
Horizontal Scaling¶
Service Architecture¶
| Component | Instances | Capacity |
|---|---|---|
| Feed orchestrator | 200 pods | 2.5K QPS each |
| Candidate generator | 100 pods | 5K QPS each |
| Light ranker | 300 pods | 100K candidates/sec each |
| Heavy ranker | 500 pods (GPU) | 20K candidates/sec each |
| Feature store (Redis) | 200 shards | 500GB, 10M ops/sec |
| Social graph (TAO) | 100 shards | 10M lookups/sec |
| User embedding store | 50 shards | 2B embeddings |
Feature Store Tiering¶
| Feature type | Storage | Freshness | Latency |
|---|---|---|---|
| User embeddings | Redis | 15 min (Flink) | 1ms |
| Post embeddings | Redis | On publish | 1ms |
| Social affinity | Redis | Hourly | 1ms |
| Post engagement counts | Redis | Real-time (Kafka Streams) | 0.5ms |
| User history (last 100) | Redis | Real-time | 2ms |
| Long-term aggregates | Memcached | Daily batch | 3ms |
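The tiering above amounts to an ordered freshest-first lookup with fallback to staler stores. A toy version, with plain dicts standing in for the Redis/Memcached clients:

```python
class TieredFeatureStore:
    """Try tiers in order (freshest first); report which tier served the key."""

    def __init__(self, tiers):
        self.tiers = tiers                    # list of (name, store), freshest first

    def get(self, key, default=None):
        for name, store in self.tiers:
            if key in store:
                return store[key], name       # value plus serving tier
        return default, None                  # full miss
```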
Training Pipeline¶
Continuous Learning¶
Data pipeline (streaming):
User actions -> Kafka -> Flink -> Training data store
Training schedule:
Light ranker: Daily retrain (CPU, 1 hour)
Heavy ranker: Daily retrain (GPU, 4 hours)
Embeddings: Continuous update (Flink, 15 min windows)
Exploration: 10% traffic gets random diversification
Deployment:
Model validation -> Shadow mode (1%) -> Canary (5%) -> Full rollout
Rollback: automatic if NDCG drops > 1% or hide rate increases > 5%
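The rollback condition can be expressed as a simple guard. The 1% / 5% thresholds come from the text; the metric dict shape is an assumption:

```python
def should_rollback(baseline: dict, canary: dict) -> bool:
    """Roll back on relative NDCG regression > 1% or hide-rate growth > 5%."""
    ndcg_drop = (baseline["ndcg"] - canary["ndcg"]) / baseline["ndcg"]
    hide_rise = (canary["hide_rate"] - baseline["hide_rate"]) / baseline["hide_rate"]
    return ndcg_drop > 0.01 or hide_rise > 0.05
```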
High Availability¶
Degradation Strategy¶
| Failure | Fallback | Impact |
|---|---|---|
| Heavy ranker down | Light ranker only (top 50) | -15% engagement |
| Light ranker down | Chronological + popularity | -30% engagement |
| Feature store down | Cached features (15 min stale) | -5% engagement |
| Social graph down | Skip social features | -10% engagement |
| Everything down | Chronological feed | -40% engagement |
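The degradation strategy is essentially a fallback chain. A sketch where the ranker callables stand in for RPC calls to the real services:

```python
def rank_with_fallback(candidates, heavy_ranker, light_ranker, k=50):
    """Try heavy, then light; last resort is a chronological feed."""
    for ranker in (heavy_ranker, light_ranker):
        try:
            return ranker(candidates)[:k]
        except Exception:          # service down / timeout
            continue
    # everything down: fall back to reverse-chronological order
    return sorted(candidates, key=lambda p: p["created_at"], reverse=True)[:k]
```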
Caching¶
Multi-level cache:
L1: Pod-local LRU (top 10K users' feeds, 2min TTL)
L2: Redis (pre-computed feeds, 5min TTL)
L3: Full recomputation (on cache miss)
Cache hit rates:
L1: ~30% (power users refresh often)
L2: ~50% (most users)
L3: ~20% (cold start, cache expired)
Feed invalidation: on new post from close friend or viral post
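A toy version of the multi-level lookup with the TTLs from the description (L1 2 min, L2 5 min, recompute on full miss). The `clock` parameter exists only to make the sketch deterministic:

```python
import time
from collections import OrderedDict

class FeedCache:
    """L1: pod-local LRU (2 min TTL). L2: shared store (5 min TTL). L3: recompute."""

    def __init__(self, recompute, l1_ttl=120, l2_ttl=300, l1_size=10_000,
                 clock=time.time):
        self.recompute, self.clock = recompute, clock
        self.l1_ttl, self.l2_ttl, self.l1_size = l1_ttl, l2_ttl, l1_size
        self.l1 = OrderedDict()   # user_id -> (feed, stored_at)
        self.l2 = {}              # stands in for Redis

    def _fresh(self, entry, ttl):
        return entry is not None and self.clock() - entry[1] < ttl

    def get_feed(self, user_id):
        entry = self.l1.get(user_id)
        if self._fresh(entry, self.l1_ttl):
            self.l1.move_to_end(user_id)            # LRU touch
            return entry[0]
        entry = self.l2.get(user_id)
        if self._fresh(entry, self.l2_ttl):
            feed = entry[0]
        else:
            feed = self.recompute(user_id)          # L3: full recomputation
            self.l2[user_id] = (feed, self.clock())
        self.l1[user_id] = (feed, self.clock())
        self.l1.move_to_end(user_id)
        if len(self.l1) > self.l1_size:
            self.l1.popitem(last=False)             # evict least-recently-used
        return feed
```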
Cost¶
| Component | Monthly cost |
|---|---|
| Heavy ranker (500 GPU pods) | $1.2M |
| Light ranker (300 CPU pods) | $200K |
| Feature store (Redis 200 shards) | $500K |
| Social graph (100 shards) | $400K |
| Training (daily, 100 GPUs) | $300K |
| Kafka + streaming | $150K |
| Total | ~$2.75M |
Misconception: you can rank all 10K candidates with the heavy model
Heavy model on 10K candidates = 10K x 0.4ms = 4 seconds. With a 200ms budget you need the multi-stage funnel: candidate generation cuts 10K to 1K, the light ranker (LR/GBDT, 0.05ms/candidate) cuts 1K to ~200, and the heavy ranker (deep NN, 0.4ms/candidate) only sees those 200. Without the funnel you would need 20x more GPUs.
Misconception: a chronological feed is simpler and better
In 2023 Twitter open-sourced its algorithm and offered a chronological feed option. Result: 80%+ of users returned to the algorithmic feed within a week. Reason: with 500+ follows, a chronological feed fills up with irrelevant content. Algorithmic ranking lifts engagement by 40-60% vs chronological. But: users need transparency (Instagram shows "why you're seeing this").
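The arithmetic behind this misconception, using the per-candidate costs from the text (illustrative numbers):

```python
# Per-candidate scoring costs from the text (illustrative).
HEAVY_MS, LIGHT_MS = 0.4, 0.05

naive_ms = 10_000 * HEAVY_MS                       # heavy model on everything
funnel_ms = 1_000 * LIGHT_MS + 200 * HEAVY_MS      # light on 1K, heavy on 200

assert abs(naive_ms - 4000) < 1e-6    # 4 seconds: 20x over the 200ms budget
assert abs(funnel_ms - 130) < 1e-6    # fits comfortably inside 200ms
```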
Interview section¶
Question: "How do you generate a feed for 2B users in 200ms?"
Weak answer: "Cache the feed for every user."
Strong answer: "Multi-stage funnel. (1) Candidate generation (20ms): 10K candidates from social graph + groups + explore. (2) Light ranker (50ms): LR/GBDT on ~50 cheap features, prunes ~80% (1K -> 200). (3) Heavy ranker (80ms): multi-tower deep NN with per-action heads (P(like), P(comment), P(share), P(hide)), combined score with product-defined weights. (4) Re-ranking (30ms): diversity rules, ad insertion, freshness boost. Infrastructure: 500 GPU pods for the heavy ranker, Redis feature store (200 shards, 10M ops/sec) with tiered freshness (real-time user actions, 15-min embeddings, daily aggregates). Caching: multi-level (pod-local 30%, Redis 50%, full recompute 20%). Cost: $2.75M/mo, but a 40-60% engagement lift vs chronological."