News Feed Ranking System Metrics
~4 min read
Prerequisites: Problem Definition, Components
Facebook optimizes the feed across ~50 engagement signals at once, balancing short-term metrics (clicks, likes) against long-term metrics (user retention, session quality). Naive click optimization leads to clickbait inflation and DAU loss within 2-3 months. The right approach is multi-objective optimization with guardrail metrics, where the primary metric is meaningful social interactions (MSI) rather than raw engagement.
Metric Hierarchy
| Level | Metrics | Targets |
|---|---|---|
| Business | DAU, session time, revenue | DAU retention > 95% MoM |
| User experience | Content diversity, satisfaction surveys | NPS > 60 |
| Engagement | Likes, comments, shares, watch time | MSI growth |
| Model | AUC, NDCG, calibration | NDCG@10 > 0.4 |
| System | Latency, throughput | p99 < 200ms, 500K QPS |
Business Metrics
Core Metrics
| Metric | Formula | Target | Warning threshold |
|---|---|---|---|
| DAU / MAU ratio | daily_active / monthly_active | > 0.65 | < 0.60 |
| Session duration | avg(session_end - session_start) | 33 min (FB) | < 28 min |
| Sessions per day | total_sessions / DAU | > 5 | < 4 |
| Revenue per user (ARPU) | total_revenue / MAU | Growing QoQ | -5% QoQ |
| Content creator retention | active_creators(month m) / active_creators(month m-1) | > 90% | < 85% |
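
The formulas and warning thresholds in the table above can be sketched as a small health check. This is a minimal illustration, not a production monitor: the `DailySnapshot` type, the `business_health` function, and the sample numbers are hypothetical, and only two of the table's rows are covered.

```python
from dataclasses import dataclass


@dataclass
class DailySnapshot:
    """Hypothetical container for one day's aggregate counts."""
    daily_active: int
    monthly_active: int
    total_sessions: int


# Warning thresholds copied from the Core Metrics table.
WARNING_THRESHOLDS = {"dau_mau_ratio": 0.60, "sessions_per_day": 4.0}


def business_health(snapshot: DailySnapshot) -> dict:
    """Compute each metric and flag it when it drops below its warning threshold."""
    values = {
        "dau_mau_ratio": snapshot.daily_active / snapshot.monthly_active,
        "sessions_per_day": snapshot.total_sessions / snapshot.daily_active,
    }
    return {
        name: {"value": value, "warning": value < WARNING_THRESHOLDS[name]}
        for name, value in values.items()
    }


# Made-up numbers: stickiness is acceptable, session frequency is not.
report = business_health(
    DailySnapshot(daily_active=620, monthly_active=1000, total_sessions=2300)
)
```

Here the DAU/MAU ratio is 0.62, which sits between the warning threshold and the target, while sessions per day fall below 4 and trigger the warning flag.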
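Meaningful Social Interactions (MSI)
def compute_msi(interactions: list) -> dict:
    """
    Facebook's primary metric post-2018 (after misinformation crisis).

    Each interaction is expected to expose `type`, `count`, and `session_id`.
    """
    # Illustrative weights: higher-effort, more social actions score higher.
    weights = {
        "comment": 5.0,         # Highest value (conversation)
        "share": 4.0,           # Distribution to network
        "reaction_love": 3.0,   # Emotional response
        "reaction_other": 2.0,
        "like": 1.0,            # Low effort
        "click": 0.5,           # Passive consumption
        "impression": 0.1,      # Scroll past
    }
    if not interactions:
        # Guard against division by zero on empty input.
        return {"msi_score": 0.0, "msi_per_session": 0.0, "comment_ratio": 0.0}
    # Unknown interaction types contribute nothing to the score.
    msi_score = sum(weights.get(i.type, 0) * i.count for i in interactions)
    n_sessions = len({i.session_id for i in interactions})
    n_comment_records = sum(1 for i in interactions if i.type == "comment")
    return {
        "msi_score": msi_score,
        "msi_per_session": msi_score / n_sessions,
        # Share of interaction records (not raw counts) that are comments.
        "comment_ratio": n_comment_records / len(interactions),
    }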
Model Metrics
Ranking Quality
| Metric | Formula | Target | What it measures |
|---|---|---|---|
| NDCG@K | DCG@K / IDCG@K (Normalized Discounted Cumulative Gain) | > 0.4 | Ranking relevance |
| AUC (per action) | Per-action prediction AUC | > 0.75 | Prediction quality |
| Calibration | pred_P(action) / actual_P(action) | 0.95-1.05 | Score accuracy |
| Multi-objective Pareto | Pareto front coverage | Growing | Balance of objectives |
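
To make the NDCG@K row concrete, here is a minimal sketch of the computation. It assumes graded relevance labels listed in the order the model ranked the items, and it uses the linear-gain variant of DCG (the exponential-gain variant `2^rel - 1` is also common). The function names are illustrative.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k items (linear gain)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k):
    """DCG of the actual ordering divided by DCG of the ideal ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Relevance labels in the order the ranker returned the items.
perfect = ndcg_at_k([3, 2, 1, 0], k=4)   # already in ideal order
swapped = ndcg_at_k([0, 1], k=2)         # relevant item pushed to position 2
```

A perfectly ordered list scores 1.0, and moving the only relevant item from position 1 to position 2 lowers the score to `1 / log2(3)`, roughly 0.63.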
Per-Action Prediction
import numpy as np
from sklearn.metrics import log_loss, roc_auc_score

def compute_multi_action_metrics(y_true_dict, y_pred_dict):
    """Compute separate metrics for each action type."""
    results = {}
    actions = ["like", "comment", "share", "click", "watch_30s", "hide", "report"]
    for action in actions:
        # Assumes binary labels with both classes present for every action;
        # roc_auc_score raises otherwise.
        y_true = np.asarray(y_true_dict[action])
        y_pred = np.asarray(y_pred_dict[action])
        base_rate = y_true.mean()
        results[action] = {
            "auc": roc_auc_score(y_true, y_pred),
            "logloss": log_loss(y_true, y_pred),
            "base_rate": base_rate,
            # Mean prediction over observed rate; 1.0 means calibrated on average.
            "calibration": y_pred.mean() / base_rate if base_rate > 0 else float("nan"),
        }
    return results

# Target AUCs:
# like: 0.75 (common action, easier)
# comment: 0.70 (rarer, harder)
# share: 0.68 (rarest, hardest)
# click: 0.72
# hide: 0.65 (negative signal, sparse data)
Diversity & Freshness
| Metric | Description | Target |
|---|---|---|
| Content type diversity | Entropy of content types in feed | > 1.5 bits |
| Source diversity | % unique content creators in feed | > 30% |
| Freshness | Median age of shown content | < 4 hours |
| Echo chamber score | Viewpoint diversity in political content | > 0.6 |
| Repeat content rate | % items seen before | < 5% |
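
The first two rows of the table can be computed directly from the items in a served feed. The sketch below assumes each item carries a content type and a creator id. The function names and the sample feed are hypothetical.

```python
import math
from collections import Counter


def content_type_entropy(content_types):
    """Shannon entropy (in bits) of the content-type distribution in one feed."""
    counts = Counter(content_types)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def source_diversity(creator_ids):
    """Fraction of feed items that come from distinct creators."""
    return len(set(creator_ids)) / len(creator_ids) if creator_ids else 0.0


# Hypothetical feed of 20 items, evenly split across four content types.
feed_types = ["photo", "video", "link", "status"] * 5
entropy_bits = content_type_entropy(feed_types)

# 20 items produced by 8 distinct creators.
feed_creators = [f"creator_{i % 8}" for i in range(20)]
diversity = source_diversity(feed_creators)
```

A uniform mix of four content types gives exactly 2 bits, which clears the 1.5-bit target. A feed dominated by one type drops toward 0 bits.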
Online Metrics (A/B Test)
Primary Metrics
| Metric | MDE | Test duration |
|---|---|---|
| MSI per session | +0.5% | 2 weeks |
| Session duration | +1% | 1 week |
| DAU (28-day) | +0.1% | 4 weeks |
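
The MDE column drives how much traffic a test needs. As a rough sketch, the standard two-sample approximation for comparing means is `n = 2 * (z_{1-alpha/2} + z_{power})^2 * sigma^2 / delta^2` users per arm. The baseline mean and standard deviation below are made-up placeholders, and the formula ignores variance-reduction techniques and within-user correlation across days.

```python
import math
from statistics import NormalDist


def users_per_arm(relative_mde, mean, std, alpha=0.05, power=0.8):
    """Approximate users needed per arm to detect a relative lift in a mean."""
    delta = relative_mde * mean                      # absolute effect size
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)    # two-sided significance
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 * std ** 2 / delta ** 2)


# Hypothetical baseline: MSI per session with mean 10 and std 5.
n_for_1_percent = users_per_arm(0.01, mean=10.0, std=5.0)
n_for_half_percent = users_per_arm(0.005, mean=10.0, std=5.0)
```

Halving the MDE roughly quadruples the required sample, which is one reason the smallest MDE in the table (DAU at +0.1%) is paired with the longest test duration.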
Guardrail Metrics
| Metric | Constraint | Rationale |
|---|---|---|
| Content diversity | No decrease > 5% | Prevent echo chambers |
| Misinformation exposure | No increase > 1% | Trust & safety |
| Clickbait ratio | No increase > 2% | Content quality |
| Creator impression equity | Gini < 0.7 | Creator ecosystem health |
| Ad revenue | No decrease > 0.5% | Business viability |
| Negative actions (hide, report) | No increase > 3% | User satisfaction |
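
The creator impression equity guardrail relies on the Gini coefficient of impressions across creators. A minimal sketch follows, assuming one non-negative impression count per creator. The function names and the sample data are illustrative.

```python
def gini(values):
    """Gini coefficient: 0 = perfectly equal, (n-1)/n = one creator gets everything."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted sum formulation over the sorted values (ranks start at 1).
    weighted = sum((rank + 1) * x for rank, x in enumerate(xs))
    return 2 * weighted / (n * total) - (n + 1) / n


def passes_equity_guardrail(impressions_per_creator, max_gini=0.7):
    """Guardrail from the table above: Gini must stay below 0.7."""
    return gini(impressions_per_creator) < max_gini


even = gini([100, 100, 100, 100])          # every creator gets the same reach
concentrated = gini([0, 0, 0, 1000])       # a single creator gets all impressions
```

With four creators, a perfectly even split gives 0 and full concentration gives 0.75, so the concentrated case would fail the `Gini < 0.7` constraint.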
System Metrics
| Metric | Target |
|---|---|
| Feed generation p50 | < 100ms |
| Feed generation p99 | < 200ms |
| Throughput | 500K QPS |
| Feature freshness | < 5 min |
| Model freshness | < 6 hours |
| Cache hit rate | > 80% |
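
The p50 and p99 latency targets can be checked against a sample of request timings. The sketch below uses the nearest-rank percentile definition on an in-memory list. A real system would more likely use streaming histograms or sketches, and the sample latencies here are synthetic.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("percentile of an empty sample is undefined")
    xs = sorted(samples)
    rank = max(1, math.ceil(p * len(xs) / 100))
    return xs[rank - 1]


# Synthetic feed-generation latencies in milliseconds: 1, 2, ..., 100.
latencies_ms = list(range(1, 101))
p50 = percentile(latencies_ms, 50)
p99 = percentile(latencies_ms, 99)

# Targets taken from the System Metrics table.
meets_targets = p50 < 100 and p99 < 200
```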
Misconception: optimize click-through rate
CTR optimization -> clickbait. In 2018 Facebook switched its primary metric from engagement time to MSI (meaningful social interactions). The result: session time fell by 5%, but DAU retention rose and long-term engagement stabilized. Mentioning MSI in an interview is a strong signal: it shows you think about long-term effects, not only proxy metrics.
Misconception: a single model predicts "engagement"
Engagement is not one thing. Like (low effort), comment (high effort), share (distribution), and watch_time (passive consumption) are different actions with different value. Production systems predict P(action) for each action separately and then compute a combined score: \(\sum w_i \cdot P(\text{action}_i)\). The weights \(w_i\) are a product decision, not an ML decision, and they can be changed without retraining.
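
The combined score \(\sum w_i \cdot P(\text{action}_i)\) can be sketched as follows. The predicted probabilities, the post ids, and both weight sets are invented for illustration. The point is only that swapping the weights reorders the feed while the per-action predictions stay untouched.

```python
def combined_score(action_probs, weights):
    """Weighted sum of per-action probabilities; unweighted actions contribute 0."""
    return sum(weights.get(action, 0.0) * p for action, p in action_probs.items())


def rank(candidates, weights):
    """Return candidate ids sorted by combined score, best first."""
    return sorted(
        candidates,
        key=lambda post_id: combined_score(candidates[post_id], weights),
        reverse=True,
    )


# Hypothetical per-action predictions from already-trained models.
candidates = {
    "post_a": {"like": 0.30, "comment": 0.02, "share": 0.01, "hide": 0.01},
    "post_b": {"like": 0.10, "comment": 0.08, "share": 0.03, "hide": 0.00},
}

# Two product-chosen weight sets; negative weights penalize negative actions.
like_heavy = {"like": 1.0, "comment": 1.0, "share": 1.0, "hide": -5.0}
conversation_heavy = {"like": 1.0, "comment": 5.0, "share": 4.0, "hide": -10.0}

order_like_heavy = rank(candidates, like_heavy)
order_conversation_heavy = rank(candidates, conversation_heavy)
```

Under the like-heavy weights `post_a` ranks first. Under the conversation-heavy weights `post_b` overtakes it, with no model retrained in between.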
Interview Section
Question: "Which metrics would you use for a news feed?"
Weak answer: "CTR and time spent on the platform."
Strong answer: "Multi-level. Business: DAU/MAU ratio (> 0.65), session duration, ARPU, creator retention. User experience: MSI (meaningful social interactions: comments > shares > likes > clicks), content diversity (entropy > 1.5 bits), freshness (median < 4h). Model: per-action AUC (like 0.75, comment 0.70, share 0.68), NDCG@10, calibration per action. Guardrails: misinformation exposure, clickbait ratio, creator impression equity (Gini < 0.7), negative action rate. Key insight: naive CTR optimization -> clickbait -> DAU loss within 2-3 months. That is exactly why Facebook moved to MSI in 2018. The right combined score is \(\sum w_i \cdot P(\text{action}_i)\), where the weights are a product decision."