Ранжирование ленты новостей -- компоненты¶

~4 минуты чтения

Предварительно: Определение задачи

Архитектура ленты новостей -- это конвейер из 5 компонентов, каждый с жёстким latency budget. Candidate Generation за 50 мс фильтрует 10 000+ постов до ~1000, Feature Extraction за 30 мс собирает 500--1000 фичей из 5 хранилищ, Ranking Model за 20 мс прогоняет multi-task DNN, Post-Ranking за 10 мс применяет diversity и business rules. Суммарно: весь pipeline за <200 мс при 500K QPS. На масштабе Meta это означает параллельную работу тысяч Triton Inference серверов и Redis-кластеров с <5 мс latency на чтение фичей.

High-Level Architecture¶

graph TD
    A["Content Sources<br/>Friends, Pages, Groups, Ads"] --> B["Candidate Generation<br/>~1000 candidates per user"]
    B --> C["Feature Extraction<br/>User, Content, Context features"]
    C --> D["Ranking Model<br/>Multi-task: P(like), P(comment), P(share), P(hide)"]
    D --> E["Post-Ranking<br/>Diversity, Freshness, Business Rules, Ads"]
    E --> F["Feed Delivery<br/>Top-N items, infinite scroll"]

    style D fill:#e8eaf6,stroke:#3f51b5
    style E fill:#fff3e0,stroke:#ff9800

Component Details¶

1. Candidate Generation¶

class CandidateGenerator:
    """
    Retrieves ~1000 candidates from millions of possible items
    """

    def __init__(self):
        self.sources = {
            "friends": FriendPostSource(),
            "groups": GroupPostSource(),
            "pages": PagePostSource(),
            "recommendations": RecommendationSource(),
            "ads": AdCandidateSource(),
        }

    def generate(self, user_id: str, limit: int = 1000) -> list[Candidate]:
        # Parallel retrieval from all sources
        candidates = []
        with ThreadPoolExecutor() as executor:
            futures = {
                name: executor.submit(source.fetch, user_id)
                for name, source in self.sources.items()
            }
            for name, future in futures.items():
                candidates.extend(future.result())

        # Deduplicate and apply hard filters
        candidates = self.deduplicate(candidates)
        candidates = self.apply_hard_filters(candidates, user_id)

        return candidates[:limit]

    def apply_hard_filters(self, candidates, user_id):
        """Remove blocked users, seen posts, policy violations"""
        blocked = self.get_blocked_users(user_id)
        seen = self.get_recently_seen(user_id)
        return [
            c for c in candidates
            if c.author_id not in blocked
            and c.post_id not in seen
            and not c.is_policy_violation
        ]

2. Feature Extraction¶

class FeatureExtractor:
    """
    Extract features for ranking model
    """

    def extract(self, user_id: str, candidates: list[Candidate]) -> list[FeatureVector]:
        user_features = self.get_user_features(user_id)
        context_features = self.get_context_features(user_id)

        feature_vectors = []
        for candidate in candidates:
            content_features = self.get_content_features(candidate)
            interaction_features = self.get_interaction_features(user_id, candidate)

            feature_vectors.append(FeatureVector(
                user=user_features,
                content=content_features,
                interaction=interaction_features,
                context=context_features,
            ))

        return feature_vectors

Feature Group	Examples	Storage
User	age, country, interests, engagement history	Redis (user profile)
Content	type, age, author, text embeddings, image features	Feature Store
Interaction	user-author affinity, past likes on similar content	Graph DB / Precomputed
Context	time of day, device, session depth, network speed	Request context
Social	friends who liked, mutual connections with author	Social Graph

3. Ranking Model¶

class MultiTaskRankingModel:
    """
    Multi-task model predicting engagement probabilities
    """

    def __init__(self):
        # Shared bottom layers
        self.shared_layers = nn.Sequential(
            nn.Linear(FEATURE_DIM, 512),
            nn.ReLU(),
            nn.Linear(512, 256),
            nn.ReLU(),
        )

        # Task-specific towers
        self.towers = {
            "like": TaskTower(256, 1),
            "comment": TaskTower(256, 1),
            "share": TaskTower(256, 1),
            "click": TaskTower(256, 1),
            "hide": TaskTower(256, 1),      # negative signal
            "report": TaskTower(256, 1),    # negative signal
        }

    def forward(self, features):
        shared = self.shared_layers(features)
        predictions = {
            task: torch.sigmoid(tower(shared))
            for task, tower in self.towers.items()
        }
        return predictions

    def compute_ranking_score(self, predictions: dict) -> float:
        """
        Weighted combination of task predictions
        """
        weights = {
            "like": 1.0,
            "comment": 5.0,    # comments more valuable
            "share": 10.0,     # shares most valuable
            "click": 0.5,
            "hide": -50.0,     # strong negative
            "report": -100.0,  # strongest negative
        }

        score = sum(
            weights[task] * predictions[task]
            for task in weights
        )
        return score

Model Evolution:

Generation	Architecture	Latency
V1	Logistic Regression	<1ms
V2	GBDT (XGBoost)	~5ms
V3	Deep Neural Network	~10ms
V4	Multi-task DNN (MMoE)	~15ms
V5	Transformer-based (current)	~20ms

4. Post-Ranking¶

class PostRanker:
    """
    Apply business rules and diversity after ML ranking
    """

    def rerank(self, ranked_items: list[RankedItem]) -> list[RankedItem]:
        # Diversity: no more than 2 consecutive posts from same author
        items = self.apply_diversity(ranked_items)

        # Freshness boost: recent posts get score bonus
        items = self.apply_freshness_boost(items)

        # Content type mixing: alternate photos, videos, text
        items = self.mix_content_types(items)

        # Ad injection: insert ads at positions 3, 8, 15, ...
        items = self.inject_ads(items)

        # Policy: ensure no adjacent sensitive content
        items = self.apply_policy_rules(items)

        return items

    def apply_diversity(self, items):
        result = []
        author_streak = defaultdict(int)

        for item in items:
            if author_streak[item.author_id] >= 2:
                continue  # skip, too many from same author
            result.append(item)
            author_streak[item.author_id] += 1

            # Reset streaks for other authors
            for a in author_streak:
                if a != item.author_id:
                    author_streak[a] = 0

        return result

5. Serving Infrastructure¶

graph LR
    A["App Server<br/>(API)"] --> B["Candidate<br/>Generator"]
    B --> C["Feature<br/>Store"]
    C --> D["Model<br/>Serving"]
    B --> E["Social Graph<br/>(Neo4j)"]
    C --> F["Redis<br/>(features)"]
    D --> G["Triton /<br/>TF Serve"]

    style A fill:#e8eaf6,stroke:#3f51b5
    style D fill:#e8eaf6,stroke:#3f51b5
    style E fill:#f3e5f5,stroke:#9c27b0
    style F fill:#fff3e0,stroke:#ef6c00
    style G fill:#e8f5e9,stroke:#4caf50

Component	Technology	SLA
Feature Store	Redis Cluster (hot), Hive (cold)	<5ms read
Model Serving	Triton Inference Server	<20ms p99
Candidate Store	Cassandra / ScyllaDB	<10ms
Social Graph	Neo4j / TAO (Meta)	<5ms
Message Queue	Kafka (engagement events)	Async

Latency Budget¶

Этап	Бюджет
Candidate generation	50 мс
Feature extraction	30 мс
Model inference	20 мс
Post-ranking	10 мс
Serialization	10 мс
Network overhead	80 мс
Итого	<200 мс

Типичные заблуждения¶

Заблуждение: можно ранжировать все посты одной моделью

При 10 000+ кандидатов прогонять полную DNN-модель для каждого -- это 10K x 20 мс = 200 секунд. Поэтому обязателен этап Candidate Generation, который легковесными фильтрами (социальный граф, recency) сужает пул до ~1000. Без этого этапа latency budget невыполним.

Заблуждение: multi-task модель всегда лучше отдельных моделей

Multi-task (MMoE, PLE) работает лучше только при наличии shared structure между задачами. Если P(like) и P(share) имеют очень разные feature distributions, shared bottom layers могут вредить -- явление negative transfer. На практике Facebook использует expert gating (MMoE) чтобы каждая задача сама выбирала какие shared layers использовать. Слепое объединение всех задач в одну модель может дать -3--5% по отдельным метрикам.

Заблуждение: Post-Ranking -- это просто фильтрация

Post-Ranking -- это не фильтр, а критический компонент для user experience. Без diversity enforcement пользователь может получить 10 постов от одного автора подряд. Без ad injection нет монетизации. Без content type mixing -- только текст или только видео. На масштабе Instagram post-ranking правила влияют на 15--20% финального ordering.

Интервью¶

Объясните, почему используется multi-stage pipeline, а не одна модель?

"Потому что так делают большие компании"

"Latency constraint. При 2B DAU и 500K QPS весь pipeline должен уложиться в <200 мс. Полная DNN-модель стоит ~20 мс на инференс. Для 10K+ кандидатов это 200 секунд -- невозможно. Поэтому Candidate Generation за 50 мс с лёгкими heuristics (социальный граф, recency, hard filters) сужает пул до ~1000. Затем Feature Extraction за 30 мс готовит фичи из Redis/Feature Store, и только потом 1000 кандидатов проходят через DNN за 20 мс. Каждый stage -- trade-off между precision и latency."

Как бы вы добавили рекламу в ленту?

"Вставляем рекламу на фиксированные позиции -- каждый 5-й пост"

"Комбинированный подход: (1) отдельная ad ranking система с собственным CTR prediction, (2) auction-based выбор рекламы с eCPM scoring, (3) ad slots на позициях 3, 8, 15 с адаптивным spacing в зависимости от session depth, (4) relevance matching к окружающему органическому контенту, (5) frequency caps -- не больше 3 показов одного рекламодателя в день. Ключевое: ad quality score должен учитывать user experience -- навязчивая реклама увеличивает hide rate и снижает retention."