Ad Click Prediction: System Components
~4 min read
Prerequisite: Problem definition
The ad serving system consists of 5 components running within <100ms: ad retrieval (targeting filter, 1M -> 1K), feature extraction (user + ad + context features from the Feature Store in <5ms), CTR prediction (DCN/DeepFM model, <10ms inference), auction engine (second-price + quality score), and ad serving (rendering + impression logging). Each step carries critical trade-offs: latency vs accuracy in the model, exploration vs exploitation for new ads, revenue vs user experience in auction ranking.
High-Level Architecture

```mermaid
graph TD
    A["Ad Request<br/>User visits page, ad slot available"] --> B["Ad Retrieval<br/>Filter by targeting, budget, freq cap"]
    B --> C["Feature Extraction<br/>User, Ad creative, Context, Historical CTR"]
    C --> D["CTR Prediction<br/>P(click | user, ad, context)"]
    D --> E["Auction Engine<br/>Score = Bid x P(click) x Quality"]
    E --> F["Ad Serving<br/>Render, log impression, track clicks"]
    style D fill:#e8eaf6,stroke:#3f51b5
    style E fill:#fff3e0,stroke:#ff9800
```
Component Details
1. Ad Retrieval

```python
class AdRetriever:
    """
    Filter eligible ads from millions of campaigns.
    """

    def retrieve(self, request: AdRequest) -> list[AdCandidate]:
        # Stage 1: Targeting match (inverted index)
        targeted = self.match_targeting(
            user=request.user,
            context=request.context,
        )  # millions -> ~10K

        # Stage 2: Budget and pacing filter
        active = self.filter_budget(targeted)  # ~10K -> ~5K

        # Stage 3: Frequency capping
        eligible = self.filter_frequency(
            active, request.user.id
        )  # ~5K -> ~2K

        # Stage 4: Policy compliance
        compliant = self.filter_policy(
            eligible, request.context
        )  # ~2K -> ~1K

        return compliant

    def match_targeting(self, user, context):
        """
        Inverted index: targeting criteria -> ad set.
        """
        matched = set()

        # Demographics
        matched |= self.index.get_by_age(user.age)
        matched |= self.index.get_by_gender(user.gender)
        matched |= self.index.get_by_location(context.geo)

        # Interest-based
        for interest in user.interests:
            matched |= self.index.get_by_interest(interest)

        # Lookalike audiences
        for segment in user.lookalike_segments:
            matched |= self.index.get_by_segment(segment)

        return list(matched)
```
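The targeting match relies on an inverted index from targeting criterion to ad IDs. A minimal in-memory sketch (the `TargetingIndex` class and its dict-of-sets layout are illustrative assumptions, not the production structure):

```python
from collections import defaultdict

class TargetingIndex:
    """Toy inverted index: criterion value -> set of eligible ad IDs."""

    def __init__(self):
        self.by_interest = defaultdict(set)
        self.by_age_bucket = defaultdict(set)

    def add_ad(self, ad_id, interests, age_buckets):
        # Register the ad under every criterion it targets
        for interest in interests:
            self.by_interest[interest].add(ad_id)
        for bucket in age_buckets:
            self.by_age_bucket[bucket].add(ad_id)

    def get_by_interest(self, interest):
        return self.by_interest.get(interest, set())

    def get_by_age(self, age_bucket):
        return self.by_age_bucket.get(age_bucket, set())

index = TargetingIndex()
index.add_ad("ad1", interests=["sports"], age_buckets=["18-24", "25-34"])
index.add_ad("ad2", interests=["sports", "cars"], age_buckets=["25-34"])

# Intersect postings lists to require BOTH criteria
matched = index.get_by_interest("sports") & index.get_by_age("25-34")
```

Retrieval here is set intersection over postings lists, which is why it stays within a ~2ms budget even at millions of campaigns.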
2. Feature Extraction
| Feature Group | Features | Source |
|---|---|---|
| User | age, gender, interests, device, OS, past CTR | User Profile Store |
| Ad | creative type (image/video/text), advertiser, category, historical CTR | Ad Metadata Store |
| Context | page type, position, time of day, day of week | Request |
| Cross | user-ad category affinity, user past interaction with advertiser | Feature Store |
| Real-time | session depth, recent clicks, recent searches | Session Store |
```python
class CTRFeatureExtractor:
    def extract(self, user, ad, context) -> FeatureVector:
        features = {}

        # Dense features
        features["user_embedding"] = self.user_tower(user)
        features["ad_embedding"] = self.ad_tower(ad)

        # Sparse features (one-hot / multi-hot)
        features["user_age_bucket"] = self.bucketize(user.age)
        features["ad_category"] = ad.category_id
        features["device_type"] = context.device_type
        features["hour_of_day"] = context.hour

        # Cross features
        features["user_ad_affinity"] = self.get_affinity(
            user.id, ad.category_id
        )
        features["historical_ctr"] = self.get_historical_ctr(
            user.id, ad.advertiser_id
        )

        return FeatureVector(features)
```
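The `bucketize` helper above turns a continuous value into a sparse ID, and high-cardinality IDs (advertiser, user) are typically mapped into a fixed embedding table via the hashing trick. A sketch under assumed bucket boundaries and table size:

```python
import hashlib

AGE_BOUNDARIES = [18, 25, 35, 45, 55, 65]  # assumed, not the production buckets

def bucketize(age: int) -> int:
    """Map a continuous age to a bucket index in 0..len(AGE_BOUNDARIES)."""
    for i, bound in enumerate(AGE_BOUNDARIES):
        if age < bound:
            return i
    return len(AGE_BOUNDARIES)

def hash_feature(value: str, num_buckets: int = 1_000_000) -> int:
    """Hashing trick: fold an unbounded ID space into a fixed embedding table."""
    digest = hashlib.md5(value.encode()).hexdigest()
    return int(digest, 16) % num_buckets
```

A stable hash (not Python's randomized `hash()`) matters here: the same advertiser must land in the same embedding row across training and serving.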
3. CTR Prediction Model

```python
import torch
import torch.nn as nn

class CTRModel(nn.Module):
    """
    Deep & Cross Network (DCN-V2) for CTR prediction.
    """

    def __init__(self, sparse_dims, dense_dim):
        super().__init__()
        # Embedding layers for sparse features
        self.embeddings = nn.ModuleDict({
            name: nn.Embedding(dim, 16)
            for name, dim in sparse_dims.items()
        })
        total_dim = len(sparse_dims) * 16 + dense_dim

        # Cross Network (feature interactions)
        self.cross_layers = nn.ModuleList([
            CrossLayer(total_dim) for _ in range(3)
        ])

        # Deep Network
        self.deep = nn.Sequential(
            nn.Linear(total_dim, 512),
            nn.ReLU(),
            nn.BatchNorm1d(512),
            nn.Dropout(0.2),
            nn.Linear(512, 256),
            nn.ReLU(),
            nn.BatchNorm1d(256),
            nn.Dropout(0.1),
            nn.Linear(256, 128),
            nn.ReLU(),
        )

        # Final prediction
        self.output = nn.Linear(total_dim + 128, 1)

    def forward(self, sparse_features, dense_features):
        # Embed sparse features
        embedded = [
            self.embeddings[name](sparse_features[name])
            for name in self.embeddings
        ]
        embedded = torch.cat(embedded, dim=-1)

        # Combine with dense
        x = torch.cat([embedded, dense_features], dim=-1)

        # Cross network
        cross_out = x
        for layer in self.cross_layers:
            cross_out = layer(cross_out, x)

        # Deep network
        deep_out = self.deep(x)

        # Combine and predict
        combined = torch.cat([cross_out, deep_out], dim=-1)
        logit = self.output(combined)
        return torch.sigmoid(logit)
```
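The model above references a `CrossLayer` that is not defined. In DCN-V2 each cross layer computes x_{l+1} = x_0 * (W x_l + b) + x_l, i.e. an element-wise interaction with the original input plus a residual connection; a minimal sketch matching the `layer(cross_out, x)` call signature used above:

```python
import torch
import torch.nn as nn

class CrossLayer(nn.Module):
    """DCN-V2 cross layer: x_{l+1} = x_0 * (W @ x_l + b) + x_l."""

    def __init__(self, dim: int):
        super().__init__()
        # Full-rank weight matrix (DCN-V2; V1 used a rank-1 vector instead)
        self.linear = nn.Linear(dim, dim)

    def forward(self, x_l: torch.Tensor, x_0: torch.Tensor) -> torch.Tensor:
        # Element-wise product with x_0 raises the interaction degree by one;
        # the residual term keeps lower-degree terms flowing through.
        return x_0 * self.linear(x_l) + x_l

layer = CrossLayer(8)
x0 = torch.randn(4, 8)
out = layer(x0, x0)  # first layer: x_l == x_0
```

Stacking 3 such layers, as `CTRModel` does, bounds the explicit interactions at degree 4 while the deep tower handles the rest implicitly.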
Model Architecture Evolution:
| Generation | Model | Features |
|---|---|---|
| V1 | Logistic Regression | Sparse, manual feature crosses |
| V2 | GBDT (XGBoost) | + dense features |
| V3 | Wide & Deep | Sparse crosses + DNN |
| V4 | DCN / DeepFM | Automatic feature interactions |
| V5 | DIN (Deep Interest Network) | Attention over user history |
| V6 | DIEN + Transformer | Sequential user behavior |
4. Auction Engine

```python
class AuctionEngine:
    """
    Second-price auction with quality scoring.
    """

    def run_auction(
        self,
        candidates: list[AdCandidate],
        predictions: dict[str, float],
    ) -> AuctionResult:
        # Calculate ranking score for each ad
        scored = []
        for ad in candidates:
            ctr = predictions[ad.id]
            score = ad.bid * ctr * ad.quality_score
            scored.append((ad, score, ctr))

        # Sort by score descending
        scored.sort(key=lambda x: x[1], reverse=True)

        if not scored:
            return AuctionResult(winner=None)

        winner_ad, winner_score, winner_ctr = scored[0]

        # Second price: winner pays the minimum needed to beat second place
        if len(scored) > 1:
            second_score = scored[1][1]
            # cost = second_score / (winner_ctr * quality)
            actual_cost = second_score / (
                winner_ctr * winner_ad.quality_score
            )
        else:
            actual_cost = winner_ad.reserve_price

        return AuctionResult(
            winner=winner_ad,
            cost_per_click=actual_cost,
            predicted_ctr=winner_ctr,
        )
```
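A worked numeric instance of the pricing rule above (the bids and CTRs are made up for illustration):

```python
# Winner:    bid $2.00, pCTR 0.05, quality 1.0 -> score 0.100
# Runner-up: bid $1.50, pCTR 0.04, quality 1.0 -> score 0.060
winner_bid, winner_ctr, winner_q = 2.00, 0.05, 1.0
second_score = 1.50 * 0.04 * 1.0

# Winner pays the minimum CPC that would still beat second place
actual_cost = second_score / (winner_ctr * winner_q)
# 0.06 / 0.05 = $1.20 per click, well below the $2.00 bid
```

Note how `winner_ctr` sits in the denominator: inflate the predicted CTR 2x and the charged CPC halves, which is exactly the calibration problem discussed below.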
5. Calibration and Feedback

```python
class CTRCalibrator:
    """
    Ensure predicted CTR matches actual CTR.
    """

    def calibrate(self, predicted_ctr: float) -> float:
        """Isotonic regression calibration."""
        return self.isotonic_model.predict([predicted_ctr])[0]

    def update_calibration(self, impressions: list[Impression]):
        """Periodic recalibration on recent data."""
        predicted = [imp.predicted_ctr for imp in impressions]
        actual = [imp.was_clicked for imp in impressions]
        self.isotonic_model.fit(predicted, actual)
```
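The `isotonic_model` above can be backed by scikit-learn's `IsotonicRegression`. A self-contained sketch on synthetic data, where the overconfidence pattern (predictions 2x too high) is simulated rather than taken from real logs:

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(0)

# Simulate a model whose predictions are systematically 2x too high
true_ctr = rng.uniform(0.01, 0.10, size=5000)
predicted = np.clip(true_ctr * 2.0, 0.0, 1.0)
clicks = rng.binomial(1, true_ctr)  # observed 0/1 labels

# Fit a monotone mapping: predicted CTR -> empirical click rate
iso = IsotonicRegression(out_of_bounds="clip")
iso.fit(predicted, clicks)

# A raw prediction of 0.10 should be pulled down toward the true ~0.05
calibrated = iso.predict([0.10])[0]
```

Isotonic regression only assumes the model's ranking is correct (monotone miscalibration), which is why it is preferred over Platt scaling when the distortion is not sigmoid-shaped.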
Feedback loop:

```
Impression -> Click/No-click (label) -> Training data -> Model retrain -> Deploy
                     |
                     v
     Delayed conversions (1-7 day attribution window)
```
Infrastructure
| Component | Technology | Scale / SLA |
|---|---|---|
| Feature Store | Redis (real-time), Hive (batch) | <5ms read |
| Model Serving | TensorRT + Triton | <10ms p99 |
| Ad Index | Inverted index (in-memory) | <2ms lookup |
| Event Stream | Kafka | 10M+ events/sec |
| Training | Spark + PyTorch distributed | Daily retrain |
| A/B Testing | Internal platform | 100+ concurrent experiments |
Latency Budget
Total ad request: <100ms
Ad retrieval: 20ms
Feature extraction: 15ms
CTR prediction: 10ms
Auction: 5ms
Ad rendering: 10ms
Network overhead: 40ms
Misconception: a second-price auction is simple and needs no calibration
In a second-price auction the winner pays the second-place price: cost = second_score / (winner_ctr * quality). If the predicted CTR is inflated 2x, the winner actually pays 2x less and the platform loses revenue. If it is deflated, the advertiser overpays and churns. Calibration directly drives unit economics.
Misconception: cross features need not be modeled explicitly
A DNN can in theory learn arbitrary feature interactions, but in practice explicit cross features (user x ad_category, user x advertiser) add +2-5% AUC over a plain DNN. That is why DCN (Deep & Cross Network) became the standard: the cross network models bounded-degree interactions while the deep network captures high-level patterns.
Misconception: model evolution is linear (LR -> GBDT -> DNN)
In practice, production systems often run an ensemble: GBDT for dense features plus a DNN for embedding features. Meta uses DLRM (Deep Learning Recommendation Model), where sparse features go through embeddings + dot product and dense features through an MLP. A clean LR -> DNN switch fails on serving latency alone.
At the Interview
Typical mistakes:
"One DNN model scores all ads" -- at 1M QPS with millions of ads this is infeasible. You need a multi-stage architecture: targeting filter -> lightweight scorer -> deep model on the top candidates.
"Frequency capping doesn't matter" -- without capping, a single high-bid advertiser gets shown 50 times to the same user, CTR drops toward 0%, and user experience collapses.
"Compute features on the fly" -- user features (preferences, historical CTR) must be precomputed in the Feature Store (Redis). Only the real-time context (device, time, page content) is computed at request time.
Strong answers:
"DCN-V2 as the main model: embedding layer for sparse features (user_id, ad_id, category), cross network for explicit 2nd/3rd-order interactions, deep network for high-level patterns. Serving via TensorRT with INT8 quantization, <4ms inference."
"Calibration via isotonic regression on a validation set. Monitoring: the predicted/actual ratio per bucket, which should stay ~1.0. Recalibration every hour on fresh data."
"Cold start for new ads: use advertiser-level and category-level priors (historical CTR). Thompson sampling for exploration: allocate 5% of traffic to new ads and update the posterior after each impression."
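The cold-start answer above can be sketched with Beta-Bernoulli Thompson sampling; the category-prior CTR and prior strength below are illustrative numbers, not production settings:

```python
import random

class ThompsonSampler:
    """Beta-Bernoulli posterior per ad, seeded with a category-level prior."""

    def __init__(self, prior_ctr: float = 0.02, prior_strength: float = 100.0):
        # Prior chosen so alpha/(alpha+beta) == prior_ctr,
        # with prior_strength acting as pseudo-impressions
        self.prior_alpha = prior_ctr * prior_strength
        self.prior_beta = (1.0 - prior_ctr) * prior_strength
        self.alpha = {}  # ad_id -> prior + observed clicks
        self.beta = {}   # ad_id -> prior + observed non-clicks

    def sample_ctr(self, ad_id: str) -> float:
        # Draw a plausible CTR; new ads fall back to the category prior
        a = self.alpha.get(ad_id, self.prior_alpha)
        b = self.beta.get(ad_id, self.prior_beta)
        return random.betavariate(a, b)

    def update(self, ad_id: str, clicked: bool):
        a = self.alpha.get(ad_id, self.prior_alpha)
        b = self.beta.get(ad_id, self.prior_beta)
        self.alpha[ad_id] = a + (1 if clicked else 0)
        self.beta[ad_id] = b + (0 if clicked else 1)

sampler = ThompsonSampler()
for _ in range(1000):
    sampler.update("new_ad", clicked=random.random() < 0.05)
# Posterior mean drifts from the 2% prior toward the observed ~5% CTR
```

Ranking by a sampled CTR instead of the point estimate gives uncertain (new) ads occasional wins, which is exactly the 5%-traffic exploration described above but done per-impression rather than by hard traffic split.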