Ad Click Prediction: Passing the Interview¶
~4 minute read
Prerequisites: Problem Definition | System Components
The "Design Ad Click Prediction" case (CTR prediction) is one of the most common in ML System Design interviews at Google, Meta, Amazon, and TikTok. Its particular difficulty: you must simultaneously deliver calibration (for the auction), latency under 10ms (for real-time serving), and scale of 1M+ QPS. The interviewer expects specifics on the model (DCN/DeepFM), the features (cross features!), calibration, and cold start handling.
Interview Framework (45-60 min)¶
0-5 min: Clarifying questions
5-15 min: High-level design
15-30 min: Deep dive (model architecture, features)
30-45 min: Scaling & calibration
45-60 min: Extensions & Q&A
Step 1: Clarifying Questions (5 min)¶
**Scope:**
- What type of ads? (search, display, video)
- What platform? (social feed, search results, content site)
**Scale:**
- QPS for ad requests?
- Number of ads in inventory?
- Latency requirement?
**Business:**
- Auction type? (CPC, CPM)
- Multi-objective? (CTR + conversion)
- Privacy constraints?
**Data:**
- What user data available?
- Historical data volume?
- Feature refresh frequency?
Step 2: High-Level Design (10 min)¶
Architecture¶
```mermaid
graph TD
REQ["Ad Request"] --> SYSTEM["Ad Serving System"]
subgraph PIPELINE ["Ad Serving Pipeline"]
direction LR
CS["Candidate<br/>Selection"] --> CTR["CTR Model<br/>Prediction"]
CTR --> AUC["Auction<br/>Ranking"]
end
SYSTEM --> PIPELINE
SYSTEM -.-> FS["Feature Store<br/>(Redis)"]
SYSTEM -.-> MS["Model Server<br/>(TensorRT)"]
SYSTEM -.-> AI["Ad Index<br/>(In-memory)"]
style REQ fill:#e8eaf6,stroke:#3f51b5
style SYSTEM fill:#fff3e0,stroke:#ef6c00
style CS fill:#fff3e0,stroke:#ef6c00
style CTR fill:#e8eaf6,stroke:#3f51b5
style AUC fill:#f3e5f5,stroke:#9c27b0
style FS fill:#e8f5e9,stroke:#4caf50
style MS fill:#e8f5e9,stroke:#4caf50
style AI fill:#e8f5e9,stroke:#4caf50
```
Flow¶
"Three-stage pipeline:
1. **Candidate Selection** (1000 → 100)
- Filter by targeting rules
- Budget/pacing checks
- Lightweight scoring
2. **CTR Prediction** (100 ads)
- Deep model scoring
- P(click | user, ad, context)
3. **Auction Ranking**
- Score = bid × pCTR
- Winner determination
- Price computation (second price; see the sketch after this list)
Latency budget: 10ms total
- Candidate: 2ms
- Features: 2ms
- Model: 4ms
- Auction: 2ms"
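To make the auction step concrete, here is a minimal Python sketch of rank-by-bid × pCTR with generalized second-price charging. The `AdCandidate` type and its fields are illustrative, not any production API:

```python
from dataclasses import dataclass

@dataclass
class AdCandidate:
    ad_id: str
    bid: float   # advertiser's CPC bid
    pctr: float  # calibrated P(click) from the CTR model

def run_auction(candidates: list[AdCandidate]) -> tuple[AdCandidate, float]:
    """Rank by bid * pCTR; the winner pays the minimum CPC that would
    still beat the runner-up's score (generalized second price)."""
    ranked = sorted(candidates, key=lambda c: c.bid * c.pctr, reverse=True)
    winner = ranked[0]
    if len(ranked) == 1:
        return winner, winner.bid  # no competition; reserve pricing omitted
    runner_up = ranked[1]
    price = (runner_up.bid * runner_up.pctr) / winner.pctr
    return winner, min(price, winner.bid)
```

Note that an overestimated pCTR inflates an ad's rank and distorts what competitors pay, which is exactly why the calibration discussion below matters.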
Step 3: Deep Dive (15 min)¶
Model Architecture¶
"I'd use a Deep & Cross Network (DCN) approach..."
"Why DCN?
- Explicit feature crosses (important for CTR)
- Deep network for complex patterns
- Efficient for high-dimensional sparse features
Architecture:
```mermaid
graph TD
UF["User Features"] --> UE["Embedding"]
AF["Ad Features"] --> AE["Embedding"]
CF["Context Features"] --> CE["Embedding"]
UE --> CONCAT["Concatenate"]
AE --> CONCAT
CE --> CONCAT
CONCAT --> CROSS["Cross Network"]
CONCAT --> DEEP["Deep Network"]
CROSS --> OUT["Output -> P(click)"]
DEEP --> OUT
style UF fill:#e8eaf6,stroke:#3f51b5
style AF fill:#e8eaf6,stroke:#3f51b5
style CF fill:#e8eaf6,stroke:#3f51b5
style UE fill:#fff3e0,stroke:#ef6c00
style AE fill:#fff3e0,stroke:#ef6c00
style CE fill:#fff3e0,stroke:#ef6c00
style CONCAT fill:#e8f5e9,stroke:#4caf50
style CROSS fill:#f3e5f5,stroke:#9c27b0
style DEEP fill:#f3e5f5,stroke:#9c27b0
style OUT fill:#fce4ec,stroke:#c62828
```
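A minimal sketch of this architecture, assuming PyTorch and the DCN-V2 cross formulation x_{l+1} = x0 * (W·xl + b) + xl; dimensions and layer counts are illustrative:

```python
import torch
import torch.nn as nn

class CrossLayer(nn.Module):
    """One explicit interaction layer: x_{l+1} = x0 * (W @ xl + b) + xl."""
    def __init__(self, dim: int):
        super().__init__()
        self.linear = nn.Linear(dim, dim)

    def forward(self, x0: torch.Tensor, xl: torch.Tensor) -> torch.Tensor:
        return x0 * self.linear(xl) + xl

class DCN(nn.Module):
    """Cross tower + deep tower over concatenated embeddings -> P(click)."""
    def __init__(self, dim: int, n_cross: int = 3, hidden: int = 256):
        super().__init__()
        self.cross = nn.ModuleList([CrossLayer(dim) for _ in range(n_cross)])
        self.deep = nn.Sequential(
            nn.Linear(dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.head = nn.Linear(dim + hidden, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        xc = x
        for layer in self.cross:
            xc = layer(x, xc)  # each pass adds one explicit cross degree
        logit = self.head(torch.cat([xc, self.deep(x)], dim=-1))
        return torch.sigmoid(logit).squeeze(-1)

model = DCN(dim=64)
pctr = model(torch.randn(32, 64))  # batch of 32 concatenated feature embeddings
```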
"1. User Features - User embedding (from history) - Demographics - Historical CTR by ad category - Session behavior
- Ad Features
- Ad embedding (creative, text)
- Advertiser quality score
- Historical CTR
-
Freshness
-
Cross Features (most important!)
- User × Ad category affinity
- User × Advertiser history
- User × Creative type CTR
-
Time × Ad category
-
Context
- Position (critical!)
- Device, OS
- Time of day
- Page content
Feature volume: - 1000+ raw features - Millions of feature crosses - Embeddings reduce dimensionality"
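Millions of sparse crosses cannot each get a named column; the standard trick is to hash the concatenated values into a fixed number of embedding-table buckets. A minimal sketch (the bucket count and key format are illustrative):

```python
import hashlib

def cross_feature(user_value: str, ad_value: str, n_buckets: int = 2**20) -> int:
    """Hash a user x ad cross (e.g., user_segment x ad_category) into a
    fixed-size embedding table index; collisions are tolerated at this scale."""
    key = f"{user_value}|{ad_value}".encode()
    return int.from_bytes(hashlib.md5(key).digest()[:8], "big") % n_buckets

idx = cross_feature("sports_fan", "running_shoes")  # row in the embedding table
```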
"Calibration is critical for auctions...""Why calibration matters: - Bid × pCTR determines ad rank - If pCTR is 2x actual, advertiser pays 2x - Destroys advertiser trust
Calibration approach: 1. Platt Scaling - Logistic regression on validation set - Scale raw scores to calibrated
- Isotonic Regression
- Non-parametric calibration
-
More flexible
-
Temperature Scaling
- Learn temperature T
- P_calibrated = softmax(logits / T)
Monitor: - Predicted/Actual ratio per bucket - Should be ~1.0 across all ranges"
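A runnable sketch of isotonic calibration plus the predicted/actual monitoring described above, on synthetic data where the raw model is roughly 2x overconfident (all numbers illustrative):

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(0)

# Synthetic validation set: true CTRs in [1%, 10%], raw scores ~2x too high.
true_ctr = rng.uniform(0.01, 0.10, size=50_000)
clicks = rng.binomial(1, true_ctr)
raw = np.clip(2.0 * true_ctr + rng.normal(0, 0.01, true_ctr.shape), 0.0, 1.0)

# Isotonic regression: a monotone, non-parametric map from raw score to probability.
iso = IsotonicRegression(out_of_bounds="clip").fit(raw, clicks)
calibrated = iso.predict(raw)

# Monitoring: the predicted/actual ratio per score bucket should sit near 1.0.
edges = np.linspace(calibrated.min(), calibrated.max(), 11)
bucket = np.clip(np.digitize(calibrated, edges[1:-1]), 0, 9)
for b in range(10):
    m = bucket == b
    if m.any() and clicks[m].mean() > 0:
        print(f"bucket {b}: pred/actual = {calibrated[m].mean() / clicks[m].mean():.2f}")
```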
"For massive scale...""1. Two-Stage Ranking - Stage 1: Simple model (logistic regression) - Stage 2: Deep model on top candidates - Reduces compute 10x
- Model Optimization
- Quantization (FP32 → INT8)
- Pruning (remove unused weights)
-
Distillation (smaller model)
-
Feature Caching
- User features: Cache for session
- Ad features: Precompute daily
-
Cross features: Compute on-demand
-
Sharding
- Shard by user_id
- Parallel scoring across shards
-
Each shard handles 50K QPS
-
Batching
- Batch multiple ads together
- GPU inference with batch=32" "Ads environment changes fast..."
"1. Streaming Updates - Process clicks/impressions in real-time - Update feature store immediately - Historical CTR refreshes hourly
- Model Retraining
- Full retrain: Daily
- Incremental update: Hourly
-
A/B test new models
-
Cold Start Handling
- New ads: Use advertiser/category prior
- Exploration: Allocate 5% traffic
- Thompson sampling for exploration" "Position strongly affects CTR:
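A minimal Beta-Bernoulli Thompson sampling sketch for the exploration slice; the advertiser-level prior (2 clicks per 100 impressions, i.e. a ~2% prior CTR) and the ad set are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)

# Each ad keeps Beta posterior counts; new ads start from an advertiser prior.
PRIOR_CLICKS, PRIOR_IMPS = 2.0, 100.0  # ~2% advertiser-level prior CTR
stats = {f"ad_{i}": [PRIOR_CLICKS, PRIOR_IMPS - PRIOR_CLICKS] for i in range(5)}
true_ctr = {ad: rng.uniform(0.005, 0.05) for ad in stats}

for _ in range(10_000):
    # Sample a plausible CTR per ad from its posterior and show the argmax.
    sampled = {ad: rng.beta(a, b) for ad, (a, b) in stats.items()}
    shown = max(sampled, key=sampled.get)
    clicked = rng.random() < true_ctr[shown]
    stats[shown][0 if clicked else 1] += 1  # update posterior counts

# High-uncertainty (new) ads keep receiving impressions; proven weak ads fade out.
```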
"Position strongly affects CTR:
- Position 1: 10% CTR
- Position 5: 2% CTR

Solutions:
1. Train a separate model: P(click | position)
2. Predict: P(click | ad, user) × P(click | position)
3. Or: include position as a feature during training and debias at serving time

For unbiased training:
- Use randomized data
- Inverse propensity weighting"
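A sketch of an inverse-propensity-weighted loss; the per-position examination propensities are hypothetical numbers that would be estimated from randomized traffic:

```python
import numpy as np

# Hypothetical examination propensities for slots 1..5 (from randomized traffic).
PROPENSITY = np.array([1.00, 0.62, 0.40, 0.28, 0.20])

def ipw_bce(p_pred: np.ndarray, clicked: np.ndarray, position: np.ndarray) -> float:
    """Binary cross-entropy with each example weighted by 1/propensity, so
    engagement at rarely-examined slots counts more and position bias cancels."""
    w = 1.0 / PROPENSITY[position - 1]
    eps = 1e-9
    ll = clicked * np.log(p_pred + eps) + (1 - clicked) * np.log(1 - p_pred + eps)
    return float(-(w * ll).mean())

# A click at position 5 carries 5x the weight of a click at position 1.
loss = ipw_bce(np.array([0.08, 0.03]), np.array([1.0, 0.0]), np.array([1, 5]))
```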
"Optimize for click AND conversion:Objective: Score = w1 × P(click) + w2 × P(click) × P(conversion | click)
Approach: 1. Train separate models for each 2. Combine scores with business weights 3. Pareto optimization for trade-off"
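The combined objective as code; w1 and w2 are business knobs, and the default values here are purely illustrative:

```python
def combined_score(p_click: float, p_conv_given_click: float,
                   w1: float = 1.0, w2: float = 30.0) -> float:
    """Auction score trading off clicks against conversions; w2 >> w1 when a
    conversion is worth much more to the business than a bare click."""
    return w1 * p_click + w2 * p_click * p_conv_given_click
```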
"With GDPR and cookie deprecation:- On-device modeling
- Process data on user device
-
Send only aggregates
-
Federated Learning
- Train model across devices
-
No raw data leaves device
-
Contextual targeting
- Use page content, not user history
- First-party data only"
Interviewer: "Design CTR prediction for Google Ads"
## Interview Checklist ### Must Cover: - [ ] Two-stage architecture - [ ] Feature categories (user, ad, cross) - [ ] Model choice (deep learning for CTR) - [ ] Calibration importance - [ ] Latency optimization - [ ] Cold start handling ### Good to Cover: - [ ] Position bias - [ ] Continuous learning - [ ] Multi-objective - [ ] Privacy considerations - [ ] Exploration vs exploitation ### Red Flags: - [ ] Ignoring calibration - [ ] Not discussing cross features - [ ] Single-stage ranking at scale - [ ] Forgetting cold start - [ ] Not mentioning latency constraints ## Sample Script
You: "Great! Let me clarify - is this search ads or display?"
Interviewer: "Display ads on partner sites"
You: "And the scale? QPS and latency?"
Interviewer: "500K QPS, 10ms budget"
You: "Perfect. Here's my approach: [Draw architecture]
For 500K QPS in 10ms, I'd use two-stage ranking:
- Stage 1: Lightweight model on 1000 candidates
- Stage 2: Deep model on the top 100
For the deep model, I'd use DCN (Deep & Cross Network) because cross features are critical for CTR.
Key features:
- User embedding from click history
- Ad creative embedding
- User × Ad category affinity
- Position (with debiasing)
Calibration is critical because bid × pCTR determines ad rank. I'd use isotonic regression for calibration.
For cold start, I'd use advertiser-level priors and allocate 5% for exploration.
Shall I dive into any component?"
Misconception: in the interview it is enough to name a model and list the features
The interviewer expects: (1) a justification of the model choice under the latency constraint (DCN inference <4ms vs BERT ~10ms), (2) a concrete latency breakdown by stage, (3) an explanation of why cross features matter more than deep features for CTR. Simply saying "we'll use a DNN" is a red flag.
Misconception: privacy constraints do not change the architecture
Without third-party cookies (GDPR, iOS ATT) you lose 30-40% of the targeting signal. The architecture changes fundamentally: (1) contextual targeting instead of behavioral, (2) on-device modeling (Apple SKAdNetwork), (3) federated learning. In the interview you should at least mention that privacy constraints shape which features are available.
Misconception: exploration for new ads just means random impressions
Random exploration wastes budget. Thompson sampling (a Bayesian approach) adaptively allocates more impressions to ads with high uncertainty (new ones) and fewer to those with low uncertainty (proven ones). It converges to the optimal allocation 3-5x faster than epsilon-greedy.
In the Interview¶
Typical mistakes:
"One model scores all the millions of ads" -- at 500K QPS with millions of ads you need a multi-stage pipeline: targeting filter + lightweight scorer + deep model on the top 100
"Calibration is just a sigmoid on the model output" -- a sigmoid gives uncalibrated probabilities. You need post-hoc calibration (isotonic regression, Platt scaling) with monitoring of the predicted/actual ratio
Not mentioning the second-price auction -- it is the foundation of ad pricing; omitting it signals a lack of domain knowledge
Strong answers:
"Latency breakdown: candidate selection 2ms (inverted index by targeting), feature extraction 2ms (precomputed in Redis), CTR prediction 4ms (DCN-V2, INT8 quantization, batch=32), auction 2ms. Total: 10ms."
"Multi-objective: Score = w1 * P(click) + w2 * P(click) * P(conversion|click). Separate models for CTR and CVR, a combined score for auction ranking. The weights w1, w2 are business parameters."
"Cold start: new ads get an advertiser-level prior CTR, with Thompson sampling for exploration (5% of traffic). After 1000 impressions the model switches to ad-level features. Monitoring: AUC by cohort (new ads vs established)."