Ad Click Prediction: Passing the Interview¶
~4 minute read
Prerequisites: Problem Definition | System Components
The "Design Ad Click Prediction" case (CTR prediction) is one of the most common in ML System Design interviews at Google, Meta, Amazon, and TikTok. Its particular difficulty: you must simultaneously deliver calibration (for the auction), latency under 10ms (for real-time serving), and scale of 1M+ QPS. The interviewer expects specifics on the model (DCN/DeepFM), the features (cross features!), calibration, and cold start handling.
Interview Framework (45-60 min)¶
0-5 min: Clarifying questions
5-15 min: High-level design
15-30 min: Deep dive (model architecture, features)
30-45 min: Scaling & calibration
45-60 min: Extensions & Q&A
Step 1: Clarifying Questions (5 min)¶
**Scope:**
- What type of ads? (search, display, video)
- What platform? (social feed, search results, content site)
**Scale:**
- QPS for ad requests?
- Number of ads in inventory?
- Latency requirement?
**Business:**
- Auction type? (CPC, CPM)
- Multi-objective? (CTR + conversion)
- Privacy constraints?
**Data:**
- What user data available?
- Historical data volume?
- Feature refresh frequency?
Step 2: High-Level Design (10 min)¶
Architecture¶
```mermaid
graph TD
REQ["Ad Request"] --> SYSTEM["Ad Serving System"]
subgraph PIPELINE ["Ad Serving Pipeline"]
direction LR
CS["Candidate<br/>Selection"] --> CTR["CTR Model<br/>Prediction"]
CTR --> AUC["Auction<br/>Ranking"]
end
SYSTEM --> PIPELINE
SYSTEM -.-> FS["Feature Store<br/>(Redis)"]
SYSTEM -.-> MS["Model Server<br/>(TensorRT)"]
SYSTEM -.-> AI["Ad Index<br/>(In-memory)"]
style REQ fill:#e8eaf6,stroke:#3f51b5
style SYSTEM fill:#fff3e0,stroke:#ef6c00
style CS fill:#fff3e0,stroke:#ef6c00
style CTR fill:#e8eaf6,stroke:#3f51b5
style AUC fill:#f3e5f5,stroke:#9c27b0
style FS fill:#e8f5e9,stroke:#4caf50
style MS fill:#e8f5e9,stroke:#4caf50
style AI fill:#e8f5e9,stroke:#4caf50
```
Flow¶
"Three-stage pipeline:
1. **Candidate Selection** (1000 → 100)
- Filter by targeting rules
- Budget/pacing checks
- Lightweight scoring
2. **CTR Prediction** (100 ads)
- Deep model scoring
- P(click | user, ad, context)
3. **Auction Ranking**
- Score = bid × pCTR
- Winner determination
- Price computation (second price; see the sketch after this list)
Latency budget: 10ms total
- Candidate: 2ms
- Features: 2ms
- Model: 4ms
- Auction: 2ms"
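To make the auction step concrete, here is a minimal Python sketch of rank-by-bid × pCTR with generalized second-price charging. The `AdCandidate` type and its fields are illustrative, not any production API:

```python
from dataclasses import dataclass

@dataclass
class AdCandidate:
    ad_id: str
    bid: float   # advertiser's CPC bid
    pctr: float  # calibrated P(click) from the CTR model

def run_auction(candidates: list[AdCandidate]) -> tuple[AdCandidate, float]:
    """Rank by bid * pCTR; the winner pays the minimum CPC that would
    still beat the runner-up's score (generalized second price)."""
    ranked = sorted(candidates, key=lambda c: c.bid * c.pctr, reverse=True)
    winner = ranked[0]
    if len(ranked) == 1:
        return winner, winner.bid  # no competition; reserve pricing omitted
    runner_up = ranked[1]
    price = (runner_up.bid * runner_up.pctr) / winner.pctr
    return winner, min(price, winner.bid)
```

Note that an overestimated pCTR inflates an ad's rank and distorts what competitors pay, which is exactly why the calibration discussion below matters.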
Step 3: Deep Dive (15 min)¶
Model Architecture¶
"I'd use a Deep & Cross Network (DCN) approach..."
"Why DCN?
- Explicit feature crosses (important for CTR)
- Deep network for complex patterns
- Efficient for high-dimensional sparse features
Architecture:
```mermaid
graph TD
UF["User Features"] --> UE["Embedding"]
AF["Ad Features"] --> AE["Embedding"]
CF["Context Features"] --> CE["Embedding"]
UE --> CONCAT["Concatenate"]
AE --> CONCAT
CE --> CONCAT
CONCAT --> CROSS["Cross Network"]
CONCAT --> DEEP["Deep Network"]
CROSS --> OUT["Output -> P(click)"]
DEEP --> OUT
style UF fill:#e8eaf6,stroke:#3f51b5
style AF fill:#e8eaf6,stroke:#3f51b5
style CF fill:#e8eaf6,stroke:#3f51b5
style UE fill:#fff3e0,stroke:#ef6c00
style AE fill:#fff3e0,stroke:#ef6c00
style CE fill:#fff3e0,stroke:#ef6c00
style CONCAT fill:#e8f5e9,stroke:#4caf50
style CROSS fill:#f3e5f5,stroke:#9c27b0
style DEEP fill:#f3e5f5,stroke:#9c27b0
style OUT fill:#fce4ec,stroke:#c62828
```
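A minimal sketch of this architecture, assuming PyTorch and the DCN-V2 cross formulation x_{l+1} = x0 * (W·xl + b) + xl; dimensions and layer counts are illustrative:

```python
import torch
import torch.nn as nn

class CrossLayer(nn.Module):
    """One explicit interaction layer: x_{l+1} = x0 * (W @ xl + b) + xl."""
    def __init__(self, dim: int):
        super().__init__()
        self.linear = nn.Linear(dim, dim)

    def forward(self, x0: torch.Tensor, xl: torch.Tensor) -> torch.Tensor:
        return x0 * self.linear(xl) + xl

class DCN(nn.Module):
    """Cross tower + deep tower over concatenated embeddings -> P(click)."""
    def __init__(self, dim: int, n_cross: int = 3, hidden: int = 256):
        super().__init__()
        self.cross = nn.ModuleList([CrossLayer(dim) for _ in range(n_cross)])
        self.deep = nn.Sequential(
            nn.Linear(dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.head = nn.Linear(dim + hidden, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        xc = x
        for layer in self.cross:
            xc = layer(x, xc)  # each pass adds one explicit cross degree
        logit = self.head(torch.cat([xc, self.deep(x)], dim=-1))
        return torch.sigmoid(logit).squeeze(-1)

model = DCN(dim=64)
pctr = model(torch.randn(32, 64))  # batch of 32 concatenated feature embeddings
```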
"1. User Features - User embedding (from history) - Demographics - Historical CTR by ad category - Session behavior
- Ad Features
- Ad embedding (creative, text)
- Advertiser quality score
- Historical CTR
-
Freshness
-
Cross Features (most important!)
- User × Ad category affinity
- User × Advertiser history
- User × Creative type CTR
-
Time × Ad category
-
Context
- Position (critical!)
- Device, OS
- Time of day
- Page content
Feature volume: - 1000+ raw features - Millions of feature crosses - Embeddings reduce dimensionality"
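Millions of sparse crosses cannot each get a named column; the standard trick is to hash the concatenated values into a fixed number of embedding-table buckets. A minimal sketch (the bucket count and key format are illustrative):

```python
import hashlib

def cross_feature(user_value: str, ad_value: str, n_buckets: int = 2**20) -> int:
    """Hash a user x ad cross (e.g., user_segment x ad_category) into a
    fixed-size embedding table index; collisions are tolerated at this scale."""
    key = f"{user_value}|{ad_value}".encode()
    return int.from_bytes(hashlib.md5(key).digest()[:8], "big") % n_buckets

idx = cross_feature("sports_fan", "running_shoes")  # row in the embedding table
```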
"Calibration is critical for auctions...""Why calibration matters: - Bid × pCTR determines ad rank - If pCTR is 2x actual, advertiser pays 2x - Destroys advertiser trust
Calibration approach: 1. Platt Scaling - Logistic regression on validation set - Scale raw scores to calibrated
- Isotonic Regression
- Non-parametric calibration
-
More flexible
-
Temperature Scaling
- Learn temperature T
- P_calibrated = softmax(logits / T)
Monitor: - Predicted/Actual ratio per bucket - Should be ~1.0 across all ranges"
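A runnable sketch of isotonic calibration plus the predicted/actual monitoring described above, on synthetic data where the raw model is roughly 2x overconfident (all numbers illustrative):

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(0)

# Synthetic validation set: true CTRs in [1%, 10%], raw scores ~2x too high.
true_ctr = rng.uniform(0.01, 0.10, size=50_000)
clicks = rng.binomial(1, true_ctr)
raw = np.clip(2.0 * true_ctr + rng.normal(0, 0.01, true_ctr.shape), 0.0, 1.0)

# Isotonic regression: a monotone, non-parametric map from raw score to probability.
iso = IsotonicRegression(out_of_bounds="clip").fit(raw, clicks)
calibrated = iso.predict(raw)

# Monitoring: the predicted/actual ratio per score bucket should sit near 1.0.
edges = np.linspace(calibrated.min(), calibrated.max(), 11)
bucket = np.clip(np.digitize(calibrated, edges[1:-1]), 0, 9)
for b in range(10):
    m = bucket == b
    if m.any() and clicks[m].mean() > 0:
        print(f"bucket {b}: pred/actual = {calibrated[m].mean() / clicks[m].mean():.2f}")
```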
"For massive scale...""1. Two-Stage Ranking - Stage 1: Simple model (logistic regression) - Stage 2: Deep model on top candidates - Reduces compute 10x
- Model Optimization
- Quantization (FP32 → INT8)
- Pruning (remove unused weights)
-
Distillation (smaller model)
-
Feature Caching
- User features: Cache for session
- Ad features: Precompute daily
-
Cross features: Compute on-demand
-
Sharding
- Shard by user_id
- Parallel scoring across shards
-
Each shard handles 50K QPS
-
Batching
- Batch multiple ads together
- GPU inference with batch=32" "Ads environment changes fast..."
"1. Streaming Updates - Process clicks/impressions in real-time - Update feature store immediately - Historical CTR refreshes hourly
- Model Retraining
- Full retrain: Daily
- Incremental update: Hourly
-
A/B test new models
-
Cold Start Handling
- New ads: Use advertiser/category prior
- Exploration: Allocate 5% traffic
- Thompson sampling for exploration" "Position strongly affects CTR:
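A minimal Beta-Bernoulli Thompson sampling sketch for the exploration slice; the advertiser-level prior (2 clicks per 100 impressions, i.e. a ~2% prior CTR) and the ad set are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)

# Each ad keeps Beta posterior counts; new ads start from an advertiser prior.
PRIOR_CLICKS, PRIOR_IMPS = 2.0, 100.0  # ~2% advertiser-level prior CTR
stats = {f"ad_{i}": [PRIOR_CLICKS, PRIOR_IMPS - PRIOR_CLICKS] for i in range(5)}
true_ctr = {ad: rng.uniform(0.005, 0.05) for ad in stats}

for _ in range(10_000):
    # Sample a plausible CTR per ad from its posterior and show the argmax.
    sampled = {ad: rng.beta(a, b) for ad, (a, b) in stats.items()}
    shown = max(sampled, key=sampled.get)
    clicked = rng.random() < true_ctr[shown]
    stats[shown][0 if clicked else 1] += 1  # update posterior counts

# High-uncertainty (new) ads keep receiving impressions; proven weak ads fade out.
```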
"Position strongly affects CTR:
- Position 1: 10% CTR
- Position 5: 2% CTR

Solutions:
1. Train a separate model: P(click | position)
2. Predict: P(click | ad, user) × P(click | position)
3. Or: include position as a feature during training and debias at serving time

For unbiased training:
- Use randomized data
- Inverse propensity weighting"
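A sketch of an inverse-propensity-weighted loss; the per-position examination propensities are hypothetical numbers that would be estimated from randomized traffic:

```python
import numpy as np

# Hypothetical examination propensities for slots 1..5 (from randomized traffic).
PROPENSITY = np.array([1.00, 0.62, 0.40, 0.28, 0.20])

def ipw_bce(p_pred: np.ndarray, clicked: np.ndarray, position: np.ndarray) -> float:
    """Binary cross-entropy with each example weighted by 1/propensity, so
    engagement at rarely-examined slots counts more and position bias cancels."""
    w = 1.0 / PROPENSITY[position - 1]
    eps = 1e-9
    ll = clicked * np.log(p_pred + eps) + (1 - clicked) * np.log(1 - p_pred + eps)
    return float(-(w * ll).mean())

# A click at position 5 carries 5x the weight of a click at position 1.
loss = ipw_bce(np.array([0.08, 0.03]), np.array([1.0, 0.0]), np.array([1, 5]))
```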
"Optimize for click AND conversion:Objective: Score = w1 × P(click) + w2 × P(click) × P(conversion | click)
Approach: 1. Train separate models for each 2. Combine scores with business weights 3. Pareto optimization for trade-off"
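The combined objective as code; w1 and w2 are business knobs, and the default values here are purely illustrative:

```python
def combined_score(p_click: float, p_conv_given_click: float,
                   w1: float = 1.0, w2: float = 30.0) -> float:
    """Auction score trading off clicks against conversions; w2 >> w1 when a
    conversion is worth much more to the business than a bare click."""
    return w1 * p_click + w2 * p_click * p_conv_given_click
```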
"With GDPR and cookie deprecation:- On-device modeling
- Process data on user device
-
Send only aggregates
-
Federated Learning
- Train model across devices
-
No raw data leaves device
-
Contextual targeting
- Use page content, not user history
- First-party data only"
Interviewer: "Design CTR prediction for Google Ads"
## Interview Checklist ### Must Cover: - [ ] Two-stage architecture - [ ] Feature categories (user, ad, cross) - [ ] Model choice (deep learning for CTR) - [ ] Calibration importance - [ ] Latency optimization - [ ] Cold start handling ### Good to Cover: - [ ] Position bias - [ ] Continuous learning - [ ] Multi-objective - [ ] Privacy considerations - [ ] Exploration vs exploitation ### Red Flags: - [ ] Ignoring calibration - [ ] Not discussing cross features - [ ] Single-stage ranking at scale - [ ] Forgetting cold start - [ ] Not mentioning latency constraints ## Sample Script
You: "Great! Let me clarify - is this search ads or display?"
Interviewer: "Display ads on partner sites"
You: "And the scale? QPS and latency?"
Interviewer: "500K QPS, 10ms budget"
You: "Perfect. Here's my approach: [Draw architecture]
For 500K QPS in 10ms, I'd use two-stage ranking:
- Stage 1: Lightweight model on 1000 candidates
- Stage 2: Deep model on the top 100
For the deep model, I'd use DCN (Deep & Cross Network) because cross features are critical for CTR.
Key features:
- User embedding from click history
- Ad creative embedding
- User × Ad category affinity
- Position (with debiasing)
Calibration is critical because bid × pCTR determines ad rank. I'd use isotonic regression for calibration.
For cold start, I'd use advertiser-level priors and allocate 5% for exploration.
Shall I dive into any component?"
Misconception: in the interview it is enough to name a model and list the features
The interviewer expects: (1) a justification of the model choice under the latency constraint (DCN inference <4ms vs BERT ~10ms), (2) a concrete latency breakdown by stage, (3) an explanation of why cross features matter more than deep features for CTR. Simply saying "we'll use a DNN" is a red flag.
Misconception: privacy constraints do not change the architecture
Without third-party cookies (GDPR, iOS ATT) you lose 30-40% of the targeting signal. The architecture changes fundamentally: (1) contextual targeting instead of behavioral, (2) on-device modeling (Apple SKAdNetwork), (3) federated learning. In the interview you should at least mention that privacy constraints shape which features are available.
Misconception: exploration for new ads just means random impressions
Random exploration wastes budget. Thompson sampling (a Bayesian approach) adaptively allocates more impressions to ads with high uncertainty (new ones) and fewer to those with low uncertainty (proven ones). It converges to the optimal allocation 3-5x faster than epsilon-greedy.
In the Interview¶
Typical mistakes:
"One model scores all the millions of ads" -- at 500K QPS with millions of ads you need a multi-stage pipeline: targeting filter + lightweight scorer + deep model on the top 100
"Calibration is just a sigmoid on the model output" -- a sigmoid gives uncalibrated probabilities. You need post-hoc calibration (isotonic regression, Platt scaling) with monitoring of the predicted/actual ratio
Not mentioning the second-price auction -- it is the foundation of ad pricing; omitting it signals a lack of domain knowledge
Strong answers:
"Latency breakdown: candidate selection 2ms (inverted index by targeting), feature extraction 2ms (precomputed in Redis), CTR prediction 4ms (DCN-V2, INT8 quantization, batch=32), auction 2ms. Total: 10ms."
"Multi-objective: Score = w1 * P(click) + w2 * P(click) * P(conversion|click). Separate models for CTR and CVR, a combined score for auction ranking. The weights w1, w2 are business parameters."
"Cold start: new ads get an advertiser-level prior CTR, with Thompson sampling for exploration (5% of traffic). After 1000 impressions the model switches to ad-level features. Monitoring: AUC by cohort (new ads vs established)."