
# Ad Click Prediction: Acing the Interview

~4 minute read

Prerequisites: Problem Definition | System Components

The "Design Ad Click Prediction" case (CTR prediction) is one of the most common in ML System Design interviews at Google, Meta, Amazon, and TikTok. Its unique difficulty: you must simultaneously deliver calibration (for the auction), latency <10ms (for real-time serving), and 1M+ QPS scale. The interviewer expects specifics on the model (DCN/DeepFM), the features (cross features!), calibration, and cold start handling.

## Interview Framework (45-60 min)

```
0-5 min:   Clarifying questions
5-15 min:  High-level design
15-30 min: Deep dive (model architecture, features)
30-45 min: Scaling & calibration
45-60 min: Extensions & Q&A
```

## Step 1: Clarifying Questions (5 min)

**Scope:**
- What type of ads? (search, display, video)
- What platform? (social feed, search results, content site)

**Scale:**
- QPS for ad requests?
- Number of ads in inventory?
- Latency requirement?

**Business:**
- Auction type? (CPC, CPM)
- Multi-objective? (CTR + conversion)
- Privacy constraints?

**Data:**
- What user data available?
- Historical data volume?
- Feature refresh frequency?

## Step 2: High-Level Design (10 min)

### Architecture

```mermaid
graph TD
    REQ["Ad Request"] --> SYSTEM["Ad Serving System"]

    subgraph PIPELINE ["Ad Serving Pipeline"]
        direction LR
        CS["Candidate<br/>Selection"] --> CTR["CTR Model<br/>Prediction"]
        CTR --> AUC["Auction<br/>Ranking"]
    end

    SYSTEM --> PIPELINE
    SYSTEM -.-> FS["Feature Store<br/>(Redis)"]
    SYSTEM -.-> MS["Model Server<br/>(TensorRT)"]
    SYSTEM -.-> AI["Ad Index<br/>(In-memory)"]

    style REQ fill:#e8eaf6,stroke:#3f51b5
    style SYSTEM fill:#fff3e0,stroke:#ef6c00
    style CS fill:#fff3e0,stroke:#ef6c00
    style CTR fill:#e8eaf6,stroke:#3f51b5
    style AUC fill:#f3e5f5,stroke:#9c27b0
    style FS fill:#e8f5e9,stroke:#4caf50
    style MS fill:#e8f5e9,stroke:#4caf50
    style AI fill:#e8f5e9,stroke:#4caf50
```

### Flow

"Three-stage pipeline:

1. **Candidate Selection** (1000 → 100)
   - Filter by targeting rules
   - Budget/pacing checks
   - Lightweight scoring

2. **CTR Prediction** (100 ads)
   - Deep model scoring
   - P(click | user, ad, context)

3. **Auction Ranking**
   - Score = bid × pCTR
   - Winner determination
   - Price computation (2nd price)

Latency budget: 10ms total
- Candidate: 2ms
- Features: 2ms
- Model: 4ms
- Auction: 2ms"
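The auction step can be made concrete with a toy single-slot second-price sketch (the pricing rule and numbers are simplified for illustration; production systems run generalized second-price over multiple slots):

```python
def run_auction(ads):
    """Single-slot second-price auction on Score = bid x pCTR.

    The winner pays the minimum bid that would still have won:
    runner_up_score / winner_pctr (a simplified GSP pricing rule).
    """
    ranked = sorted(ads, key=lambda a: a["bid"] * a["pctr"], reverse=True)
    winner, runner_up = ranked[0], ranked[1]
    price = runner_up["bid"] * runner_up["pctr"] / winner["pctr"]
    return winner["id"], round(price, 3)

ads = [
    {"id": "A", "bid": 2.0, "pctr": 0.05},  # score 0.100
    {"id": "B", "bid": 1.0, "pctr": 0.08},  # score 0.080
    {"id": "C", "bid": 3.0, "pctr": 0.02},  # score 0.060
]
print(run_auction(ads))  # ('A', 1.6)
```

Note that the price divides by the winner's pCTR, which is why an inflated pCTR directly overcharges the advertiser.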

## Step 3: Deep Dive (15 min)

### Model Architecture

"I'd use a Deep & Cross Network (DCN) approach..."
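The heart of DCN is the cross layer, x_{l+1} = x_0 · (x_l^T w) + b + x_l, which stacks explicit polynomial feature interactions at linear cost; a dependency-free sketch with toy dimensions (real models use per-layer weights and hundreds of dimensions):

```python
def cross_layer(x0, xl, w, b):
    """One DCN cross layer: x_{l+1} = x0 * (xl . w) + b + xl.

    Each stacked layer adds one more degree of explicit feature
    interaction while cost stays linear in the input dimension.
    """
    s = sum(a * c for a, c in zip(xl, w))  # scalar projection xl . w
    return [x0i * s + bi + xli for x0i, bi, xli in zip(x0, b, xl)]

x0 = [0.5, -1.0, 2.0]            # concatenated embeddings (toy values)
w, b = [0.1, 0.2, 0.3], [0.0, 0.0, 0.0]
x1 = cross_layer(x0, x0, w, b)   # layer 1
x2 = cross_layer(x0, x1, w, b)   # layer 2 (real models use distinct weights)
print(len(x2))  # 3 -- output keeps the input dimension
```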

"Why DCN?
- Explicit feature crosses (important for CTR)
- Deep network for complex patterns
- Efficient for high-dimensional sparse features

Architecture:

```mermaid
graph TD
    UF["User Features"] --> UE["Embedding"]
    AF["Ad Features"] --> AE["Embedding"]
    CF["Context Features"] --> CE["Embedding"]
    UE --> CONCAT["Concatenate"]
    AE --> CONCAT
    CE --> CONCAT
    CONCAT --> CROSS["Cross Network"]
    CONCAT --> DEEP["Deep Network"]
    CROSS --> OUT["Output -> P(click)"]
    DEEP --> OUT

    style UF fill:#e8eaf6,stroke:#3f51b5
    style AF fill:#e8eaf6,stroke:#3f51b5
    style CF fill:#e8eaf6,stroke:#3f51b5
    style UE fill:#fff3e0,stroke:#ef6c00
    style AE fill:#fff3e0,stroke:#ef6c00
    style CE fill:#fff3e0,stroke:#ef6c00
    style CONCAT fill:#e8f5e9,stroke:#4caf50
    style CROSS fill:#f3e5f5,stroke:#9c27b0
    style DEEP fill:#f3e5f5,stroke:#9c27b0
    style OUT fill:#fce4ec,stroke:#c62828
```

### Feature Engineering
"Key features for CTR prediction..."

"1. **User Features**
   - User embedding (from history)
   - Demographics
   - Historical CTR by ad category
   - Session behavior

2. **Ad Features**
   - Ad embedding (creative, text)
   - Advertiser quality score
   - Historical CTR
   - Freshness

3. **Cross Features** (most important!)
   - User × Ad category affinity
   - User × Advertiser history
   - User × Creative type CTR
   - Time × Ad category

4. **Context**
   - Position (critical!)
   - Device, OS
   - Time of day
   - Page content

Feature volume:
- 1000+ raw features
- Millions of feature crosses
- Embeddings reduce dimensionality"
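Cross features are usually materialized by hashing the concatenated raw values into a fixed-size id space that indexes an embedding table; a minimal sketch (the bucket count and feature strings are illustrative):

```python
import hashlib

NUM_BUCKETS = 1_000_000  # illustrative hash-vocabulary size

def cross_feature_id(*values: str) -> int:
    """Hash a feature cross (e.g. user_segment x ad_category) into a bucket.

    The bucket id indexes an embedding table; collisions are tolerated
    because the table is large relative to the number of active crosses.
    """
    key = "|".join(values).encode("utf-8")
    # Stable hash (built-in hash() is salted per process, so unusable here).
    return int(hashlib.md5(key).hexdigest(), 16) % NUM_BUCKETS

bucket = cross_feature_id("user_segment=sports_fan", "ad_category=running_shoes")
assert 0 <= bucket < NUM_BUCKETS
```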

### Calibration
"Calibration is critical for auctions..."

"Why calibration matters:
- Bid × pCTR determines ad rank
- If pCTR is 2x actual, the advertiser pays 2x
- Destroys advertiser trust

Calibration approaches:

1. **Platt Scaling**
   - Logistic regression fit on a validation set
   - Maps raw scores to calibrated probabilities

2. **Isotonic Regression**
   - Non-parametric calibration
   - More flexible

3. **Temperature Scaling**
   - Learn a temperature T
   - P_calibrated = softmax(logits / T)

Monitor:
- Predicted/Actual ratio per bucket
- Should be ~1.0 across all ranges"
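The predicted/actual monitoring described above is easy to sketch; a minimal bucketed calibration check (the bucket count and toy data are illustrative):

```python
def calibration_ratio_by_bucket(preds, labels, n_buckets=10):
    """Predicted/actual CTR ratio per score bucket (~1.0 == well calibrated)."""
    buckets = [[] for _ in range(n_buckets)]
    for p, y in zip(preds, labels):
        idx = min(int(p * n_buckets), n_buckets - 1)
        buckets[idx].append((p, y))
    ratios = {}
    for i, rows in enumerate(buckets):
        if not rows:
            continue  # empty bucket: nothing to report
        mean_pred = sum(p for p, _ in rows) / len(rows)
        actual = sum(y for _, y in rows) / len(rows)
        ratios[i] = mean_pred / actual if actual > 0 else float("inf")
    return ratios

preds = [0.05, 0.15, 0.95, 0.90]
labels = [0, 0, 1, 1]
ratios = calibration_ratio_by_bucket(preds, labels)
print(round(ratios[9], 3))  # 0.925 -- slight under-prediction in the top bucket
```

An alert would fire when any well-populated bucket drifts far from 1.0.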

## Step 4: Scaling & Optimization (15 min)

### Handling 1M QPS
"For massive scale..."

"1. **Two-Stage Ranking**
   - Stage 1: Simple model (logistic regression)
   - Stage 2: Deep model on top candidates
   - Reduces compute 10x

2. **Model Optimization**
   - Quantization (FP32 → INT8)
   - Pruning (remove unused weights)
   - Distillation (smaller model)

3. **Feature Caching**
   - User features: cache for the session
   - Ad features: precompute daily
   - Cross features: compute on demand

4. **Sharding**
   - Shard by user_id
   - Parallel scoring across shards
   - Each shard handles 50K QPS

5. **Batching**
   - Batch multiple ads together
   - GPU inference with batch=32"
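The two-stage idea reduces to "cheap model prunes, deep model rescores the survivors"; a sketch where the scoring callables are stand-ins for the real stage-1 and stage-2 models:

```python
def two_stage_rank(candidates, cheap_score, deep_score, k=100):
    """Stage 1 keeps the top-k by a cheap score; stage 2 rescores only those.

    The deep model runs on k items instead of len(candidates),
    which is where the ~10x compute saving comes from.
    """
    shortlist = sorted(candidates, key=cheap_score, reverse=True)[:k]
    return sorted(shortlist, key=deep_score, reverse=True)

# Toy usage: 1000 candidate ads, stand-in scoring functions.
ads = list(range(1000))
top = two_stage_rank(ads, cheap_score=lambda a: a % 7, deep_score=lambda a: -a)
print(len(top))  # 100
```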
### Continuous Learning

"The ads environment changes fast..."

"1. **Streaming Updates**
   - Process clicks/impressions in real time
   - Update the feature store immediately
   - Historical CTR refreshes hourly

2. **Model Retraining**
   - Full retrain: daily
   - Incremental update: hourly
   - A/B test new models

3. **Cold Start Handling**
   - New ads: use an advertiser/category prior
   - Exploration: allocate 5% of traffic
   - Thompson sampling for exploration"
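Thompson sampling for ad exploration reduces to sampling each ad's CTR from a Beta posterior and showing the argmax; a minimal sketch (uniform Beta(1,1) priors here, whereas a production system would seed them from advertiser-level CTR):

```python
import random

def pick_ad(stats):
    """Thompson sampling over per-ad Beta(clicks+1, imps-clicks+1) posteriors.

    New ads have wide posteriors, so they win the draw often enough to get
    explored; well-measured ads are mostly exploited.
    """
    best, best_draw = None, -1.0
    for ad, (clicks, imps) in stats.items():
        draw = random.betavariate(clicks + 1, imps - clicks + 1)
        if draw > best_draw:
            best, best_draw = ad, draw
    return best

# Toy usage: an established ad vs. a brand-new one with no data yet.
stats = {"established": (50, 1000), "new": (0, 0)}
choice = pick_ad(stats)
assert choice in stats
```

Over many requests the "new" ad collects impressions, its posterior narrows, and traffic shifts toward whichever ad actually performs.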
## Step 5: Extensions & Q&A (10 min)

### Common Questions
    
    **Q: How do you handle position bias?**
    
"Position strongly affects CTR:
- Position 1: 10% CTR
- Position 5: 2% CTR

Solutions:
1. Train a separate model: P(click | position)
2. Predict: P(click | ad, user) × P(click | position)
3. Or: include position as a feature, then debias at serving time

For unbiased training:
- Use randomized data
- Inverse propensity weighting"
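Inverse propensity weighting reweights each logged click by the probability its position was examined; a toy sketch (the propensity table is hypothetical and would in practice be estimated from position-randomized traffic):

```python
def ipw_ctr(logs):
    """Inverse-propensity-weighted CTR estimate from position-biased logs.

    Each log row is (clicked, position); propensity[pos] is the assumed
    probability that the slot gets examined at all.
    """
    propensity = {1: 1.0, 2: 0.5, 3: 0.25}  # hypothetical examination rates
    weighted_clicks = sum(clicked / propensity[pos] for clicked, pos in logs)
    return weighted_clicks / len(logs)

# A click at position 3 counts 4x a click at position 1,
# compensating for how rarely position 3 is even seen.
logs = [(1, 1), (0, 1), (0, 1), (0, 1), (1, 3), (0, 3), (0, 3), (0, 3)]
print(ipw_ctr(logs))  # 0.625
```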

**Q: Multi-objective optimization?**
"Optimize for click AND conversion:

Objective: Score = w1 × P(click) + w2 × P(click) × P(conversion | click)

Approach:
1. Train separate models for each objective
2. Combine scores with business weights
3. Pareto optimization for the trade-off"
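The combined objective is a one-liner; a sketch with hypothetical business weights:

```python
def ad_score(p_click, p_conv_given_click, w_click=1.0, w_conv=30.0):
    """Combined auction score: w1 * pCTR + w2 * pCTR * pCVR.

    w_click and w_conv are business parameters (hypothetical values here,
    reflecting how much a conversion is worth relative to a click).
    """
    return w_click * p_click + w_conv * p_click * p_conv_given_click

# A lower-CTR ad can outrank a higher-CTR one when its conversion rate is high.
print(ad_score(0.02, 0.10) > ad_score(0.03, 0.00))  # True
```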

**Q: Privacy-preserving?**
"With GDPR and cookie deprecation:

1. **On-device modeling**
   - Process data on the user's device
   - Send only aggregates

2. **Federated Learning**
   - Train the model across devices
   - No raw data leaves the device

3. **Contextual targeting**
   - Use page content, not user history
   - First-party data only"
## Interview Checklist

### Must Cover:
- [ ] Two-stage architecture
- [ ] Feature categories (user, ad, cross)
- [ ] Model choice (deep learning for CTR)
- [ ] Calibration importance
- [ ] Latency optimization
- [ ] Cold start handling

### Good to Cover:
- [ ] Position bias
- [ ] Continuous learning
- [ ] Multi-objective
- [ ] Privacy considerations
- [ ] Exploration vs exploitation

### Red Flags:
- [ ] Ignoring calibration
- [ ] Not discussing cross features
- [ ] Single-stage ranking at scale
- [ ] Forgetting cold start
- [ ] Not mentioning latency constraints

## Sample Script

Interviewer: "Design CTR prediction for Google Ads"

You: "Great! Let me clarify - is this search ads or display?"

Interviewer: "Display ads on partner sites"

You: "And the scale? QPS and latency?"

Interviewer: "500K QPS, 10ms budget"

You: "Perfect. Here's my approach: [Draw architecture]

For 500K QPS in 10ms, I'd use two-stage ranking:
- Stage 1: Lightweight model on 1000 candidates
- Stage 2: Deep model on the top 100

For the deep model, I'd use DCN (Deep & Cross Network) because cross features are critical for CTR.

Key features:
- User embedding from click history
- Ad creative embedding
- User × Ad category affinity
- Position (with debiasing)

Calibration is critical because bid × pCTR determines ad rank. I'd use isotonic regression for calibration.

For cold start, I'd use advertiser-level priors and allocate 5% of traffic for exploration.

Shall I dive into any component?"

**Misconception: naming a model and listing features is enough in the interview**

The interviewer expects: (1) justification of the model choice under the latency constraint (DCN inference <4ms vs BERT ~10ms), (2) a concrete latency breakdown per stage, (3) an explanation of why cross features matter more than deep features for CTR. Just saying "we use a DNN" is a red flag.

**Misconception: privacy constraints don't change the architecture**

Without third-party cookies (GDPR, iOS ATT), 30-40% of the targeting signal is lost. The architecture changes fundamentally: (1) contextual targeting instead of behavioral, (2) on-device modeling (Apple SKAdNetwork), (3) federated learning. In the interview you should at least mention that privacy constraints shape the available features.

**Misconception: exploration for new ads is just random impressions**

Random exploration wastes budget. Thompson sampling (a Bayesian approach) adaptively allocates more impressions to ads with high uncertainty (new ones) and fewer to those with low uncertainty (proven ones). It converges to the optimal allocation 3-5x faster than epsilon-greedy.

## In the Interview

Typical mistakes:

❌ "One model scores all of the millions of ads" -- at 500K QPS with millions of ads you need a multi-stage pipeline: targeting filter + lightweight scorer + deep model on the top 100

❌ "Calibration is just a sigmoid on the model output" -- a sigmoid gives uncalibrated probabilities. You need post-hoc calibration (isotonic regression, Platt scaling) with monitoring of the predicted/actual ratio

❌ Not mentioning the second-price auction -- it is the foundation of ad pricing; omitting it signals a lack of domain knowledge

Strong answers:

✅ "Latency breakdown: candidate selection 2ms (inverted index by targeting), feature extraction 2ms (precomputed in Redis), CTR prediction 4ms (DCN-V2, INT8 quantization, batch=32), auction 2ms. Total: 10ms."

✅ "Multi-objective: Score = w1 * P(click) + w2 * P(click) * P(conversion|click). Separate models for CTR and CVR, a combined score for auction ranking. The weights w1, w2 are business parameters."

✅ "Cold start: new ads get an advertiser-level prior CTR, Thompson sampling for exploration (5% of traffic). After 1000 impressions the model switches to ad-level features. Monitoring: AUC by cohort (new ads vs established)."