Acing the Interview: Recommender System¶
~5 min read
Prerequisites: Problem definition | Components | Metrics | Scaling
A recommender system is one of the three most common cases in ML System Design interviews (alongside search ranking and ads CTR prediction). The interview lasts 45-60 minutes and evaluates a candidate's ability to go from problem framing to a production-ready architecture. The key skill is time management: 5 minutes for clarifying questions (they set the scope of the entire solution), 10 minutes for high-level design (architecture + API), 15 minutes for the deep dive (candidate generation + ranking + features), and 15 minutes for scaling and trade-offs. 80% of candidates fail in the first 5 minutes: they jump to a solution without clarifying scale, latency requirements, and the type of feedback.
Interview Framework (45-60 min)¶
Timeline¶
0-5 min: Clarifying questions
5-15 min: High-level design
15-30 min: Deep dive into components
30-45 min: Scaling & trade-offs
45-60 min: Extensions & Q&A
Step 1: Clarifying Questions (5 min)¶
Essential Questions to Ask¶
Scope & Scale:
- What are we recommending? (products, videos, people, articles)
- How many users? (1M, 100M, 1B)
- How many items in catalog? (10K, 1M, 100M)
- What's the expected QPS? (1K, 100K, 1M)
Requirements:
- What's the latency requirement? (50ms, 100ms, 500ms)
- Real-time personalization or batch ok?
- Need to handle cold start? (new users/items)
- Any regulatory constraints? (GDPR, content moderation)
Data:
- What signals do we have? (views, clicks, purchases, ratings)
- Implicit vs explicit feedback?
- Do we have content/item metadata?
- Historical data available?
Business:
- What's the primary metric? (CTR, conversion, engagement)
- Any diversity/freshness requirements?
- Need explanations for recommendations?
Example Clarification Dialogue¶
You: "What are we recommending - products, content, or something else?"
Interviewer: "Products for an e-commerce platform like Amazon"
You: "How many active users and products?"
Interviewer: "100M monthly active users, 10M products"
You: "What's the latency requirement?"
Interviewer: "Under 100ms p99"
You: "What signals do we have?"
Interviewer: "Views, add-to-cart, purchases, and product ratings"
Step 2: High-Level Design (10 min)¶
Start with API Design¶
# Request
GET /recommendations
{
"user_id": "user_123",
"context": {
"page": "homepage",
"device": "mobile",
"location": "checkout"
},
"limit": 20,
"filters": {
"category": "electronics",
"price_max": 1000
}
}
# Response
{
"recommendations": [
{
"item_id": "item_456",
"score": 0.95,
"reason": "Based on your recent views"
},
...
],
"request_id": "req_abc",
"model_version": "v2.3"
}
Draw the Architecture¶
graph TD
CL["Client (App)"] --> GW["API Gateway"]
GW --> RS["Rec Service"]
RS --> CG["Candidate<br/>Generation"]
RS --> RM["Ranking<br/>Model"]
RS --> FS["Feature<br/>Store"]
CG --> VDB["Item Index<br/>(Vector DB)"]
RM --> MS["Model Server<br/>(TF Serving)"]
style CL fill:#e8eaf6,stroke:#3f51b5
style GW fill:#e8eaf6,stroke:#3f51b5
style RS fill:#f3e5f5,stroke:#9c27b0
style CG fill:#fff3e0,stroke:#ef6c00
style RM fill:#fff3e0,stroke:#ef6c00
style FS fill:#e8f5e9,stroke:#4caf50
style VDB fill:#e8f5e9,stroke:#4caf50
style MS fill:#e8f5e9,stroke:#4caf50
Explain the Flow¶
"The flow works like this:
1. User requests recommendations from the client
2. API Gateway authenticates and rate-limits
3. Recommendation Service orchestrates the pipeline:
- Fetches user features from Feature Store
- Candidate Generation retrieves ~1000 candidates from multiple sources
- Ranking Model scores and orders candidates
- Filtering removes business-rule violations
- Blending adds diversity and returns top-N
4. Response includes items with scores and explanations"
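The orchestration described above can be sketched in a few lines of Python. This is an illustrative skeleton only: all stage functions (`fetch_user_features`, `generate_candidates`, `rank`, `passes_rules`, `blend`) are hypothetical stand-ins for real subsystems, not an actual service implementation.

```python
# Illustrative sketch of the Rec Service pipeline described above.
# Every stage function is a toy stand-in for a real subsystem.

def get_recommendations(user_id, context, limit=20):
    user_features = fetch_user_features(user_id)       # Feature Store lookup
    candidates = generate_candidates(user_features)    # ~1000 items, multiple sources
    scored = rank(candidates, user_features, context)  # heavy model scores candidates
    allowed = [c for c in scored if passes_rules(c)]   # business-rule filtering
    return blend(allowed)[:limit]                      # diversity-aware top-N

# Toy stand-ins so the sketch runs end to end.
def fetch_user_features(user_id):
    return {"user_id": user_id}

def generate_candidates(user_features):
    return [{"item_id": f"item_{i}"} for i in range(1000)]

def rank(candidates, user_features, context):
    return sorted(candidates, key=lambda c: c["item_id"])  # pretend scoring

def passes_rules(candidate):
    return True  # e.g. in-stock, age-appropriate, not recently shown

def blend(items):
    return items  # e.g. MMR re-ranking for diversity

print(len(get_recommendations("user_123", {"page": "homepage"})))  # 20
```

The value of this skeleton in an interview is that each stage maps one-to-one onto a box in the diagram, so you can deep-dive any of them on request.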
Step 3: Deep Dive into Components (15 min)¶
Candidate Generation¶
"Let me explain candidate generation in detail..."
"The key insight is: we can't score all 10M items with a heavy model.
So we use fast retrieval to get ~1000 candidates, then rank those.
I'd use multiple sources:
1. **Two-Tower Model (ANN)**
- Train user and item encoders separately
- User embedding: f(user_features) -> 128-dim vector
- Item embedding: g(item_features) -> 128-dim vector
- Store item embeddings in FAISS/ScaNN index
- At serving: compute user embedding, find nearest items
- Latency: ~5ms for 1M items
2. **Collaborative Filtering**
- Item-item similarity matrix (precomputed)
- "Users who bought X also bought Y"
- Fast lookup, good for warm users
3. **Popularity-based**
- Trending items (last 24h)
- Good for cold start
We merge candidates from all sources, dedupe, and pass to ranking."
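The retrieval contract of the two-tower source can be shown with a minimal sketch. In production the item matrix lives in an ANN index (FAISS/ScaNN) that returns approximate neighbors; here exact dot-product search over a random NumPy matrix stands in, and all sizes are illustrative.

```python
import numpy as np

# Sketch of two-tower retrieval. Exact dot-product search stands in for an
# ANN index (FAISS/ScaNN); embeddings are random placeholders.
rng = np.random.default_rng(0)
DIM = 128

item_embeddings = rng.normal(size=(10_000, DIM)).astype(np.float32)  # g(item_features)
item_embeddings /= np.linalg.norm(item_embeddings, axis=1, keepdims=True)

def retrieve(user_embedding, k=1000):
    """Return indices of the k items with the highest cosine similarity."""
    u = user_embedding / np.linalg.norm(user_embedding)
    scores = item_embeddings @ u            # cosine similarity to every item
    return np.argpartition(-scores, k)[:k]  # top-k indices, unordered

user_embedding = rng.normal(size=DIM).astype(np.float32)  # f(user_features)
candidates = retrieve(user_embedding)
print(candidates.shape)  # (1000,)
```

Note that retrieval only needs the top-k set, not a full ordering; the ranking stage re-scores and orders these candidates.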
Ranking Model¶
"For ranking, I'd use a multi-task learning approach..."
"The model predicts multiple objectives:
- P(click)
- P(purchase | click)
- P(long engagement)
- Expected revenue"
Ranking Architecture:
graph TD
UF["User Features<br/>(embedding + dense)"] --> UT["User Tower"]
IF["Item Features<br/>(embedding + dense)"] --> IT["Item Tower"]
UT --> CC["Concat + Cross"]
IT --> CC
CC --> MLP["Shared MLP"]
MLP --> PC["P(click)"]
MLP --> PP["P(purchase)"]
MLP --> PE["P(engage)"]
style UF fill:#e8eaf6,stroke:#3f51b5
style IF fill:#e8f5e9,stroke:#4caf50
style UT fill:#e8eaf6,stroke:#3f51b5
style IT fill:#e8f5e9,stroke:#4caf50
style CC fill:#f3e5f5,stroke:#9c27b0
style MLP fill:#f3e5f5,stroke:#9c27b0
style PC fill:#fff3e0,stroke:#ef6c00
style PP fill:#fff3e0,stroke:#ef6c00
style PE fill:#fff3e0,stroke:#ef6c00
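The diagram's forward pass can be sketched with NumPy to make the shapes and the multi-task heads concrete. Layer sizes, the elementwise cross term, and the head-blending weights below are all illustrative assumptions, not a trained model.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(x, sizes):
    """Tiny ReLU MLP with random weights; stands in for trained layers."""
    for out_dim in sizes:
        w = rng.normal(scale=0.1, size=(x.shape[-1], out_dim))
        x = np.maximum(x @ w, 0.0)
    return x

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def rank_batch(user_feats, item_feats):
    """Concat towers -> shared MLP -> three sigmoid heads, as in the diagram."""
    u = mlp(user_feats, [64])                    # user tower
    i = mlp(item_feats, [64])                    # item tower
    x = np.concatenate([u, i, u * i], axis=-1)   # concat + elementwise cross
    h = mlp(x, [128, 64])                        # shared MLP
    heads = {name: sigmoid(mlp(h, [1]))[:, 0]
             for name in ("click", "purchase", "engage")}
    # Blend heads into one ranking score; these weights are illustrative and
    # in practice are tuned against business objectives.
    score = 0.2 * heads["click"] + 0.5 * heads["purchase"] + 0.3 * heads["engage"]
    return heads, score

heads, score = rank_batch(rng.normal(size=(1000, 32)), rng.normal(size=(1000, 48)))
print(score.shape)  # (1000,)
```

Scoring 1000 candidates as a single batch like this is what makes the two-stage design fit the latency budget: the heavy model never sees all 10M items.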
Feature Store¶
"For features, I'd use a dual-layer feature store..."
"Online Store (Redis/DynamoDB):
- User features: last actions, real-time session
- Item features: current price, stock, trending score
- Latency: <5ms
Offline Store (S3/BigQuery):
- Historical aggregations
- Precomputed embeddings
- Updated hourly/daily
Key features:
- User: purchase history, category preferences, price sensitivity
- Item: popularity, ratings, content embeddings
- Cross: user-item interaction history
- Context: time of day, device, location
"
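Feature assembly at request time can be sketched as a layered merge. Plain dicts stand in for the online store (Redis/DynamoDB) and the precomputed offline aggregates; all feature names and values here are illustrative.

```python
# Sketch of request-time feature assembly. Dicts stand in for the online
# store (Redis/DynamoDB) and offline batch aggregates; names are illustrative.

ONLINE = {  # low-latency, real-time features
    "user_123": {"session_views": 7, "last_category": "electronics"},
}
OFFLINE = {  # batch aggregations, refreshed hourly/daily
    "user_123": {"purchases_30d": 4, "price_sensitivity": 0.6},
}
DEFAULTS = {"session_views": 0, "last_category": None,
            "purchases_30d": 0, "price_sensitivity": 0.5}

def assemble_features(user_id, context):
    """Layered merge: defaults < offline < online < request context."""
    feats = dict(DEFAULTS)
    feats.update(OFFLINE.get(user_id, {}))
    feats.update(ONLINE.get(user_id, {}))
    feats.update({f"ctx_{k}": v for k, v in context.items()})
    return feats

f = assemble_features("user_123", {"device": "mobile", "hour": 21})
print(f["session_views"], f["purchases_30d"], f["ctx_device"])  # 7 4 mobile
```

The defaults layer doubles as graceful degradation: if either store times out, the request still gets a complete (if generic) feature vector instead of failing.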
Step 4: Scaling & Trade-offs (15 min)¶
Scaling Discussion¶
"For 100K RPS, here's how I'd scale..."
"1. Horizontal scaling of recommendation service
- Stateless services, easy to scale
- 50 pods, each handling 2K RPS
2. Feature Store scaling
- Redis Cluster: 10 shards, 100GB total
- Read replicas for geographic distribution
3. Candidate Generation
- FAISS index: sharded across 5 machines
- Each shard holds 2M item embeddings
4. Model Serving
- TensorFlow Serving with batching
- GPU instances for ranking model
- CPU for simpler models
5. Caching
- CDN for popular recommendations
- Local cache for hot users/items
- Redis for session-level cache
"
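The pod count above is back-of-the-envelope arithmetic worth showing explicitly in the interview. A small helper makes it concrete; the per-pod throughput and headroom figures are assumptions, not measurements.

```python
import math

# Capacity math behind "50 pods, each handling 2K RPS".
# Per-pod throughput and headroom are assumptions, not benchmarks.

def pods_needed(peak_qps, per_pod_qps, headroom=0.0):
    """Pods required to serve peak_qps, optionally reserving spare capacity."""
    return math.ceil(peak_qps / (per_pod_qps * (1.0 - headroom)))

print(pods_needed(100_000, 2_000))                # 50 pods at full utilization
print(pods_needed(100_000, 2_000, headroom=0.3))  # 72 pods with 30% headroom
```

Mentioning headroom (for deploys, traffic spikes, and zone failures) is an easy way to show production awareness rather than quoting the bare division.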
Key Trade-offs¶
"Let me discuss some key trade-offs..."
"1. Personalization vs Latency
- More features = better recs but slower
- Solution: Tiered personalization based on user value
2. Freshness vs Accuracy
- Real-time features = fresher but less stable
- Solution: Blend real-time with batch features
3. Exploration vs Exploitation
- Too much exploitation = filter bubbles
- Solution: Thompson sampling, 10% exploration slots
4. Model Complexity vs Serving Cost
- Deep models are accurate but expensive
- Solution: Distillation, quantization, two-stage ranking
5. Diversity vs Relevance
- Too diverse = irrelevant; too similar = boring
- Solution: MMR reranking with tunable lambda
"
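The MMR re-ranking mentioned in trade-off 5 fits in a few lines. This is a minimal greedy sketch with a toy binary similarity function; real systems would use embedding similarity and a tuned lambda.

```python
# Minimal MMR (Maximal Marginal Relevance) re-ranking sketch for the
# diversity-vs-relevance trade-off. Items and similarity are toy examples.

def mmr(items, relevance, similarity, lam=0.7, k=5):
    """Greedily pick argmax of lam*relevance - (1-lam)*max_sim_to_selected."""
    selected = []
    pool = list(items)
    while pool and len(selected) < k:
        def mmr_score(x):
            penalty = max((similarity(x, s) for s in selected), default=0.0)
            return lam * relevance[x] - (1.0 - lam) * penalty
        best = max(pool, key=mmr_score)
        selected.append(best)
        pool.remove(best)
    return selected

# Toy example: items in the same category count as fully similar.
category = {"a": "phones", "b": "phones", "c": "laptops", "d": "phones", "e": "tv"}
relevance = {"a": 0.9, "b": 0.85, "c": 0.8, "d": 0.7, "e": 0.6}
sim = lambda x, y: 1.0 if category[x] == category[y] else 0.0

print(mmr(list(relevance), relevance, sim, lam=0.7, k=3))  # ['a', 'c', 'e']
```

With lam=1.0 the same call returns pure relevance order ('a', 'b', 'c'); lowering lambda trades relevance for category spread, which is exactly the tunable knob the trade-off describes.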
Step 5: Extensions & Q&A (10 min)¶
Common Follow-up Questions¶
Q: How do you handle cold start?
"For new users:
- Start with popularity-based recs
- Use demographic features (age, location)
- Ask for preferences during onboarding
- Quick learning from first few interactions
For new items:
- Content-based features (title, image, category)
- Initial boost in exploration slots
- Leverage similar items' performance"
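The cold-start fallback can be expressed as source selection keyed on how much history a user has. The thresholds and source names below are illustrative, not a prescription.

```python
# Sketch of cold-start fallback: choose candidate sources by user history.
# Thresholds and source names are illustrative assumptions.

def candidate_sources(n_interactions):
    """Pick retrieval sources based on how much we know about the user."""
    if n_interactions == 0:
        return ["popularity", "demographic"]          # brand-new user
    if n_interactions < 5:
        return ["popularity", "demographic", "item_item_cf"]  # learning quickly
    return ["two_tower_ann", "item_item_cf", "popularity"]    # warm user

print(candidate_sources(0))    # ['popularity', 'demographic']
print(candidate_sources(100))  # ['two_tower_ann', 'item_item_cf', 'popularity']
```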
Q: How do you A/B test?
"I'd use a multi-layer experiment system:
- User-level randomization for long-term experiments
- Request-level for short-term/interleaving tests
Key metrics:
- Primary: Revenue per user, Conversion
- Secondary: CTR, Session duration
- Guardrails: Latency, Error rate, Coverage
Statistical framework:
- Minimum sample size calculation upfront
- Sequential testing for early stopping
- Multiple comparison correction"
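The "minimum sample size calculation upfront" step can be demonstrated with the standard two-proportion formula under the normal approximation. The alpha/power values are the conventional defaults (5% two-sided, 80% power), not requirements.

```python
import math

# Two-proportion sample size (normal approximation) for an A/B test on a
# conversion rate. Conventional alpha=0.05 (two-sided) and power=0.8.

def sample_size_per_arm(p_base, mde):
    """Users per arm to detect an absolute lift `mde` over baseline p_base."""
    z_alpha = 1.959964  # z for two-sided alpha = 0.05
    z_beta = 0.841621   # z for power = 0.80
    p1, p2 = p_base, p_base + mde
    p_bar = (p1 + p2) / 2
    num = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
           + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(num / mde ** 2)

# e.g. detect a 0.5pp absolute lift on a 5% baseline conversion rate
print(sample_size_per_arm(0.05, 0.005))
```

Quoting a concrete number like this in the interview ("roughly 30K users per arm for a half-point lift on a 5% baseline") shows you understand why small effects need long experiments.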
Q: How do you ensure fairness?
"Fairness considerations:
- Item fairness: All sellers get exposure
- User fairness: Quality recs for all segments
- Algorithmic: No discrimination by demographics
Implementation:
- Monitor coverage by item attributes
- Audit model for disparate impact
- Diversity constraints in reranking"
Q: What about real-time updates?
"For real-time personalization:
- Kafka stream of user events
- Flink updates session features in Feature Store
- Model picks up fresh features each request
For item updates:
- New items indexed within 1 hour
- Price/stock updates in real-time
- Popularity scores updated every 5 min"
Interview Checklist¶
Must Cover:¶
- Clarifying questions (scale, requirements)
- API design
- High-level architecture diagram
- Candidate generation strategy
- Ranking model architecture
- Feature engineering approach
- Latency budget breakdown
- Key trade-offs
Good to Cover:¶
- Cold start handling
- A/B testing strategy
- Monitoring & alerting
- Diversity & fairness
- Model retraining pipeline
Advanced Topics:¶
- Multi-objective optimization
- Reinforcement learning for exploration
- Federated learning for privacy
- Causal inference for evaluation
Red Flags to Avoid¶
- Jumping to solution without clarifying: always ask about scale and requirements first
- Ignoring latency constraints: remember, you can't score 10M items in 100ms
- Single model approach: two-stage (retrieval + ranking) is standard
- Forgetting cold start: new users and items are common
- Not discussing metrics: how do you measure success?
- Ignoring business rules: filtering, diversity, freshness matter
Sample Interview Script¶
Interviewer: "Design a recommendation system for Netflix"
You: "Before I start, let me clarify a few things..."
[Ask 5-6 clarifying questions]
You: "Based on these requirements, let me walk you through my design.
First, I'll sketch the high-level architecture..."
[Draw diagram, explain flow]
You: "Now let me dive deeper into the candidate generation layer..."
[Explain two-tower, ANN, multiple sources]
You: "For ranking, I'd use a deep learning model..."
[Explain architecture, features, training]
You: "To handle 100K RPS, here's how I'd scale..."
[Discuss horizontal scaling, caching, sharding]
You: "Some key trade-offs to consider..."
[Personalization vs latency, exploration vs exploitation]
Interviewer: "How would you handle a new user?"
You: "For cold start..."
[Explain strategy]
You: "Do you have any other questions about the design?"
Misconception: You should start drawing the architecture right away
80% of candidates start by drawing boxes and arrows. But the first 5 minutes should go to clarifying questions: scale (1M vs 1B users), feedback type (implicit vs explicit), latency requirements (50ms vs 500ms), and how critical cold start is. The answers to these questions determine the entire architecture. Without them, you risk designing a system for the wrong requirements.
Misconception: Describing a single model is enough
The interviewer is evaluating systems thinking, not knowledge of a specific model: how the components connect, what the trade-offs are, how to scale, how to monitor. A candidate who covers Two-Tower + FAISS + XGBoost + Deep Ranker + Feature Store + graceful degradation + A/B testing in 45 minutes gets a Strong Hire. A candidate who spends 30 minutes on the details of a single neural network will not pass.
Interview Questions¶
How do you structure an answer in a System Design interview?¶
"I start drawing the architecture and describing the model right away."
"A structured approach: (1) Clarifying Questions, 5 min: scale, latency, feedback type, cold start, primary metric; (2) API Design: request/response schema; (3) High-Level Architecture: a diagram with 5-6 components; (4) Deep Dive, 15 min: candidate generation (Two-Tower + ANN), ranking (coarse + fine), features (online + offline store); (5) Scaling, 15 min: latency budget, HPA, sharding, caching, graceful degradation; (6) Trade-offs: personalization vs latency, exploration vs exploitation, diversity vs relevance."
What mistakes do candidates make most often?¶
"I forget about cold start, but that's a minor thing."
"Top 5 mistakes: (1) jumping to a solution without clarifying questions: scope determines the architecture; (2) a single model with no retrieval stage: with 10M items that is infeasible in 100ms; (3) ignoring the latency budget: be specific, e.g. candidate generation 20ms, ranking 40ms, 100ms p99 total; (4) no metrics discussion: without offline/online/guardrail metrics there is no way to evaluate the system; (5) no graceful degradation: the system must keep working under partial failures instead of returning 503."
How do you demonstrate Senior/Staff level in the interview?¶
"Describe the most complex architecture possible, with RL and transformers."
"Senior shows depth: concrete per-component latency budgets, cost estimation (a $30K/month breakdown), trade-off analysis with numbers (diversity weight 0.3 = -2% CTR, +5% retention). Staff shows breadth: how recommendations affect other systems (search, ads, notifications), multi-objective optimization under business constraints, experimentation platform design, and organizational aspects (who owns the feature store, how ML and backend teams coordinate). Specificity beats complexity."
"Senior показывает глубину: конкретные latency budgets по компонентам, cost estimation ($30K/month breakdown), trade-off analysis с числами (diversity weight 0.3 = -2% CTR, +5% retention). Staff показывает breadth: как рекомендации влияют на другие системы (search, ads, notifications), multi-objective optimization с business constraints, experimentation platform design, организационные аспекты (кто владеет feature store, как координировать ML и backend teams). Конкретика > сложность."