Search Ranking: Passing the Interview¶
~3 min read
Prerequisites: Problem Definition | System Components | Data Requirements
The "Design Search Ranking" case is one of the top 5 most frequent in ML System Design interviews (Google, Amazon, LinkedIn, Spotify). The interviewer evaluates your ability to decompose the problem into stages (retrieval -> ranking -> re-ranking), pick a model that fits the latency budget (200ms), and justify the choice of features. Below is a step-by-step plan for a 45-60 minute interview, with example dialogues and typical follow-up questions.
Interview Framework (45-60 min)¶
Timeline¶
0-5 min: Clarifying questions
5-15 min: High-level design
15-30 min: Deep dive (ranking model, features)
30-45 min: Scaling & trade-offs
45-60 min: Extensions & Q&A
Step 1: Clarifying Questions (5 min)¶
Essential Questions¶
**Scope:**
- What type of search? (e-commerce, web, people)
- What are we searching? (products, documents, users)
**Scale:**
- Index size? (1M vs 1B documents)
- QPS? (1K vs 100K)
- Latency requirement? (100ms vs 500ms)
**Features:**
- Personalization needed?
- Real-time indexing or batch?
- Autocomplete/suggestions?
- Filters and facets?
**Data:**
- What engagement data? (clicks, conversions)
- Historical search logs available?
Example Dialogue¶
You: "What type of content are we searching?"
Interviewer: "Products on an e-commerce platform like Amazon"
You: "How many products and what's the QPS?"
Interviewer: "10 million products, 50K QPS at peak"
You: "What's the latency budget?"
Interviewer: "Under 200ms end-to-end"
You: "Is personalization required?"
Interviewer: "Yes, results should be personalized to user preferences"
Step 2: High-Level Design (10 min)¶
API Design¶
# Request
GET /search?q=красные+кроссовки&category=shoes&price_max=5000&page=1
# Response
{
"query": "красные кроссовки",
"corrected_query": null,
"total_results": 1523,
"results": [
{
"product_id": "p123",
"title": "Nike Air Max Red",
"price": 4500,
"rating": 4.5,
"position": 1,
"score": 0.95
},
...
],
"facets": {
"brand": [{"name": "Nike", "count": 50}, ...],
"size": [{"name": "42", "count": 30}, ...]
},
"suggestions": ["красные кроссовки nike", "красные кроссовки adidas"],
"latency_ms": 85
}
Architecture Diagram¶
graph TD
REQ["SEARCH REQUEST"] --> SERV["SEARCH SERVICE"]
subgraph SERV_SUB["Pipeline Stages"]
QP["Query Processing"] --> RET["Retrieval<br/>(BM25+ANN)"]
RET --> RANK["Ranking<br/>(LTR)"]
RANK --> RERANK["Re-ranking<br/>(Diversity)"]
end
SERV --> QP
SERV --> IDX["Search Index<br/>(Elasticsearch)"]
SERV --> FS["Feature Store<br/>(Redis)"]
SERV --> MODEL["Ranking Model<br/>(LightGBM)"]
style REQ fill:#e8eaf6,stroke:#3f51b5
style SERV fill:#fff3e0,stroke:#ef6c00
style IDX fill:#e8f5e9,stroke:#4caf50
style FS fill:#e8f5e9,stroke:#4caf50
style MODEL fill:#e8f5e9,stroke:#4caf50
Flow Explanation¶
"The search pipeline has 4 main stages:
1. **Query Processing** (10ms)
- Normalize and tokenize query
- Spell check and correction
- Entity extraction (brand, size, color)
- Query expansion with synonyms
2. **Retrieval** (50ms)
- Hybrid: BM25 (lexical) + semantic embeddings
- From 10M products → 1000 candidates
- Apply hard filters (price, availability)
3. **Ranking** (80ms)
- Learning-to-rank model (LambdaMART)
- Rich features: query-doc, user, context
- 1000 → 100 ranked results
4. **Re-ranking** (20ms)
- Diversity (don't show all Nike)
- Freshness boost
- Promotional slots
Total: ~160ms, well within the 200ms budget"
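A minimal sketch of how these stages could be wired together with per-stage timing; the stage functions (`process_query`, `retrieve`, `rank`, `rerank`) are placeholders for the components described above, not a specific framework:

```python
import time

def timed(stage_fn, *args):
    """Run one pipeline stage and return (result, elapsed milliseconds)."""
    start = time.perf_counter()
    result = stage_fn(*args)
    return result, (time.perf_counter() - start) * 1000

def search(raw_query, user_id, process_query, retrieve, rank, rerank):
    """Sketch of the 4-stage pipeline; target budget ~10 + 50 + 80 + 20 = 160ms."""
    parsed, t1 = timed(process_query, raw_query)           # normalize, spell check, NER
    candidates, t2 = timed(retrieve, parsed)               # 10M -> ~1000 via BM25 + ANN
    ranked, t3 = timed(rank, parsed, candidates, user_id)  # LTR: 1000 -> ~100
    final, t4 = timed(rerank, ranked)                      # diversity, freshness, promo slots
    return {"results": final, "latency_ms": round(t1 + t2 + t3 + t4)}
```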
Step 3: Deep Dive (15 min)¶
Retrieval Strategy¶
"Let me explain the retrieval stage in detail..."
"The challenge is: we have 10M products, can't score all with ML.
So we need fast retrieval to get ~1000 candidates.
I'd use **hybrid retrieval**:
1. **Lexical (BM25)** - 60% of candidates
- Fast, handles exact matches well
- Elasticsearch with proper analyzers
- Query: title^3, description, brand^2
2. **Semantic (Vector Search)** - 40% of candidates
- BERT embeddings for query and products
- Catches semantic similarity
- 'sneakers' matches 'кроссовки'
Merging strategy:
- Reciprocal Rank Fusion (RRF)
- Or weighted linear combination
Why hybrid?
- Lexical alone misses semantic matches
- Semantic alone misses exact matches
- Hybrid gets best of both"
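A sketch of the Reciprocal Rank Fusion merge mentioned above; k=60 is the commonly used constant, and the document IDs in the usage example are made up:

```python
from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists, k=60, top_n=1000):
    """RRF: score(doc) = sum over lists of 1 / (k + rank of doc in that list)."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# Usage: merge BM25 and vector-search candidate lists.
bm25_hits = ["p123", "p456", "p789"]
ann_hits = ["p456", "p999", "p123"]
candidates = reciprocal_rank_fusion([bm25_hits, ann_hits])
```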
Ranking Model¶
"For the ranking model, I'd use LambdaMART..."
"Why LambdaMART?
- Optimizes for NDCG directly
- Fast inference (gradient boosted trees)
- Interpretable feature importance
Features I'd include:
**Query-Document Matching (most important)**
- BM25 score
- Title match ratio
- Exact match in title
- Brand match
- Category match
**Document Quality**
- Rating (4.5 stars)
- Review count
- Conversion rate
- Return rate
**Personalization**
- User's category affinity
- User's brand preference
- Previous purchases in category
- Click history with similar items
**Context**
- Position bias correction
- Device type
- Time of day
Training approach:
- Listwise loss (LambdaLoss)
- Position debiasing with inverse propensity
- Graded relevance: purchase=4, cart=3, click=2, impression=0"
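A minimal training sketch, assuming the features, graded labels, and per-query group sizes have already been built from search logs; it uses LightGBM's LGBMRanker with the lambdarank objective (LightGBM's LambdaMART-style implementation), and the random data here is just a stand-in:

```python
import numpy as np
import lightgbm as lgb

# Stand-ins for prepared training data:
# X      - features (BM25 score, title match, rating, user affinity, ...)
# y      - graded relevance per (query, product): purchase=4, cart=3, click=2, impression=0
# groups - number of candidates per query, in the same row order as X
X = np.random.rand(1000, 20)
y = np.random.choice([0, 2, 3, 4], size=1000)
groups = [50] * 20  # 20 queries x 50 candidates each

ranker = lgb.LGBMRanker(
    objective="lambdarank",  # listwise, LambdaMART-style
    metric="ndcg",
    n_estimators=300,
    num_leaves=63,
)
ranker.fit(X, y, group=groups)
scores = ranker.predict(X[:50])  # scores for one query's candidates
```

Position-debiasing weights (see the follow-up question on position bias below) can be passed to `fit` via `sample_weight`.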
Two-Stage Ranking¶
"Actually, I'd use two-stage ranking for latency..."
"Stage 1: Coarse ranking (fast)
- Lightweight model (logistic regression)
- 20 features only
- 1000 → 200 candidates
- 20ms
Stage 2: Fine ranking (accurate)
- Full LambdaMART model
- 100+ features
- 200 → 50 final results
- 50ms
Why two-stage?
- Can't run heavy model on 1000 items in time
- Coarse filter removes obviously bad results
- Fine ranking on quality candidates"
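A sketch of the two-stage cascade, assuming both models are already trained and the feature matrices are row-aligned with `candidate_ids`:

```python
import numpy as np

def two_stage_rank(candidate_ids, coarse_features, full_features,
                   coarse_model, fine_model, coarse_top=200, final_top=50):
    """Cascade: a cheap model prunes ~1000 -> 200, the full model ranks 200 -> 50."""
    # Stage 1: lightweight model (e.g. logistic regression) over ~20 features.
    keep = np.argsort(-coarse_model.predict(coarse_features))[:coarse_top]

    # Stage 2: full LambdaMART over 100+ features, but only for the survivors.
    order = np.argsort(-fine_model.predict(full_features[keep]))[:final_top]
    return [candidate_ids[i] for i in keep[order]]
```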
Step 4: Scaling & Trade-offs (15 min)¶
Scaling for 50K QPS¶
"For 50K QPS, here's the architecture..."
"1. Search Service
- Stateless, horizontal scaling
- 100 pods, each handling 500 QPS
- Auto-scaling on CPU/latency
2. Elasticsearch Cluster
- 10 data nodes, 5 shards per index
- Read replicas for hot queries
- SSD storage for low latency
3. Feature Store
- Redis cluster for user features
- Precomputed document features in ES
- < 5ms feature retrieval
4. Model Serving
- LightGBM is fast (CPU)
- Batch scoring (32 items at once)
- Local caching of model
5. Caching
- Popular queries cached (~30% hit rate)
- CDN for static assets
- 5-minute TTL"
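A sketch of the popular-query cache with a 5-minute TTL, assuming a redis-py client and JSON-serializable results; the key scheme is illustrative:

```python
import json
import redis  # redis-py; assumes a reachable Redis instance

cache = redis.Redis(host="localhost", port=6379)
CACHE_TTL_SECONDS = 300  # 5-minute TTL

def cached_search(query, filters, run_search):
    """Serve repeated queries from Redis; fall through to the pipeline on a miss."""
    key = "search:" + json.dumps({"q": query, "f": filters}, sort_keys=True)
    hit = cache.get(key)
    if hit is not None:
        return json.loads(hit)
    results = run_search(query, filters)
    cache.setex(key, CACHE_TTL_SECONDS, json.dumps(results))
    return results
```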
Latency Breakdown¶
"Latency budget breakdown..."
Total: 200ms
Query Processing: 10ms
├── Tokenization: 2ms
├── Spell check: 3ms
├── NER: 3ms
└── Embedding: 2ms
Retrieval: 50ms
├── BM25 query: 30ms
├── Vector search: 15ms
└── Merge: 5ms
Ranking: 80ms
├── Feature fetch: 20ms
├── Coarse rank: 20ms
└── Fine rank: 40ms
Re-ranking: 20ms
├── Diversity: 10ms
├── Business rules: 5ms
└── Response build: 5ms
Buffer: 40ms
Key Trade-offs¶
"Several important trade-offs..."
"1. **Recall vs Latency**
- More candidates = better recall, slower
- Solution: Fast retrieval, heavy ranking
- Balance: 1000 candidates is sweet spot
2. **Personalization vs Freshness**
- User features can be stale
- Solution: Blend personalized + popular
3. **Relevance vs Diversity**
- Pure relevance → all same brand
- Solution: MMR re-ranking
- Balance: diversity_weight = 0.2
4. **Semantic vs Lexical**
- Semantic catches meaning but can be wrong
- Lexical is precise but misses synonyms
- Solution: Hybrid with tunable weight
5. **Online vs Offline Features**
- Real-time features fresher but expensive
- Solution: Precompute what you can"
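A sketch of the MMR re-ranking mentioned in trade-off 3; `relevance` maps item to ranking score, `similarity` is any pairwise measure (brand overlap, embedding cosine), and `diversity_weight=0.2` matches the balance above:

```python
def mmr_rerank(items, relevance, similarity, diversity_weight=0.2, top_n=50):
    """Maximal Marginal Relevance: penalize items similar to ones already selected."""
    selected, remaining = [], list(items)
    while remaining and len(selected) < top_n:
        def mmr_score(item):
            max_sim = max((similarity(item, s) for s in selected), default=0.0)
            return (1 - diversity_weight) * relevance[item] - diversity_weight * max_sim
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected
```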
Step 5: Extensions & Q&A (10 min)¶
Common Follow-up Questions¶
Q: How do you handle typos?
"Multi-layered approach:
1. Edit distance to known vocabulary
2. Query log mining for common corrections
3. Phonetic matching (Soundex, Metaphone)
4. Character n-gram matching
For 'ккроссовки' → 'кроссовки':
- Edit distance: 1 deletion
- High frequency in query logs
- Confidence: high → auto-correct"
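A sketch of the edit-distance layer of that approach; the vocabulary frequencies would come from query-log mining, and the numbers here are made up:

```python
def edit_distance(a, b):
    """Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (ca != cb))) # substitution
        prev = cur
    return prev[-1]

def correct(token, vocab_freq, max_dist=1):
    """Pick the most frequent vocabulary term within the edit-distance threshold."""
    candidates = [(freq, term) for term, freq in vocab_freq.items()
                  if edit_distance(token, term) <= max_dist]
    return max(candidates)[1] if candidates else token

# 'ккроссовки' is one deletion away from the frequent term 'кроссовки'.
print(correct("ккроссовки", {"кроссовки": 120_000, "кросс": 8_000}))
```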
Q: How do you handle zero results?
"Zero results are terrible UX. Solutions:
1. Query relaxation
- Remove least important terms
- Expand synonyms more aggressively
2. Fallback strategies
- Show related categories
- Show popular items in category
3. Prevention
- Better spell correction
- Suggest similar queries
Monitor zero-result rate as key metric"
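A sketch of the query-relaxation fallback, assuming a per-term importance score (e.g. IDF) and a `run_search` callable:

```python
def relax_query(tokens, term_importance, run_search, min_tokens=1):
    """Drop the least important term until the search returns something."""
    tokens = list(tokens)
    results = run_search(tokens)
    while not results and len(tokens) > min_tokens:
        tokens.remove(min(tokens, key=lambda t: term_importance.get(t, 0.0)))
        results = run_search(tokens)
    return results, tokens
```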
Q: How do you handle position bias?
"Position bias: users click top results more regardless of relevance.
Problem: Training data is biased.
Solutions:
1. **Inverse Propensity Weighting**
- Weight = 1 / P(click | position)
- Down-weight clicks on top positions
2. **Randomization**
- 1% traffic: shuffle top results
- Use unbiased data for training
3. **Propensity Model**
- Train model to predict P(click | position)
- Use as feature or for debiasing"
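A sketch of inverse propensity weighting as described in option 1; the per-position propensities are illustrative and would in practice be estimated, e.g. from the 1% randomized traffic:

```python
# Estimated click propensity per position (illustrative numbers).
click_propensity = {1: 0.40, 2: 0.25, 3: 0.15, 4: 0.10, 5: 0.07}

def ipw_weight(position, clicked):
    """Clicks at high-propensity (top) positions get smaller training weight."""
    if not clicked:
        return 1.0
    return 1.0 / click_propensity.get(position, 0.05)

# Weights like these can be passed as sample_weight when training the ranker.
weights = [ipw_weight(pos, clk) for pos, clk in [(1, True), (5, True), (2, False)]]
```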
Q: How do you A/B test ranking changes?
"Interleaving is best for ranking:
1. Team Draft Interleaving
- Alternate between control/treatment
- Count which got more clicks
2. Delta NDCG
- Compare ranking quality offline first
- Online only if offline wins
3. Guardrail metrics
- Revenue per search
- Zero result rate
- Query abandonment
Ship if: statistically significant improvement
AND no guardrail regressions"
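A sketch of Team Draft Interleaving: in each round a coin flip decides which ranker picks first, each ranker contributes its best not-yet-shown result, and clicks are credited to the team that contributed the clicked document:

```python
import random

def team_draft_interleave(ranking_a, ranking_b):
    """Return the interleaved list plus a doc -> team map used to credit clicks."""
    shown, team_of = [], {}
    ia = ib = 0
    while ia < len(ranking_a) and ib < len(ranking_b):
        order = ("A", "B") if random.random() < 0.5 else ("B", "A")
        for team in order:
            ranking = ranking_a if team == "A" else ranking_b
            idx = ia if team == "A" else ib
            while idx < len(ranking) and ranking[idx] in team_of:
                idx += 1  # skip docs the other team already placed
            if idx < len(ranking):
                team_of[ranking[idx]] = team
                shown.append(ranking[idx])
            if team == "A":
                ia = idx + 1
            else:
                ib = idx + 1
    return shown, team_of
```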
Interview Checklist¶
Must Cover:¶
- Clarifying questions
- Two-stage pipeline (retrieval + ranking)
- Hybrid retrieval (lexical + semantic)
- Ranking model choice and features
- Latency budget breakdown
- Scaling for QPS
Good to Cover:¶
- Query understanding (NER, intent)
- Position bias handling
- Personalization approach
- Diversity in results
- A/B testing strategy
Red Flags:¶
- Not discussing two-stage (retrieval vs ranking)
- Ignoring latency constraints
- Only lexical OR only semantic (should be hybrid)
- Not mentioning position bias
- Forgetting about cold start
Sample Script¶
Interviewer: "Design search for Amazon"
You: "Great! Let me clarify a few things first.
What's the scale - how many products and QPS?"
Interviewer: "10 million products, 50K QPS"
You: "And the latency requirement?"
Interviewer: "Under 200ms"
You: "Perfect. Let me outline my approach.
[Draw architecture]
The key insight is we need two stages:
fast retrieval to get candidates,
then ML ranking for precision.
For retrieval, I'd use hybrid search:
- BM25 for lexical matching
- BERT embeddings for semantic
- Merge with RRF
For ranking, LambdaMART with features:
- Query-doc matching (BM25, title match)
- Quality signals (rating, reviews)
- Personalization (user preferences)
For 50K QPS, scale horizontally with
100 pods and Redis for user features.
Shall I dive deeper into any component?"
Misconception: drawing the architecture and listing the components is enough
The interviewer expects concrete numbers: a latency budget per stage (10ms + 50ms + 80ms + 20ms = 160ms), the number of candidates at each step (10M -> 1000 -> 200 -> 50), and a justification for the model choice. An abstract answer like "an ML model for ranking" with no details is a red flag.
Misconception: clarifying questions can be skipped
The difference between web search and e-commerce search determines the entire architecture: different metrics (NDCG vs conversion rate), different features (PageRank vs inventory), different latency budgets (200ms vs 500ms). Five minutes of clarification save twenty minutes of rework.
Misconception: interleaving and A/B testing are the same thing
An A/B test splits users into two groups and compares metrics. Interleaving shows a single user the results of both systems mixed together and counts which one collected more clicks. Interleaving identifies the winner with 10-100x less traffic and is the standard for search ranking.
In the Interview¶
Common mistakes:
"I would use a neural network for ranking" -- without naming a concrete architecture (LambdaMART vs cross-encoder vs DIN) or justifying why it fits the latency budget
Drawing the architecture without numbers -- no QPS, no latency breakdown, no candidate counts per stage
"A/B test: 50% of users see the new model" -- for search ranking, interleaving (Team Draft) gives reliable results with 10-100x less traffic
Strong answers:
"I'll start with clarifications: type of search, scale (index size, QPS), latency budget, available signals. Then I'll draw a 4-stage pipeline with concrete numbers for latency and candidate filtering at each step."
"For zero results: query relaxation (drop the least important tokens), fall back to popular items in the category, and monitor the zero-result rate as a guardrail metric (<1%)."
"Recall vs latency trade-off: at 50K QPS with a 200ms budget, the sweet spot is ~1000 candidates from retrieval. Above 2000 we blow the latency budget; below 500 we lose recall."