Interview Walkthrough: Fraud Detection¶
~4 minute read
Prerequisites: Problem Definition, Components, Metrics, Scaling
Fraud detection is one of the top-3 cases in ML System Design interviews (alongside recommendations and search ranking). It comes up at Stripe, PayPal, Square, and Visa, as well as at Google, Meta, and Amazon for payment teams. The case is hard because it combines five key ML challenges at once: extreme class imbalance (1:1000), an adversarial environment (fraudsters adapt), real-time requirements (< 100 ms), label delay (30-90 days), and multi-signal orchestration (rules + ML + graph). A candidate who covers all five in 45 minutes gets a strong hire. Below is a complete walkthrough with timings, phrasings, and common traps.
Interview Framework (45-60 min)¶
Timeline¶
0-5 min: Clarifying questions
5-15 min: High-level design
15-30 min: Deep dive (ML models, features)
30-45 min: Scaling, reliability, trade-offs
45-60 min: Extensions & Q&A
Step 1: Clarifying Questions (5 min)¶
Essential Questions¶
**Scope:**
- What type of fraud? (payment, identity, abuse)
- What's being protected? (transactions, accounts, promos)
**Scale:**
- Transaction volume? (100K/day vs 100M/day)
- Current fraud rate? (0.01% vs 1%)
**Requirements:**
- Latency requirement? (sync < 100ms or async ok?)
- Who makes final decision? (auto-decline or human review?)
**Data:**
- Historical fraud labels available?
- What signals do we have? (transactions, device, behavior)
- Label delay? (chargebacks in 30-90 days)
**Business:**
- Cost of false positive vs false negative?
- Regulatory requirements? (PCI DSS, GDPR)
Example Dialogue¶
You: "What type of fraud are we detecting?"
Interviewer: "Payment fraud for an online payment platform like Stripe"
You: "What's the transaction volume and current fraud rate?"
Interviewer: "50K TPS at peak, fraud rate around 0.1%"
You: "Is real-time decision required?"
Interviewer: "Yes, we need to approve/decline within 100ms"
You: "Do we have labeled historical data?"
Interviewer: "Yes, chargebacks and customer reports, but 30-90 day delay"
Step 2: High-Level Design (10 min)¶
API Design¶
# Request
POST /transactions/score
{
"transaction_id": "txn_123",
"user_id": "user_456",
"amount": 150.00,
"currency": "USD",
"merchant_id": "merch_789",
"card_token": "card_abc",
"device_fingerprint": "dev_xyz",
"ip_address": "1.2.3.4",
"timestamp": "2024-01-15T10:30:00Z"
}
# Response
{
"decision": "approve", # approve, review, decline
"fraud_score": 0.15,
"risk_signals": ["new_device", "high_amount"],
"request_id": "req_abc",
"latency_ms": 45
}
Architecture Diagram¶
graph TD
REQ["TRANSACTION REQUEST"] --> SERV["FRAUD SCORING SERVICE"]
subgraph SERV_SUB["Scoring Components"]
RULES["Rules Engine"]
ML["ML Models"]
GRAPH["Graph Analysis"]
DEC["Decision Engine"]
end
SERV --> RULES & ML & GRAPH & DEC
SERV --> FS["Feature Store<br/>(Redis)"]
SERV --> MS["Model Server<br/>(TF Serving)"]
SERV --> GDB["Graph DB<br/>(Neo4j)"]
style REQ fill:#e8eaf6,stroke:#3f51b5
style SERV fill:#fff3e0,stroke:#ef6c00
style FS fill:#e8f5e9,stroke:#4caf50
style MS fill:#e8f5e9,stroke:#4caf50
style GDB fill:#e8f5e9,stroke:#4caf50
Flow Explanation¶
"The flow works as follows:
1. Transaction comes in via API Gateway
2. Fraud Scoring Service orchestrates:
- Rules Engine: Fast blocklists, velocity checks
- Feature Store: Get user/device/transaction features
- ML Models: Ensemble of XGBoost + neural network
- Graph Analysis: Check device/IP connections
- Decision Engine: Combine scores, apply thresholds
3. Decision returned:
- Approve: < 0.3 score
- Review: 0.3-0.7 score (queue for analysts)
- Decline: > 0.7 score
4. Async: Log decision, update features, feedback loop"
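The three-bucket decision logic above can be sketched as a small function. The 0.3/0.7 thresholds are the ones from the flow description; in a real system they would be tuned per merchant segment and risk level rather than fixed:

```python
def decide(fraud_score: float,
           approve_below: float = 0.3,
           decline_above: float = 0.7) -> str:
    """Map an ensemble fraud score to one of three decisions.

    Scores in [approve_below, decline_above] go to the analyst
    review queue rather than being auto-approved or auto-declined.
    """
    if fraud_score < approve_below:
        return "approve"
    if fraud_score > decline_above:
        return "decline"
    return "review"  # queued for human analysts
```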
Step 3: Deep Dive (15 min)¶
Feature Engineering¶
"Let me explain the key features..."
"Features fall into several categories:
1. **Transaction Features**
- Amount, currency, merchant category
- Time of day, day of week
- Channel (online, mobile, POS)
2. **Velocity Features** (critical for fraud)
- Transactions in last 1h/24h/7d
- Amount in last 1h/24h/7d
- Unique merchants in 24h
- Failed transactions in 24h
3. **User Behavior Features**
- Average transaction amount
- Typical spending hours
- Preferred merchants
- Account age
4. **Device/IP Features**
- Is device new?
- Is IP from datacenter/VPN?
- Other users on same device
- Geo-location vs user profile
5. **Graph Features**
- Connections to known fraudsters
- Users sharing device/IP
- Community fraud rate
For real-time, I'd compute velocity in Flink and store in Redis.
Batch features updated hourly in Spark."
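As a rough illustration of the velocity features, here is an in-memory sketch. The production setup described above would keep these counters in Redis (for example, sorted sets trimmed by timestamp) and update them from Flink; the class and method names here are purely illustrative:

```python
import bisect
from collections import defaultdict


class VelocityTracker:
    """In-memory stand-in for Redis-backed velocity counters."""

    def __init__(self):
        # user_id -> sorted list of (timestamp, amount)
        self._events = defaultdict(list)

    def record(self, user_id: str, ts: float, amount: float) -> None:
        bisect.insort(self._events[user_id], (ts, amount))

    def features(self, user_id: str, now: float, window_s: float) -> dict:
        """Count and sum of a user's transactions within a sliding window."""
        cutoff = now - window_s
        recent = [(t, a) for t, a in self._events[user_id] if t >= cutoff]
        return {
            "txn_count": len(recent),
            "txn_amount": sum(a for _, a in recent),
        }
```

The same computation would run at 1h/24h/7d windows, plus variants such as unique merchants and failed transactions per window.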
ML Model Architecture¶
"For the ML model, I'd use an ensemble approach..."
"Why ensemble?
- Different models catch different patterns
- Reduces risk of single model failure
- Allows specialized models per fraud type
graph TD
subgraph ENS["ENSEMBLE MODEL"]
XGB["XGBoost<br/>Primary, w=0.5"]
NN["Neural Network<br/>w=0.3"]
ANOM["Anomaly Detector<br/>w=0.2"]
AVG["Weighted Average"]
SCORE["Final Score"]
XGB & NN & ANOM --> AVG --> SCORE
end
style XGB fill:#e8f5e9,stroke:#4caf50
style NN fill:#e8eaf6,stroke:#3f51b5
style ANOM fill:#f3e5f5,stroke:#9c27b0
style AVG fill:#fff3e0,stroke:#ef6c00
style SCORE fill:#fce4ec,stroke:#c62828
XGBoost: Fast, interpretable, handles tabular data well
Neural Network: Captures complex interactions
Anomaly Detector: Catches novel fraud patterns
Training considerations:
- Heavy class imbalance (0.1% fraud)
- Use SMOTE or weighted loss
- Optimize for AUC-PR, not accuracy
- Cost-sensitive learning: FN cost >> FP cost"
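The weighted-average combination from the diagram is simple to state in code; the 0.5/0.3/0.2 weights are the ones shown and would in practice be tuned on validation data:

```python
def ensemble_score(model_scores: dict, weights: dict = None) -> float:
    """Weighted average of per-model fraud scores (diagram weights)."""
    if weights is None:
        weights = {"xgboost": 0.5, "neural_net": 0.3, "anomaly": 0.2}
    assert abs(sum(weights.values()) - 1.0) < 1e-9, "weights must sum to 1"
    return sum(w * model_scores[name] for name, w in weights.items())
```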
Handling Class Imbalance¶
"Class imbalance is the biggest challenge..."
"With 0.1% fraud rate:
- If model predicts 'not fraud' always = 99.9% accuracy
- But catches 0% of fraud!
Solutions:
1. **Sampling**
- SMOTE: Synthetic minority oversampling
- Undersampling majority class
2. **Cost-sensitive learning**
- Set class_weight: {fraud: 100, legit: 1}
- Custom loss function weighted by transaction amount
3. **Threshold tuning**
- Don't use 0.5 as threshold
- Tune for business metric (precision at recall=0.95)
4. **Evaluation metrics**
- AUC-PR (not AUC-ROC)
- Precision/Recall at fixed threshold
- $ caught / $ missed"
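Threshold tuning for "precision at recall=0.95" can be done directly on a scored validation set; a minimal sketch without sklearn:

```python
def threshold_for_recall(y_true: list, scores: list,
                         target_recall: float) -> float:
    """Highest score threshold that still achieves the target recall.

    Walk transactions from highest to lowest score, lowering the
    threshold until enough fraud cases fall at or above it.
    """
    total_pos = sum(y_true)
    tp = 0
    for score, label in sorted(zip(scores, y_true), reverse=True):
        tp += label
        if tp / total_pos >= target_recall:
            return score
    return 0.0  # target recall unreachable at any threshold
```

Precision at the returned threshold is then the business metric to report; 0.5 plays no special role.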
Step 4: Scaling & Trade-offs (15 min)¶
Scaling for 50K TPS¶
"For 50K TPS, here's the scaling strategy..."
"1. Stateless scoring service
- 100 pods, each handles 500 TPS
- Horizontal auto-scaling based on CPU/latency
2. Feature Store (Redis Cluster)
- 20 shards, 200GB total
- < 5ms feature retrieval
- Write-through caching from Flink
3. Model Serving
- TensorFlow Serving with batching
- 10 GPU instances for neural network
- CPU for XGBoost (faster for small batches)
4. Graph Database
- Neo4j cluster, 5 nodes
- Precompute common queries
- Fallback if slow (skip graph features)
5. Latency budget:
- Feature fetch: 10ms
- Rules: 5ms
- ML inference: 30ms
- Graph query: 15ms
- Total: ~60ms (buffer for p99)"
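The "fallback if slow" idea for graph features can be sketched with a timeout around the query; the function names and the 15 ms budget slice are illustrative, and the stub query deliberately overruns to show the degradation path:

```python
import asyncio


async def fetch_graph_features(txn_id: str) -> dict:
    """Stand-in for a Neo4j lookup; sleeps past the budget on purpose."""
    await asyncio.sleep(0.05)
    return {"fraud_ring_links": 2}


async def graph_features_with_budget(txn_id: str,
                                     timeout_s: float = 0.015) -> dict:
    """Skip graph features rather than blow the overall latency budget."""
    try:
        return await asyncio.wait_for(fetch_graph_features(txn_id), timeout_s)
    except asyncio.TimeoutError:
        return {}  # degrade gracefully: score without graph signals
```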
Key Trade-offs¶
"Several important trade-offs to discuss..."
"1. Precision vs Recall
- High precision: Few false alarms, but miss some fraud
- High recall: Catch more fraud, but more false declines
- Business decision: What's the cost of each?
- Solution: Separate thresholds for different risk levels
2. Latency vs Accuracy
- More features = better but slower
- Complex models = better but slower
- Solution: Two-stage (fast screen + detailed if suspicious)
3. Auto-decline vs Manual Review
- Auto-decline: Fast, but customer friction
- Manual review: Accurate, but expensive
- Solution: Three buckets (approve, review, decline)
4. Real-time vs Batch Features
- Real-time: Fresh, but limited computation
- Batch: Rich, but potentially stale
- Solution: Hybrid (real-time velocity + batch profiles)
5. Model Complexity vs Explainability
- Deep learning: Better accuracy
- Decision trees: Easier to explain
- Solution: Use SHAP for explanations"
Handling Adversarial Attacks¶
"Fraud is adversarial, attackers adapt..."
"Challenges:
- Fraudsters learn patterns and adapt
- They test with small transactions
- They use bots to probe rules
Defenses:
1. **Don't reveal reasons**
- Generic decline messages
- No specific error codes
2. **Feature monitoring**
- Track feature distributions
- Alert on sudden changes
3. **Model ensembles**
- Harder to game multiple models
- Different models have different weaknesses
4. **Continuous learning**
- Retrain models weekly
- Add new features for new patterns
5. **Honeypots**
- Fake patterns that only fraudsters trigger
- E.g., specific card BINs"
Step 5: Extensions & Q&A (10 min)¶
Common Follow-up Questions¶
Q: How do you handle cold start for new users?
"For new users with no history:
- Rely on device fingerprint (seen before?)
- IP reputation (datacenter? VPN?)
- Transaction pattern (testing small amounts?)
- Require additional verification (OTP, 3DS)
- Start with conservative limits, increase over time"
Q: How do you handle label delay?
"Labels come 30-90 days late via chargebacks:
1. **Train on mature data**
- Use transactions > 90 days old
- Accept that model is slightly outdated
2. **Use proxy labels**
- Customer complaints (faster)
- Failed delivery with claim
- Account suspension
3. **Semi-supervised learning**
- Use high-confidence predictions as labels
- Continuously update model"
Q: How do you explain decisions?
"Explainability is critical for:
- Regulatory compliance
- Customer disputes
- Analyst training
Approach:
1. SHAP values for feature contribution
2. Top 3-5 risk signals in response
3. Rule traces for rule-based decisions
4. Counterfactuals: 'If X was different, approved'"
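Turning per-feature contributions (e.g. SHAP values) into the "top 3-5 risk signals" field of the API response is a simple selection step; this sketch assumes the contributions are already computed upstream:

```python
def top_risk_signals(contributions: dict, k: int = 3) -> list:
    """Names of the k features pushing the score most toward fraud.

    `contributions` maps feature name -> signed contribution (e.g. a
    SHAP value); only positive, risk-increasing features are reported.
    """
    positives = [(name, v) for name, v in contributions.items() if v > 0]
    positives.sort(key=lambda kv: kv[1], reverse=True)
    return [name for name, _ in positives[:k]]
```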
Q: How do you monitor model drift?
"Monitor for:
1. Feature drift
- Distribution of inputs changing
- PSI (Population Stability Index)
2. Prediction drift
- Score distribution changing
- Fewer/more high-risk scores
3. Label drift
- Fraud rate changing
- New fraud types appearing
Alerting:
- Daily drift reports
- Automatic alerts if PSI > threshold
- Weekly model performance review"
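PSI over two binned score or feature distributions is short enough to compute inline; the small floor value guards against empty bins:

```python
import math


def psi(expected: list, actual: list, eps: float = 1e-6) -> float:
    """Population Stability Index between two binned distributions.

    Both inputs are bin proportions summing to 1. PSI > 0.2 is a
    common rule-of-thumb alert threshold for significant drift.
    """
    total = 0.0
    for e, a in zip(expected, actual):
        e, a = max(e, eps), max(a, eps)
        total += (a - e) * math.log(a / e)
    return total
```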
Interview Checklist¶
Must Cover:¶
- Clarifying questions about fraud type and scale
- API and high-level architecture
- Feature engineering (especially velocity)
- ML model approach (ensemble, imbalance handling)
- Real-time vs batch features
- Decision thresholds (approve/review/decline)
- Latency budget
Good to Cover:¶
- Graph-based features
- Adversarial considerations
- Label delay handling
- A/B testing approach
- Monitoring and drift detection
Red Flags to Avoid:¶
- Ignoring class imbalance
- Using accuracy as metric
- Not discussing latency constraints
- Forgetting the adversarial nature
- No feedback loop mentioned
Misconception: you can skip the clarifying questions and start drawing the architecture right away
Fraud detection for Stripe (payment fraud, 50K TPS, < 100ms) and for an insurance company (claims fraud, 1000/day, batch) are completely different systems. Without clarifying the fraud type, volume, and latency requirements, you risk designing the wrong system. The interviewer is evaluating structured thinking: 3-5 questions in the first 5 minutes show that you understand the domain.
Misconception: most of the time should go into choosing the ML model
Fraud detection interviewers expect you to spend 60% of the time on feature engineering, the data pipeline, and system design, and only 20% on the model. The reason: XGBoost vs LightGBM vs a neural network is a 1-2% difference in AUC-PR, while the right velocity features vs basic ones is a 10-15% difference. Feature engineering and the real-time serving architecture are what separate a strong hire from a hire.
Misconception: the feedback loop is just retraining the model on new data
A complete feedback loop includes: (1) analyst decisions from Case Management -> label store; (2) chargebacks after 30-90 days -> label updates; (3) monitoring the false decline rate (customers who called back after being blocked); (4) A/B testing new models on 1-5% of traffic with shadow scoring; (5) updating the Rules Engine based on new patterns. Without this, the model degrades within 2-4 weeks in an adversarial environment.
Sample Dialogue¶
Interviewer: "Design a fraud detection system for Stripe"
You: "Before diving in, let me understand the scope.
What type of fraud - payment fraud, account takeover, or both?"
Interviewer: "Focus on payment fraud"
You: "What's the transaction volume and current fraud rate?"
Interviewer: "50K TPS peak, 0.1% fraud rate"
You: "Is real-time scoring required, and what's the latency budget?"
Interviewer: "Yes, under 100ms"
You: "Let me outline my approach.
[Draw architecture]
The key insight for fraud is that it's adversarial and heavily imbalanced.
I'd use a multi-layered approach:
1. Fast rules for obvious fraud (blocklists, velocity)
2. ML ensemble for complex patterns
3. Graph analysis for connected fraud rings
4. Human review for edge cases
For the ML model, I'd use XGBoost as the primary model with
cost-sensitive learning to handle the 0.1% fraud rate.
Key features are velocity (transactions per hour) and
behavioral deviations from the user's normal pattern.
For 50K TPS, I'd horizontally scale the scoring service
and use Redis for real-time feature serving.
Latency budget: 10ms features, 5ms rules, 30ms ML, 15ms graph = ~60ms total.
Shall I dive deeper into any component?"
Interview Questions¶
Question: "How would you test a new model before production?"
Weak answer: "Compare metrics on a test dataset and deploy."
Strong answer: "Four stages: (1) Offline evaluation -- precision/recall/AUC-PR on a holdout set with mature labels. Compare against the current model, and break results down by fraud type and segment (new users, high-value, mobile). (2) Shadow scoring -- the new model scores 100% of traffic in parallel, but its decisions are not applied. Compare score distributions and look for cases where the models strongly disagree. 1-2 weeks. (3) A/B test -- 5% of traffic on the new model, monitoring fraud loss rate, false decline rate, and review volume. Guardrails: if the fraud loss rate exceeds 2x the current one, roll back automatically. 2-4 weeks. (4) Gradual rollout -- 5% -> 25% -> 50% -> 100% over 2 weeks with monitoring at each step."
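The shadow-scoring comparison from stage (2) boils down to flagging transactions where the candidate model strongly disagrees with production; the function name and the 0.3 score gap here are illustrative:

```python
def shadow_disagreements(prod_scores: list, shadow_scores: list,
                         gap: float = 0.3) -> list:
    """Indices of transactions where the shadow model diverges from
    the production model by more than `gap` -- candidates for
    analyst review before the A/B test stage."""
    return [i for i, (p, s) in enumerate(zip(prod_scores, shadow_scores))
            if abs(p - s) > gap]
```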
Question: "What do you do about adversarial adaptation?"
Weak answer: "Retrain the model more often."
Strong answer: "Five strategies: (1) Don't reveal decline reasons -- a generic 'transaction declined' with no details. (2) An ensemble of different model types -- to fool the ensemble, an attacker has to beat XGBoost, the neural network, and the anomaly detector simultaneously. (3) Feature monitoring -- PSI > 0.2 on any feature triggers an alert; fraudsters may have found a workaround. (4) Honeypot features -- patterns that only fraudsters trigger (for example, specific device fingerprint combinations). (5) Weekly model retraining that incorporates new patterns, plus daily updates to blocklists and velocity rules."