Content Moderation: Passing the Interview¶
~3 min read
Prerequisites: Problem Definition | Components
Content moderation is the most complex MLSD interview case: multi-modal (text + images + video + audio), policy-aware (zero-tolerance vs context-dependent), human-in-the-loop (500+ moderators), and regulatory-compliant (EU DSA, NetzDG). The key to a strong answer: don't try to solve everything with one ML model; show a tiered architecture (fast filter -> deep analysis -> human review) with confidence-based routing.
Interview Framework (45-60 min)¶
Timeline¶
0-5 min: Clarifying questions
5-15 min: High-level design
15-30 min: Deep dive (multi-modal ML)
30-45 min: Scaling, reliability, edge cases
45-60 min: Extensions & Q&A
Step 1: Clarifying Questions (5 min)¶
Essential Questions¶
**Scope:**
- What type of content? (text, images, video, all?)
- What violations? (NSFW, hate speech, violence, spam?)
- Pre or post moderation?
**Scale:**
- Content volume? (1M vs 1B/day)
- Latency requirements?
- Current moderation team size?
**Requirements:**
- Auto-action or human-in-the-loop?
- Appeals process needed?
- Regional compliance? (EU DSA, German NetzDG)
**Context:**
- Platform type? (social media, marketplace, gaming)
- User demographics?
- Existing moderation infrastructure?
Example Dialogue¶
You: "What types of content need moderation?"
Interviewer: "Text posts, images, and short videos on a social platform"
You: "What categories of violations?"
Interviewer: "NSFW, hate speech, violence, harassment, and spam"
You: "What's the volume and latency requirement?"
Interviewer: "100 million pieces/day, need to moderate before showing"
You: "Do you have human moderators?"
Interviewer: "Yes, 500 moderators across regions"
Step 2: High-Level Design (10 min)¶
API Design¶
# Request
POST /moderate
{
"content_id": "c123",
"content_type": "post",
"text": "Some user text...",
"image_urls": ["https://..."],
"video_url": null,
"user_id": "u456",
"context": {
"community": "gaming",
"is_reply": true,
"parent_content_id": "c100"
}
}
# Response
{
"content_id": "c123",
"decision": "approve", # approve, review, reject
"confidence": 0.95,
"scores": {
"nsfw": 0.02,
"hate_speech": 0.05,
"violence": 0.01,
"spam": 0.08
},
"flagged_reasons": [],
"processing_time_ms": 150
}
Architecture Diagram¶
graph TD
CS["Content Submission"] --> MP["Moderation Pipeline"]
MP --> TM["Text Model"]
MP --> IM["Image Model"]
MP --> VM["Video Model"]
MP --> DE["Decision Engine"]
TM & IM & VM --> DE
DE --> APP["APPROVE<br/>(Publish)"]
DE --> REV["REVIEW<br/>(Queue)"]
DE --> REJ["REJECT<br/>(Block)"]
REV --> HM["Human Moderators"]
style CS fill:#e8eaf6,stroke:#3f51b5
style MP fill:#f3e5f5,stroke:#9c27b0
style TM fill:#e8eaf6,stroke:#3f51b5
style IM fill:#e8eaf6,stroke:#3f51b5
style VM fill:#e8eaf6,stroke:#3f51b5
style DE fill:#f3e5f5,stroke:#9c27b0
style APP fill:#e8f5e9,stroke:#4caf50
style REV fill:#fff3e0,stroke:#ef6c00
style REJ fill:#fce4ec,stroke:#c62828
style HM fill:#fff3e0,stroke:#ef6c00
Flow Explanation¶
"The moderation flow has 3 key stages:
1. **Content Processing**
- Extract text, images, video frames
- Run preprocessing (OCR for images, transcription for video)
2. **Multi-modal ML Classification**
- Text model: BERT fine-tuned for toxicity
- Image model: ResNet for NSFW, violence
- Video model: Sample frames + audio analysis
3. **Decision Engine**
- Aggregate scores from all models
- Apply policy rules (zero-tolerance, context-based)
- Route to: Approve, Review, or Reject (see the routing sketch below)
4. **Human Review** (for edge cases)
- Priority queue by severity
- SLA-based assignment
- Feedback loop to ML"
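To make stage 3 concrete, here is a minimal routing sketch in Python. The thresholds, category names, and the zero-tolerance set are illustrative assumptions, not tuned policy; note that it reproduces the API example above (max score 0.08 -> approve).
# Minimal confidence-based routing sketch. Thresholds and the
# zero-tolerance set are illustrative assumptions, not tuned values.
AUTO_REJECT = 0.95    # confident violation -> block before publish
AUTO_APPROVE = 0.10   # confident clean -> publish immediately
ZERO_TOLERANCE = {"child_safety"}  # never auto-approve on any signal

def decide(scores: dict[str, float]) -> str:
    """Map per-category violation scores to approve / review / reject."""
    for category in ZERO_TOLERANCE:
        if scores.get(category, 0.0) > 0.0:
            return "reject" if scores[category] >= AUTO_REJECT else "review"
    worst = max(scores.values())
    if worst >= AUTO_REJECT:
        return "reject"
    if worst <= AUTO_APPROVE:
        return "approve"
    return "review"  # uncertain: route to the human review queue

# Matches the API example above: max score 0.08 -> "approve"
print(decide({"nsfw": 0.02, "hate_speech": 0.05, "violence": 0.01, "spam": 0.08}))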
Step 3: Deep Dive (15 min)¶
Text Moderation¶
"Let me dive into text moderation..."
"The challenge with text is adversarial users:
- 'h4te' instead of 'hate'
- 'f*ck' with symbols
- Unicode tricks
My approach:
1. **Preprocessing**
- Normalize unicode (homoglyphs)
- Decode leet speak
- Handle obfuscation patterns (see the sketch after the diagram)
2. **Multi-label Classification**
- Fine-tuned BERT for each category
- Multi-task learning for efficiency
- Output: scores for each violation type
3. **Context Awareness**
- Use conversation history
- Consider community norms
- Adjust thresholds per context
Architecture: (see the diagram below)"
graph TD
INP["Input: h4te speech"] --> PP["Preprocessor<br/>(normalize, clean)"]
PP --> BERT["BERT Encoder<br/>(multilingual)"]
BERT --> CLS["Classification Heads<br/>(per category)"]
CLS --> SC["Scores: hate=0.85,<br/>toxic=0.9, ..."]
style INP fill:#e8eaf6,stroke:#3f51b5
style PP fill:#fff3e0,stroke:#ef6c00
style BERT fill:#f3e5f5,stroke:#9c27b0
style CLS fill:#e8eaf6,stroke:#3f51b5
style SC fill:#e8f5e9,stroke:#4caf50
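A minimal sketch of the preprocessing step from the diagram, assuming Python; the homoglyph and leet tables here are tiny illustrative samples (production tables cover thousands of entries).
import unicodedata

# Illustrative mapping tables; real systems use much larger ones.
HOMOGLYPHS = str.maketrans({"а": "a", "е": "e", "о": "o", "с": "c"})  # Cyrillic -> Latin
LEET = str.maketrans({"4": "a", "3": "e", "1": "i", "0": "o", "5": "s", "@": "a", "$": "s"})

def normalize(text: str) -> str:
    # NFKC folds compatibility characters (fullwidth "Ｈ" -> "H", "①" -> "1").
    text = unicodedata.normalize("NFKC", text).casefold()
    # Cyrillic homoglyphs are NOT folded by NFKC; map them explicitly.
    text = text.translate(HOMOGLYPHS)
    # Decode leet speak: "h4te" -> "hate".
    text = text.translate(LEET)
    # Strip symbol obfuscation: "f*ck" -> "fck" (fuzzy matching downstream).
    return text.replace("*", "")

assert normalize("Ｈ4te") == "hate"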
Image Moderation¶
"For images, we need multi-task detection..."
"Key challenges:
1. NSFW content detection
2. Violence and gore
3. Hate symbols
4. Text in images (OCR)
Pipeline:
1. **NSFW Classifier**
- ResNet50 fine-tuned on NSFW dataset
- 5 categories: neutral, drawings, suggestive, porn, hentai
2. **Violence Detector**
- Object detection for weapons
- Scene classification for violence
3. **Hate Symbol Detection**
- YOLO for specific symbols
- Database of 500+ hate symbols
4. **OCR + Text Moderation**
- Extract text from memes
- Run through text moderation
All models run in parallel for latency"
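A sketch of that parallel fan-out, assuming the four checks are exposed as Python callables; the check_* functions are hypothetical stand-ins that each return a violation score.
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the four image models; each returns a score.
def check_nsfw(img) -> float: return 0.0
def check_violence(img) -> float: return 0.0
def check_hate_symbols(img) -> float: return 0.0
def check_ocr_text(img) -> float: return 0.0  # OCR + text moderation

CHECKS = {"nsfw": check_nsfw, "violence": check_violence,
          "hate_symbol": check_hate_symbols, "ocr_text": check_ocr_text}

def moderate_image(img) -> dict[str, float]:
    # Latency is max(single check) rather than the sum, because the
    # four checks are independent and run concurrently.
    with ThreadPoolExecutor(max_workers=len(CHECKS)) as pool:
        futures = {name: pool.submit(fn, img) for name, fn in CHECKS.items()}
        return {name: f.result() for name, f in futures.items()}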
Video Moderation¶
"Video is the most complex modality..."
"Challenges:
- Can't analyze every frame (too slow)
- Audio may contain violations
- Temporal context matters
My approach:
1. **Smart Frame Sampling**
- Detect scene changes
- Sample 1 frame per scene
- ~30 frames for a 5-minute video (see the sampling sketch below)
2. **Frame Analysis**
- Same models as image moderation
- Run in parallel
3. **Audio Analysis**
- Speech-to-text for transcript
- Run text moderation on transcript
- Audio event detection (gunshots, etc.)
4. **Aggregation**
- Max score across frames
- Flag specific timestamps
- Consider temporal patterns"
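A sketch of steps 1 and 4 (scene-change sampling and max-score aggregation), assuming OpenCV (opencv-python) is installed; the 0.4 threshold and 8-bin histograms are illustrative choices.
import cv2  # assumes opencv-python is available

def sample_scene_frames(path: str, threshold: float = 0.4, max_frames: int = 30):
    """Keep a frame when its color histogram diverges from the last kept frame."""
    cap = cv2.VideoCapture(path)
    kept, last_hist = [], None
    while len(kept) < max_frames:
        ok, frame = cap.read()
        if not ok:
            break
        hist = cv2.calcHist([frame], [0, 1, 2], None, [8, 8, 8], [0, 256] * 3)
        hist = cv2.normalize(hist, hist).flatten()
        # Correlation near 1.0 means "same scene"; a drop signals a cut.
        if last_hist is None or cv2.compareHist(last_hist, hist, cv2.HISTCMP_CORREL) < 1 - threshold:
            kept.append(frame)
            last_hist = hist
    cap.release()
    return kept

def aggregate(frame_scores: list[dict[str, float]]) -> dict[str, float]:
    # The video's score per category is the max over sampled frames.
    return {cat: max(s[cat] for s in frame_scores) for cat in frame_scores[0]}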
Step 4: Scaling & Edge Cases (15 min)¶
Scaling for 100M/day¶
"For 100M pieces per day..."
"That's ~1200/second average, 3x at peak.
Architecture:
1. **Async Processing**
- Kafka queue for content
- Worker pool processes in parallel
- Decouple submission from moderation
2. **GPU Clusters**
- 50 GPU nodes for ML inference
- TensorRT optimization
- Batching for efficiency
3. **Caching**
- Hash-based dedup (same image = same result; see the sketch after this list)
- Cache decisions for 24 hours
- ~30% cache hit rate
4. **Tiered Processing**
- Fast check: known bad (blocklist)
- ML check: most content
- Deep check: edge cases only
5. **Horizontal Scaling**
- Stateless services
- Auto-scale based on queue depth"
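A minimal sketch of the dedup cache from point 3 above: sha256 catches exact byte-level duplicates, and a production system would add a perceptual hash so re-encoded or resized copies also hit the cache.
import hashlib
import time

TTL_SECONDS = 24 * 3600  # cache decisions for 24 hours
_cache: dict[str, tuple[float, dict]] = {}

def moderate_with_dedup(content: bytes, moderate_fn) -> dict:
    key = hashlib.sha256(content).hexdigest()
    hit = _cache.get(key)
    if hit and time.time() - hit[0] < TTL_SECONDS:
        return hit[1]                      # ~30% of traffic short-circuits here
    result = moderate_fn(content)          # full ML pipeline only on a miss
    _cache[key] = (time.time(), result)
    return result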
Handling Edge Cases¶
"Let me discuss tricky cases..."
"1. **Context-Dependent Content**
- 'I'll kill you' in gaming vs real life
- Solution: Community-specific thresholds (sketched after this list)
- Use conversation context
2. **News vs Violation**
- War images in news vs glorification
- Solution: Account reputation, context labels
- Route to specialized reviewers
3. **Art vs NSFW**
- Classical paintings
- Solution: Fine-grained classification
- Human review for edge cases
4. **Satire and Sarcasm**
- Hard for ML to detect
- Solution: Lower confidence, route to review
- Consider user history
5. **New Meme Formats**
- ML hasn't seen before
- Solution: Active learning, fast retraining
- Human feedback loop"
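A sketch of community-specific thresholds for edge case 1; the community names and numbers are illustrative assumptions, not recommended values.
# Per-community threshold overrides; numbers are illustrative only.
DEFAULT_THRESHOLDS = {"violence_speech": 0.70}
COMMUNITY_OVERRIDES = {
    # Gaming chat: "I'll kill you" is usually banter -> require more evidence.
    "gaming": {"violence_speech": 0.90},
}

def effective_threshold(category: str, community: str) -> float:
    overrides = COMMUNITY_OVERRIDES.get(community, {})
    return overrides.get(category, DEFAULT_THRESHOLDS[category])

def is_violation(score: float, category: str, community: str) -> bool:
    return score >= effective_threshold(category, community)

assert not is_violation(0.8, "violence_speech", "gaming")   # banter-tolerant
assert is_violation(0.8, "violence_speech", "parenting")    # default rules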
Trade-offs¶
"Key trade-offs to discuss..."
"1. **Precision vs Recall**
- High precision = few false positives (angry creators)
- High recall = catch more violations (safer platform)
- My approach: Tune per category
- Zero tolerance (child safety): High recall
- Borderline (suggestive): High precision
2. **Speed vs Accuracy**
- Faster = simpler models
- More accurate = complex ensemble
- Solution: Two-stage (sketched after this list)
- Fast filter for obvious cases
- Deep analysis for uncertain
3. **Automation vs Human**
- Auto-action: Fast, scalable, errors at scale
- Human: Accurate, expensive, slow
- Solution: Confidence-based routing
- High confidence (>95%): Auto-action
- Low confidence: Human review
4. **Global vs Local**
- Same rules globally: Consistent but insensitive
- Local rules: Respectful but complex
- Solution: Base model + regional adapters"
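A sketch of trade-off 2 as a two-stage cascade, assuming both models return a violation probability; the 0.98 cutoff is an illustrative assumption.
FAST_CONFIDENT = 0.98  # fast model may decide alone only when very sure

def cascade(content, fast_model, deep_model) -> float:
    score = fast_model(content)            # cheap: distilled model / blocklist
    if score >= FAST_CONFIDENT or score <= 1 - FAST_CONFIDENT:
        return score                       # obvious case: skip the ensemble
    return deep_model(content)             # uncertain: pay for the deep pass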
Step 5: Extensions & Q&A (10 min)¶
Common Follow-up Questions¶
Q: How do you handle adversarial attacks?
"Users will try to bypass:
1. **Detection**
- Monitor for patterns (l33t speak, unicode)
- Track bypass attempts per user
2. **Prevention**
- Robust preprocessing (normalize everything)
- Train on adversarial examples
- Multiple model ensemble
3. **Response**
- Penalize repeat offenders
- Update models quickly
- Manual review for suspicious patterns"
Q: How do you handle appeals?
"Appeals workflow:
1. User submits appeal with reason
2. Queue for different reviewer (not original)
3. More senior reviewer for high-profile cases
4. ML learns from overturned decisions
Key metrics:
- Appeal rate: Should be < 5%
- Overturn rate: Should be < 10%
- High overturn = model/policy problem"
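A sketch of tracking those two metrics from daily counters; the alert thresholds mirror the numbers above, and the input names are illustrative.
def appeal_health(actions: int, appeals: int, overturned: int) -> list[str]:
    alerts = []
    if appeals / actions > 0.05:                 # appeal rate should stay < 5%
        alerts.append("appeal rate high: users disagree with decisions")
    if appeals and overturned / appeals > 0.10:  # overturn rate should stay < 10%
        alerts.append("overturn rate high: model or policy problem")
    return alerts

print(appeal_health(actions=10_000, appeals=700, overturned=120))  # both alerts fire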
Q: How do you train the models?
"Training pipeline:
1. **Initial Training**
- Public datasets (Jigsaw Toxicity, etc.)
- Platform-specific labeled data
2. **Continuous Learning**
- Human review decisions as labels
- Active learning for edge cases (sketched below)
- Weekly model updates
3. **Evaluation**
- Hold-out test set (gold standard)
- A/B test new models
- Monitor precision/recall daily"
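A sketch of the active-learning selection step: route the least-confident predictions to human labeling before the weekly retrain. Field names and the budget are illustrative.
def select_for_labeling(predictions: list[dict], budget: int = 1000) -> list[dict]:
    # Confidence = distance from the 0.5 decision boundary; smaller = less sure.
    def confidence(p: dict) -> float:
        return abs(p["score"] - 0.5)
    return sorted(predictions, key=confidence)[:budget]

batch = [{"id": 1, "score": 0.51}, {"id": 2, "score": 0.98}, {"id": 3, "score": 0.40}]
print(select_for_labeling(batch, budget=2))  # ids 1 and 3: nearest the boundary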
Q: How do you ensure moderator well-being?
"Moderators see disturbing content:
1. **Content Blurring**
- Default blur for graphic content
- Moderator chooses to reveal
2. **Rotation**
- Rotate categories regularly
- Limit exposure to severe content
3. **Support**
- Counseling services
- Regular check-ins
- Break rooms and wellness programs
4. **Automation**
- Auto-action for worst content
- Reduce human exposure"
Interview Checklist¶
Must Cover:¶
- Clarifying questions
- Multi-modal architecture
- Text preprocessing (adversarial handling)
- Decision thresholds (auto vs review)
- Human-in-the-loop workflow
- Scaling strategy
Good to Cover:¶
- Context-aware moderation
- Regional/cultural considerations
- Continuous learning pipeline
- Appeals process
- Moderator well-being
Red Flags:¶
- Single model for everything
- Ignoring adversarial users
- No human review path
- Not discussing context
- Ignoring edge cases
Sample Script¶
Interviewer: "Design content moderation for Instagram"
You: "Great question! Let me clarify the scope.
What content types - posts, stories, reels, comments?"
Interviewer: "All of them"
You: "And what categories of violations?"
Interviewer: "Focus on NSFW, hate speech, and bullying"
You: "Perfect. Let me outline my approach.
[Draw architecture]
The key insight is multi-modal content needs
multi-modal moderation. I'd have separate models
for text, images, and video, then aggregate.
For text, I'd use fine-tuned BERT with
preprocessing to handle adversarial input
like leet speak and unicode tricks.
For images, ResNet for NSFW with OCR to
catch text in memes.
For video, smart frame sampling at scene
changes plus audio transcription.
Decision engine aggregates all scores and
routes to: auto-approve, human review, or reject.
For 100M/day, I'd use async processing with
Kafka and GPU clusters for inference.
Shall I dive deeper into any component?"
Common Misconceptions¶
Misconception: describing the ML pipeline without human review is enough for the interview
Human-in-the-loop is a mandatory part of the answer. Without discussing (1) a priority queue by severity, (2) SLAs per category, (3) moderator specialization and rotation, and (4) a feedback loop into ML, the answer is considered incomplete. The interviewer expects you to understand that ML is the first filter, not the whole solution.
Misconception: scaling = just adding more GPUs
At 100M items/day (1,200/sec average, 3,600/sec peak) the key optimizations are: (1) hash-based dedup -- an identical image yields one result, saving ~30% of compute; (2) tiered processing -- a blocklist check before ML; (3) an async Kafka queue -- decoupling submission from moderation; (4) model batching for GPU efficiency. Without these optimizations you would need 5x more GPUs.
Questions with Graded Answers¶
How do you handle the edge case of a news account publishing graphic violence (war footage)?
"Block it automatically -- violence is violence" -- violates freedom of the press
"Context-aware routing: (1) account reputation check -- verified news accounts get a separate threshold; (2) the violence model outputs a high score, but the Decision Engine checks context (is_news_account=true); (3) route to specialized reviewers (not auto-reject); (4) if approved, add a content warning label instead of removing the content. Policy: a news exception for violence, but NEVER for child safety. This demonstrates an understanding of policy-aware moderation."
How do you scale video moderation with 500 hours of uploads per minute?
"Analyze every frame of every video" -- physically impossible at that volume
"Tiered approach: (1) a fast check of the first 3 frames plus the thumbnail for NSFW/violence (<1s); if clean, defer deep analysis. (2) Smart frame sampling: scene-change detection, ~30 frames for a 5-minute video. (3) An audio pipeline in parallel: speech-to-text + text moderation + audio event detection (gunshots, screaming). (4) Prioritization: videos from new accounts and trending content are checked first. (5) Deduplication: perceptual hash -- if the video has already been moderated, reuse the result."