Content Moderation System Metrics

Prerequisites: Problem Definition, Components

Facebook removes 26M+ pieces of content per quarter for hate speech, and 97% of those removals happen BEFORE the first user report. The core difficulty is scale: at 1B posts/day, even 95% accuracy leaves 50M wrong decisions per day. Every false positive is censorship; every false negative is potential harm to users. Regulators (EU DSA) require removal within 24 hours, with transparent reporting.
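
The scale arithmetic can be checked directly. The sketch below separates two readings of the 50M figure: a 5% error rate over all posts, and 95% precision over only the posts that are actually actioned. The function names and the 1% action rate are illustrative assumptions, not figures from the source.

```python
def daily_errors(posts_per_day: int, error_rate: float) -> int:
    """Wrong decisions per day if every post receives a decision."""
    return round(posts_per_day * error_rate)


def daily_false_positives(posts_per_day: int, action_rate: float, precision: float) -> int:
    """Wrongful actions per day when only a share of posts is actioned."""
    actioned = posts_per_day * action_rate
    return round(actioned * (1 - precision))


POSTS_PER_DAY = 1_000_000_000

# Reading 1: 5% of all decisions are wrong.
errors_all_posts = daily_errors(POSTS_PER_DAY, 0.05)

# Reading 2: 95% precision on the actioned subset.
# ASSUMPTION: 1% of posts are actioned (illustrative value only).
false_positives = daily_false_positives(POSTS_PER_DAY, action_rate=0.01, precision=0.95)

print(errors_all_posts, false_positives)
```

Under these assumptions the first reading gives 50,000,000 wrong decisions per day and the second gives 500,000 wrongful actions per day. Either number is far too large to review by hand, which is the point of the paragraph above.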

Metric Hierarchy

Level	Metrics	Targets
Regulatory	Removal time, transparency reports, appeal success rate	DSA compliance
Business	User trust score, DAU impact, advertiser safety	Brand safety > 99%
Operational	Review queue, moderator throughput, appeal volume	Queue < 10K
Model	Precision, Recall per policy, multi-modal accuracy	Recall > 95% violence
System	Latency, throughput, availability	p99 < 200ms

Business Metrics

User Impact

Metric	Formula	Target
Content actioned proactively	auto_removed / total_removed	> 95%
User appeal rate	appeals / total_actioned	< 5%
Appeal overturn rate	overturned / total_appeals	< 10%
Harmful content exposure	views_before_removal * severity	Minimize
User trust score	1 - (complaints + churned_users) / DAU	> 98%
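
The formulas in the table can be sketched as plain functions. This is a minimal illustration: the `ModerationCounts` container, the `safe_ratio` helper, and the severity weights are hypothetical, and only the formulas themselves come from the table.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class ModerationCounts:
    auto_removed: int
    total_removed: int
    appeals: int
    total_actioned: int
    overturned: int
    complaints: int
    churned_users: int
    dau: int


def safe_ratio(numerator: float, denominator: float) -> float:
    """Return 0.0 instead of failing when there is no data yet."""
    return numerator / denominator if denominator else 0.0


def user_impact_metrics(c: ModerationCounts) -> dict:
    return {
        "proactive_rate": safe_ratio(c.auto_removed, c.total_removed),   # target > 95%
        "appeal_rate": safe_ratio(c.appeals, c.total_actioned),          # target < 5%
        "appeal_overturn_rate": safe_ratio(c.overturned, c.appeals),     # target < 10%
        "user_trust_score": 1 - safe_ratio(c.complaints + c.churned_users, c.dau),  # target > 98%
    }


def harmful_exposure(items: Iterable[Tuple[int, float]]) -> float:
    """Sum of views_before_removal * severity over removed items.

    ASSUMPTION: severity is a numeric weight chosen by policy teams.
    """
    return sum(views * severity for views, severity in items)
```

For example, `user_impact_metrics(ModerationCounts(970, 1000, 40, 1000, 4, 100, 100, 20000))` yields a proactive rate of 0.97 and an appeal rate of 0.04, both within the targets above.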

Advertiser Safety

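def compute_brand_safety_metrics(
    ad_impressions: int,
    adjacent_violations: int,
    advertiser_complaints: int,
    avg_advertiser_spend: float,
) -> dict:
    """Brand-safety ratios for ads shown next to violating content."""
    if ad_impressions <= 0:
        raise ValueError("ad_impressions must be positive")
    return {
        # Share of impressions not adjacent to a violation; target > 99%
        "brand_safe_rate": 1 - adjacent_violations / ad_impressions,
        "advertiser_complaint_rate": advertiser_complaints / ad_impressions,
        # Spend attributable to advertisers who complained
        "revenue_at_risk": advertiser_complaints * avg_advertiser_spend,
    }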

Model Metrics

Per-Policy Metrics

Violation type	Precision target	Recall target	Why
CSAM	> 99%	> 99.9%	Zero tolerance, legal requirement
Violence/gore	> 95%	> 95%	High harm, high urgency
Hate speech	> 90%	> 90%	Context-dependent, harder
Nudity/NSFW	> 95%	> 95%	Clear visual signals
Spam	> 95%	> 90%	Less harmful, volume-heavy
Misinformation	> 85%	> 80%	Most subjective, hardest

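import numpy as np
from sklearn.metrics import precision_score, recall_score


def compute_multimodal_metrics(predictions, ground_truth, modalities):
    """Separate metrics for each modality."""
    # Arrays are required for the boolean-mask indexing below
    predictions = np.asarray(predictions)
    ground_truth = np.asarray(ground_truth)
    modalities = np.asarray(modalities)

    results = {}
    for modality in ["text", "image", "video", "audio"]:
        mask = modalities == modality
        if mask.sum() == 0:
            continue  # no samples of this modality

        results[modality] = {
            "precision": precision_score(ground_truth[mask], predictions[mask]),
            "recall": recall_score(ground_truth[mask], predictions[mask]),
            "volume_share": mask.sum() / len(predictions),
        }

    # Cross-modal: text says OK but image violates.
    # compute_cross_modal_errors is assumed to be defined elsewhere in the project.
    cross_modal_misses = compute_cross_modal_errors(predictions, ground_truth)
    results["cross_modal_miss_rate"] = cross_modal_misses

    return results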

Fairness Metrics

Metric	Description	Target
FPR parity across languages	FP rate should not depend on language	< 2x difference
FPR parity across demographics	No bias by race/gender/religion	< 1.5x difference
Appeal overturn parity	Equal share of successful appeals	< 2x difference
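
One way to check FPR parity is to compute the false positive rate per group and compare the highest to the lowest. The sketch below assumes binary labels (1 = violation) and a group label per item; the function names are illustrative, and the max/min ratio is one possible reading of the "x difference" targets in the table.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence


def fpr_by_group(
    y_true: Sequence[int], y_pred: Sequence[int], groups: Sequence[Hashable]
) -> Dict[Hashable, float]:
    """False positive rate per group: FP / (all truly benign items)."""
    false_positives = defaultdict(int)
    negatives = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth == 0:
            negatives[group] += 1
            if pred == 1:
                false_positives[group] += 1
    # Groups with no benign items are skipped: their FPR is undefined.
    return {g: false_positives[g] / n for g, n in negatives.items()}


def fpr_disparity(fprs: Dict[Hashable, float]) -> float:
    """Ratio of the highest group FPR to the lowest."""
    lowest, highest = min(fprs.values()), max(fprs.values())
    if lowest == 0:
        return float("inf") if highest > 0 else 1.0
    return highest / lowest
```

With 10 benign English posts (1 wrongly flagged) and 10 benign Turkish posts (3 wrongly flagged), the disparity is about 3x, which breaches the < 2x language target.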

Operational Metrics

Moderator Performance

Metric	Target
Cases reviewed per hour	> 100 (text), > 50 (image), > 20 (video)
Agreement with ML prediction	> 85%
Inter-rater reliability (Cohen's kappa)	> 0.7
Moderator burnout indicators	Accuracy drop < 5% over shift
Secondary trauma screening	Monthly, mandatory
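
Inter-rater reliability in the table is measured with Cohen's kappa, defined as (p_o - p_e) / (1 - p_e), where p_o is observed agreement and p_e is the agreement expected by chance. A minimal pure-Python sketch for two moderators labelling the same cases follows; the function name is illustrative.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters who labelled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)

    # p_o: share of items where both raters chose the same label
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # p_e: chance agreement from each rater's label frequencies
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        return 1.0  # both raters used a single identical label
    return (observed - expected) / (1 - expected)
```

Kappa is 1.0 for perfect agreement and 0.0 when agreement is no better than chance, so the > 0.7 target asks for agreement well above what label frequencies alone would produce.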

Queue Health

Queue targets:
  CSAM:           < 1 hour to review (legal)
  Violence:       < 4 hours
  Hate speech:    < 24 hours (DSA)
  Spam:           < 48 hours
  Misinformation: < 72 hours
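
The queue targets above can be checked mechanically. The sketch below assumes each queue item is a hypothetical `(item_id, policy, enqueued_at)` tuple and treats an item as breaching once it has waited at least as long as its target; the policy keys are illustrative names for the rows above.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple

# Review targets copied from the list above
REVIEW_SLA = {
    "csam": timedelta(hours=1),
    "violence": timedelta(hours=4),
    "hate_speech": timedelta(hours=24),
    "spam": timedelta(hours=48),
    "misinformation": timedelta(hours=72),
}


def sla_breaches(
    queue: Iterable[Tuple[str, str, datetime]], now: datetime
) -> List[str]:
    """Return ids of items that have waited past their review target."""
    return [
        item_id
        for item_id, policy, enqueued_at in queue
        if now - enqueued_at >= REVIEW_SLA[policy]
    ]
```

In practice such a check would likely feed an alert or a priority bump rather than a plain list, but the comparison against a per-policy target stays the same.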

System Metrics

Metric	Target
Inference latency p99	< 200ms (text), < 500ms (image), < 2s (video)
Throughput	50K posts/sec
Availability	99.9%
Time-to-action (auto)	< 5 min from upload
Time-to-action (manual)	< 24h (DSA compliance)
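
As an illustration of the latency row, p99 can be computed from raw samples with the nearest-rank method and compared to the per-modality target. The function names are hypothetical, and nearest-rank is one of several percentile definitions; monitoring systems often use histogram approximations instead.

```python
import math
from typing import Sequence

# p99 targets in milliseconds, from the table above
LATENCY_P99_TARGET_MS = {"text": 200, "image": 500, "video": 2000}


def percentile(samples: Sequence[float], q: int) -> float:
    """Nearest-rank percentile: smallest value with at least q% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def latency_within_target(samples_ms: Sequence[float], modality: str) -> bool:
    """True if the observed p99 is below the target for this modality."""
    return percentile(samples_ms, 99) < LATENCY_P99_TARGET_MS[modality]
```

With 100 samples, a single slow request does not move p99, but two slow requests do. This is why p99 is usually tracked alongside the maximum or p99.9 for latency-critical paths.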

Misconception: a single precision/recall target for all violation types

CSAM requires recall > 99.9% (zero tolerance, legal liability). Hate speech needs precision > 90% (context-dependent, high risk of censorship). For spam, recall > 90% is enough (low harm). A single aggregate metric masks critical failures: a model with 95% average precision can have 70% precision on hate speech (30% false accusations) and 99.9% on CSAM.
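
The masking effect is easy to reproduce. The sketch below uses made-up confusion counts chosen so that the aggregate precision is close to 95% while hate speech sits at 70%; the counts and function names are illustrative assumptions, and the targets come from the per-policy table.

```python
from typing import Dict, Tuple

# Precision targets from the per-policy table
PRECISION_TARGETS = {"csam": 0.99, "hate_speech": 0.90}


def precision(tp: int, fp: int) -> float:
    return tp / (tp + fp) if (tp + fp) else 0.0


def precision_report(
    confusion: Dict[str, Tuple[int, int]], targets: Dict[str, float]
) -> dict:
    """Compare aggregate precision with per-policy precision.

    confusion maps policy -> (true_positives, false_positives).
    """
    per_policy = {p: precision(tp, fp) for p, (tp, fp) in confusion.items()}
    total_tp = sum(tp for tp, _ in confusion.values())
    total_fp = sum(fp for _, fp in confusion.values())
    # Targets are strict ("> X"), so equality counts as a miss
    failing = sorted(p for p, value in per_policy.items() if value <= targets[p])
    return {
        "aggregate": precision(total_tp, total_fp),
        "per_policy": per_policy,
        "failing": failing,
    }


# ILLUSTRATIVE counts, not real data
report = precision_report(
    {"csam": (999, 1), "hate_speech": (140, 60)}, PRECISION_TARGETS
)
```

Here the aggregate is 1139/1200, roughly 0.95, which looks healthy, while `report["failing"]` still lists `hate_speech`. Alerting on the per-policy values rather than the aggregate avoids this blind spot.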

Misconception: automated moderation can replace humans

Even the best models have a 5-15% error rate on subjective categories (hate speech, misinformation). Human-in-the-loop is mandatory for: (1) borderline cases (score 0.4-0.6), (2) appeals, (3) new violation types, (4) cultural context. Facebook employs 15,000+ moderators despite having best-in-class ML. However, moderator burnout is a real problem: exposure to CSAM and violence requires wellness programs.
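
Routing borderline cases to humans can be sketched as a simple threshold rule. Only the 0.4-0.6 review band comes from the text; treating scores below it as "allow" and above it as "auto_action" is an assumption made for illustration, as are the function and label names.

```python
def route(score: float, low: float = 0.4, high: float = 0.6) -> str:
    """Route a model violation score in [0, 1] to an outcome.

    ASSUMPTION: scores outside the borderline band are handled automatically.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be in [0, 1]")
    if score < low:
        return "allow"
    if score <= high:
        return "human_review"  # borderline band from the text
    return "auto_action"
```

Appeals, new violation types, and culturally sensitive content would bypass this rule and go to human review regardless of score. In a real system the band would likely differ per policy, in line with the per-policy targets.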

Interview Section

Question: "How do you measure the quality of a moderation system?"

Weak answer: "Model precision and recall."

Strong answer: "Five levels of metrics. Regulatory: DSA compliance (removal < 24h), transparency report completeness. Business: proactive action rate > 95% (we remove BEFORE a report), brand safety > 99% (ads do not appear next to violations), appeal overturn rate < 10%. Model: per-policy targets (CSAM recall > 99.9%, hate speech precision > 90%), multi-modal accuracy, and fairness, meaning FPR parity across languages and demographics. Operational: moderator throughput, inter-rater reliability (kappa > 0.7), queue SLA by severity. System: p99 < 200ms for text, < 2s for video. Critically: harmful content exposure = views x severity, so minimize the time between upload and action."