
Model Merging & Task Arithmetic

~2 min read

Prerequisites: LLM Knowledge Distillation | LLM Quantization

Combine fine-tuned models without training -- multi-task capabilities from weight operations


Merging Methods Comparison

| Method | Approach | Complexity | Best For |
|---|---|---|---|
| Simple Averaging | Mean of weights | O(1) | Similar tasks |
| Task Arithmetic | Base + task vectors | O(n) | Multi-task |
| TIES-Merging | Trim + Elect + Intersect | O(n log n) | Conflicting tasks |
| DARE | Drop + Rescale | O(n) | Large models |
| SLERP | Spherical interpolation | O(n) | 2-model merge |
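
Of these, DARE's core trick is the easiest to sketch: drop a random fraction of each task-vector entry, then rescale the survivors so the expected update is unchanged. A minimal sketch assuming NumPy arrays; `p` (drop probability) and `seed` are hypothetical parameter names:

```python
import numpy as np

def dare_task_vector(base, ft, p=0.9, seed=0):
    """DARE on one parameter: drop task-vector entries, rescale survivors."""
    rng = np.random.default_rng(seed)
    tau = ft - base                     # task vector
    mask = rng.random(tau.shape) >= p   # keep each entry with probability 1 - p
    return tau * mask / (1.0 - p)       # rescale to preserve the expected update
```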

Task Arithmetic

Concept: Task vector = fine-tuned weights - base weights.

$$ \tau_i = \theta_i^{ft} - \theta_{base} $$

Merging: $$ \theta_{merged} = \theta_{base} + \sum_{i=1}^{n} \lambda_i \tau_i $$

```python
def task_arithmetic_merge(base_weights, ft_weights_list, scales):
    """Merge fine-tuned models into the base via task arithmetic.

    base_weights: dict of parameter name -> tensor (the base model).
    ft_weights_list: list of dicts with the same keys, one per fine-tuned model.
    scales: per-model scaling factors lambda_i.
    """
    merged = dict(base_weights)

    for ft_weights, scale in zip(ft_weights_list, scales):
        for k in merged:
            # Task vector tau_i = theta_ft - theta_base, scaled by lambda_i.
            # Out-of-place add, so the base model's tensors are not mutated.
            merged[k] = merged[k] + scale * (ft_weights[k] - base_weights[k])

    return merged
```
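
A quick sanity check with toy NumPy weights (a hypothetical two-task example, not from the original notes):

```python
import numpy as np

base = {"w": np.array([1.0, 1.0])}
ft_a = {"w": np.array([2.0, 1.0])}   # task A moves the first weight
ft_b = {"w": np.array([1.0, 3.0])}   # task B moves the second weight

merged = task_arithmetic_merge(base, [ft_a, ft_b], scales=[0.5, 0.5])
print(merged["w"])  # [1.5 2.] -- half of each task vector added to the base
```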

TIES-Merging (Trim, Elect, Intersect)

Problem: Task vectors can conflict (different signs).

Solution (sketched in code below):

1. Trim: keep only the top-k% of entries by magnitude in each task vector
2. Elect: where signs conflict, keep the majority sign
3. Intersect: average only the values that agree with the elected sign
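
A minimal single-parameter sketch of the three steps, assuming NumPy arrays. The names `k` (fraction of entries kept) and `lam` are hypothetical, and the sign election simplifies the paper's magnitude-mass vote to the sign of the sum:

```python
import numpy as np

def ties_merge_param(base, ft_list, k=0.2, lam=1.0):
    """TIES-merge a single parameter tensor across models."""
    # Task vectors, shape (n_models, *param_shape).
    taus = np.stack([ft - base for ft in ft_list])

    # 1. Trim: zero all but the top-k fraction of magnitudes per task vector.
    for tau in taus:                      # rows are views, edited in place
        thresh = np.quantile(np.abs(tau), 1 - k)
        tau[np.abs(tau) < thresh] = 0.0

    # 2. Elect: majority sign per position (sign of the summed task vectors).
    elected = np.sign(taus.sum(axis=0))

    # 3. Intersect: average only entries that agree with the elected sign.
    agree = (np.sign(taus) == elected) & (taus != 0)
    counts = np.maximum(agree.sum(axis=0), 1)   # avoid division by zero
    merged_tau = (taus * agree).sum(axis=0) / counts

    return base + lam * merged_tau
```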

Model Soup

Greedy approach: add a checkpoint to the soup only if doing so improves the validation score.

```python
def greedy_soup(checkpoints, val_data):
    """Greedy model soup over checkpoints from one fine-tuning run.

    Assumes two helpers: average(list of weight dicts) -> weight dict,
    and evaluate(weights, val_data) -> validation score.
    """
    soup = [checkpoints[0]]
    best_score = evaluate(average(soup), val_data)

    for ckpt in checkpoints[1:]:
        candidate = average(soup + [ckpt])
        score = evaluate(candidate, val_data)
        if score > best_score:        # keep the checkpoint only if the soup improves
            soup.append(ckpt)
            best_score = score

    return average(soup)
```
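
The `average` and `evaluate` functions above are assumed helpers; `average` is just an element-wise mean over weight dicts, e.g.:

```python
def average(weight_dicts):
    """Element-wise mean of a list of weight dicts with identical keys."""
    n = len(weight_dicts)
    return {k: sum(w[k] for w in weight_dicts) / n
            for k in weight_dicts[0]}
```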

When to Use Which Method

| Scenario | Method |
|---|---|
| Same task, multiple checkpoints | Model Soup |
| Different tasks | Task Arithmetic |
| Conflicting tasks | TIES-Merging |
| 2 models | SLERP |
| Large models (7B+) | DARE |
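
For the two-model case, a minimal SLERP sketch over flattened weight vectors, assuming NumPy; it falls back to plain linear interpolation when the two models point in nearly the same direction:

```python
import numpy as np

def slerp(w1, w2, t, eps=1e-7):
    """Spherical linear interpolation between two flat weight vectors."""
    v1 = w1 / np.linalg.norm(w1)
    v2 = w2 / np.linalg.norm(w2)
    omega = np.arccos(np.clip(np.dot(v1, v2), -1.0, 1.0))  # angle between models
    if omega < eps:                  # nearly parallel: lerp is numerically safer
        return (1 - t) * w1 + t * w2
    s = np.sin(omega)
    return (np.sin((1 - t) * omega) / s) * w1 + (np.sin(t * omega) / s) * w2
```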

Interview Questions

Q: What is task arithmetic?

A: A task vector is (fine-tuned weights) minus (base weights); it captures the knowledge gained from fine-tuning. Merging: theta = theta_base + sum(lambda_i * tau_i). This lets you combine multiple tasks without any training, add or remove capabilities, and control each task's influence via lambda_i.

Q: How does TIES resolve conflicts?

A: (1) Trim: keep only the top-k% magnitudes (removes noise), (2) Elect: the majority sign wins, (3) Intersect: combine the elected values. This reduces interference between tasks.

Q: Model Soup vs Task Arithmetic?

A: Soup averages checkpoints from a single fine-tuning run (different epochs or hyperparameters); arithmetic combines models fine-tuned on different tasks. Soup = intra-task, arithmetic = inter-task.

Q: What are the limitations of model merging?

A: (1) All models must share the same architecture (and base model), (2) conflicting task vectors need explicit resolution (TIES), (3) scaling factors need tuning, (4) diminishing returns after 2-3 models, (5) very different tasks may not merge well.

