Model Merging & Task Arithmetic¶
~2 minute read
Prerequisites: LLM Knowledge Distillation | LLM Quantization
Combine fine-tuned models without training -- multi-task capabilities from weight operations
Merging Methods Comparison¶
| Method | Approach | Complexity | Best For |
|---|---|---|---|
| Simple Averaging | Mean of weights | O(1) | Similar tasks |
| Task Arithmetic | Base + task vectors | O(n) | Multi-task |
| TIES-Merging | Trim + Elect + Intersect | O(n log n) | Conflicting tasks |
| DARE | Drop + Rescale | O(n) | Large models |
| SLERP | Spherical interpolation | O(n) | 2-model merge |
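SLERP appears in the table but has no dedicated section below; as a rough illustration, here is a minimal sketch of spherical interpolation between two weight tensors. It assumes PyTorch tensors, and the `slerp` function name, the `eps` threshold, and the fallback to linear interpolation are illustrative choices, not from any specific library.
import torch

def slerp(w_a, w_b, t=0.5, eps=1e-8):
    """Spherically interpolate between two weight tensors at fraction t."""
    a, b = w_a.flatten().float(), w_b.flatten().float()
    a_dir, b_dir = a / (a.norm() + eps), b / (b.norm() + eps)
    # Angle between the two weight directions
    omega = torch.acos(torch.clamp(a_dir @ b_dir, -1.0, 1.0))
    if omega.abs() < eps:
        # Nearly parallel directions: fall back to plain linear interpolation
        merged = (1 - t) * a + t * b
    else:
        merged = (torch.sin((1 - t) * omega) * a +
                  torch.sin(t * omega) * b) / torch.sin(omega)
    return merged.reshape(w_a.shape).to(w_a.dtype)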
Task Arithmetic¶
Concept: a task vector is the element-wise difference between fine-tuned and base weights, $\tau_i = \theta_{ft,i} - \theta_{base}$.
Merging: $$ \theta_{merged} = \theta_{base} + \sum_{i=1}^{n} \lambda_i \tau_i $$
def task_arithmetic_merge(base_weights, ft_weights_list, scales):
    """Merge models by adding scaled task vectors to the base weights.

    base_weights and each entry of ft_weights_list are dicts of tensors keyed
    by parameter name; scales holds one lambda_i per fine-tuned model.
    """
    # Clone tensors so the base model's weights are not modified in place
    merged = {k: v.clone() for k, v in base_weights.items()}
    for ft_weights, scale in zip(ft_weights_list, scales):
        # Task vector: fine-tuned weights minus base weights
        task_vec = {k: ft_weights[k] - base_weights[k] for k in base_weights}
        for k in merged:
            merged[k] += scale * task_vec[k]
    return merged
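A possible usage, assuming the models are PyTorch modules; the model names and scale values here are purely illustrative.
# Hypothetical usage: fold a math-tuned and a code-tuned model into the base
merged_state = task_arithmetic_merge(
    base_weights=base_model.state_dict(),
    ft_weights_list=[math_model.state_dict(), code_model.state_dict()],
    scales=[0.5, 0.3],   # lambda_i, typically tuned on a validation set
)
base_model.load_state_dict(merged_state)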
TIES-Merging (Trim, Elect, Intersect)¶
Problem: task vectors can conflict, i.e. assign opposite signs to the same parameter.
Solution (a sketch follows below):
1. Trim: keep only the top-k% of entries by magnitude.
2. Elect: where signs conflict, keep the dominant sign (the one with the larger total magnitude).
3. Intersect: average only the values that agree with the elected sign.
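Below is a rough single-tensor sketch of these three steps, not the reference implementation; the `ties_merge` name and the `density` parameter are illustrative, and PyTorch is assumed.
import torch

def ties_merge(task_vectors, density=0.2):
    """task_vectors: list of same-shape tensors tau_i = theta_i - theta_base."""
    trimmed = []
    for tau in task_vectors:
        # 1. Trim: keep only the top `density` fraction of entries by magnitude
        k = max(1, int(density * tau.numel()))
        threshold = tau.abs().flatten().kthvalue(tau.numel() - k + 1).values
        trimmed.append(torch.where(tau.abs() >= threshold, tau, torch.zeros_like(tau)))
    stacked = torch.stack(trimmed)                 # (num_tasks, *param_shape)
    # 2. Elect: per parameter, the sign with the larger total magnitude wins
    elected_sign = torch.sign(stacked.sum(dim=0))
    # 3. Intersect: average only the entries that agree with the elected sign
    agree = (torch.sign(stacked) == elected_sign) & (stacked != 0)
    count = agree.sum(dim=0).clamp(min=1)
    return (stacked * agree).sum(dim=0) / count
The merged vector is then added back to the base weights with a single scaling factor, exactly as in task arithmetic.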
Model Soup¶
Greedy approach: add a checkpoint to the soup only if it improves the validation score.
def greedy_soup(checkpoints, val_data):
    """Greedily average checkpoints, keeping each one only if it helps.

    `average` and `evaluate` are placeholders for uniform weight averaging
    and a validation metric (see the helper sketch below).
    """
    soup = [checkpoints[0]]
    best_score = evaluate(average(soup), val_data)
    for ckpt in checkpoints[1:]:
        candidate = average(soup + [ckpt])
        score = evaluate(candidate, val_data)
        if score > best_score:   # keep the checkpoint only if validation improves
            soup.append(ckpt)
            best_score = score
    return average(soup)
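A possible `average` helper for the soup above, assuming each checkpoint is a dict of tensors keyed by parameter name; `evaluate` is left as a placeholder for whatever validation metric applies.
import torch

def average(checkpoints):
    """Uniform parameter-wise average of a list of state dicts."""
    keys = checkpoints[0].keys()
    return {k: torch.stack([ckpt[k].float() for ckpt in checkpoints]).mean(dim=0)
            for k in keys}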
When to Use Which Method¶
| Scenario | Method |
|---|---|
| Same task, multiple checkpoints | Model Soup |
| Different tasks | Task Arithmetic |
| Conflicting tasks | TIES-Merging |
| 2 models | SLERP |
| Large models (7B+) | DARE |
Interview Questions¶
Q: What is task arithmetic?
A: A task vector is (fine-tuned weights) - (base weights); it represents the knowledge gained from fine-tuning. Merging: theta_merged = theta_base + sum(lambda_i * tau_i). This lets you combine multiple tasks without training, add or remove capabilities, and control each task's influence via lambda_i.
Q: How does TIES resolve conflicts?
A: (1) Trim -- keep only the top-k% of entries by magnitude (removes noise), (2) Elect -- the dominant sign wins, (3) Intersect -- average only the values that agree with the elected sign. This reduces interference between tasks.
Q: Model Soup vs Task Arithmetic?
A: Soup averages checkpoints from fine-tuning on the same task (e.g., different epochs or hyperparameters); Task Arithmetic combines models fine-tuned on different tasks. Soup = intra-task, Arithmetic = inter-task.
Q: What are the limitations of model merging?
A: (1) All models must share the same architecture, (2) conflicting task vectors need resolution (TIES), (3) scaling factors need tuning, (4) diminishing returns after 2-3 models, (5) very different tasks may not merge well.
See Also¶
- LLM Knowledge Distillation -- an alternative compression approach
- LLM Pruning -- pruning before merging