"How to Merge Your Multimodal Models Over Time?"

A podcast on this paper was generated with Google's Illuminate.

The TIME framework makes model merging work in real-world sequential scenarios.

This paper introduces TIME (Temporal Integration of Model Expertise), a framework for merging multiple expert models sequentially over time instead of all at once.

-----

https://arxiv.org/abs/2412.06712

🤔 Original Problem:

→ Current model merging methods assume all expert models are available simultaneously, but in reality, new tasks emerge progressively over time, requiring continuous model adaptation.

-----

🔧 Solution in this Paper:

→ The TIME framework defines temporal model merging across three key axes: the Initialization Phase (how to select starting weights), the Deployment Phase (how to produce the final model), and the Merging Technique (how to combine weights).

→ The framework introduces "Best-in-TIME" strategy that uses exponential moving average (EMA) for both initialization and deployment phases.

→ For each new task, the system initializes from the previous merged weights, trains on the current task, and deploys using EMA-based merging (see the sketch below).
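
Below is a minimal Python sketch of that loop, not the authors' implementation: `train_on_task`, the `alpha` coefficient, and the name→array weight-dict format are assumptions made purely for illustration.

```python
# Sketch of the temporal merging loop described above (illustrative only).
# Assumes weights are dicts mapping parameter names to arrays/tensors and
# that `train_on_task` fine-tunes a model starting from the given weights.

def ema_merge(prev_weights, new_weights, alpha=0.9):
    """Exponential moving average of two weight dicts (alpha is a placeholder)."""
    return {name: alpha * prev_weights[name] + (1 - alpha) * new_weights[name]
            for name in prev_weights}

def temporal_merge(initial_weights, task_stream, train_on_task, alpha=0.9):
    """Sequentially fold new experts into a single merged model over time.

    Initialization: each new expert starts from the current merged weights.
    Deployment: the deployed model is an EMA of merged and expert weights.
    Merging technique: simple weighted averaging of parameters.
    """
    merged = dict(initial_weights)
    for task in task_stream:
        expert = train_on_task(init=merged, task=task)   # fine-tune on the new task
        merged = ema_merge(merged, expert, alpha=alpha)   # fold the expert into the merge
        yield merged                                      # deployable model after each task
```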

-----

💡 Key Insights:

→ Standard offline merging techniques perform poorly in temporal settings

→ Complex merging techniques provide minimal benefits over simple weighted averaging (see the sketch after this list)

→ Initialization and deployment strategies matter more than specific merging techniques

→ Temporal merging scales well with larger models and compute budgets
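
For reference, "simple weighted averaging" here just means a convex combination of expert parameters. A minimal illustration follows, assuming all experts share an architecture and store weights as name→array dicts; the coefficients and data layout are not from the paper.

```python
# Illustrative sketch of simple weighted averaging across expert models.
def weighted_average(experts, coeffs):
    """Convex combination of expert weight dicts: sum_i c_i * theta_i."""
    assert abs(sum(coeffs) - 1.0) < 1e-6, "coefficients should sum to 1"
    return {name: sum(c * w[name] for c, w in zip(coeffs, experts))
            for name in experts[0]}
```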

-----

📊 Results:

→ Best-in-TIME outperforms standard sequential fine-tuning by 15% on knowledge retention

→ Scales effectively from 62.3M to 1.37B parameters

→ Matches or exceeds multitask training performance at higher compute budgets
