"How to Merge Your Multimodal Models Over Time?"

A podcast on this paper was generated with Google's Illuminate.

The TIME framework makes model merging work in real-world sequential scenarios.

This paper introduces TIME (Temporal Integration of Model Expertise), a framework for merging multiple expert models sequentially over time instead of all at once.

-----

https://arxiv.org/abs/2412.06712

🤔 Original Problem:

→ Current model merging methods assume all expert models are available simultaneously, but in reality, new tasks emerge progressively over time, requiring continuous model adaptation.

-----

🔧 Solution in this Paper:

→ The TIME framework defines temporal model merging across three key axes: the Initialization Phase (how to select starting weights), the Deployment Phase (how to produce the final model), and the Merging Technique (how to combine weights).

→ The framework introduces "Best-in-TIME" strategy that uses exponential moving average (EMA) for both initialization and deployment phases.

→ For each new task, the system initializes from the previous merged weights, trains on the current task, and deploys using EMA-based merging (see the sketch below).
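
Below is a minimal Python sketch of that loop, not the authors' implementation: `train_on_task`, the `alpha` coefficient, and the name→array weight-dict format are assumptions made purely for illustration.

```python
# Sketch of the temporal merging loop described above (illustrative only).
# Assumes weights are dicts mapping parameter names to arrays/tensors and
# that `train_on_task` fine-tunes a model starting from the given weights.

def ema_merge(prev_weights, new_weights, alpha=0.9):
    """Exponential moving average of two weight dicts (alpha is a placeholder)."""
    return {name: alpha * prev_weights[name] + (1 - alpha) * new_weights[name]
            for name in prev_weights}

def temporal_merge(initial_weights, task_stream, train_on_task, alpha=0.9):
    """Sequentially fold new experts into a single merged model over time.

    Initialization: each new expert starts from the current merged weights.
    Deployment: the deployed model is an EMA of merged and expert weights.
    Merging technique: simple weighted averaging of parameters.
    """
    merged = dict(initial_weights)
    for task in task_stream:
        expert = train_on_task(init=merged, task=task)   # fine-tune on the new task
        merged = ema_merge(merged, expert, alpha=alpha)   # fold the expert into the merge
        yield merged                                      # deployable model after each task
```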

-----

💡 Key Insights:

→ Standard offline merging techniques perform poorly in temporal settings

→ Complex merging techniques provide minimal benefits over simple weighted averaging (see the sketch after this list)

→ Initialization and deployment strategies matter more than specific merging techniques

→ Temporal merging scales well with larger models and compute budgets
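
For reference, "simple weighted averaging" here just means a convex combination of expert parameters. A minimal illustration follows, assuming all experts share an architecture and store weights as name→array dicts; the coefficients and data layout are not from the paper.

```python
# Illustrative sketch of simple weighted averaging across expert models.
def weighted_average(experts, coeffs):
    """Convex combination of expert weight dicts: sum_i c_i * theta_i."""
    assert abs(sum(coeffs) - 1.0) < 1e-6, "coefficients should sum to 1"
    return {name: sum(c * w[name] for c, w in zip(coeffs, experts))
            for name in experts[0]}
```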

-----

📊 Results:

→ Best-in-TIME outperforms standard sequential fine-tuning by 15% on knowledge retention

→ Scales effectively from 62.3M to 1.37B parameters

→ Matches or exceeds multitask training performance at higher compute budgets
