The TIME framework makes model merging work in real-world sequential scenarios.
This paper introduces TIME (Temporal Integration of Model Expertise), a framework for merging multiple expert models sequentially over time instead of all at once.
-----
https://arxiv.org/abs/2412.06712
🤔 Original Problem:
→ Current model merging methods assume all expert models are available simultaneously, but in reality, new tasks emerge progressively over time, requiring continuous model adaptation.
-----
🔧 Solution in this Paper:
→ TIME framework defines temporal model merging across three key axes: Initialization Phase (how to select starting weights), Deployment Phase (how to produce final model), and Merging Technique (how to combine weights).
→ The framework introduces the "Best-in-TIME" strategy, which uses an exponential moving average (EMA) for both the initialization and deployment phases.
→ For each new task, the system initializes from previous merged weights, trains on current task, and deploys using EMA-based merging.
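The per-task loop above can be sketched as a simple EMA over weight dictionaries. This is a minimal illustration, not the paper's implementation: the function names (`ema_merge`, `temporal_merge`), the weight-dict representation, and the decay value `alpha=0.9` are all assumptions for clarity; real use would operate on model state dicts and include an actual training step.

```python
def ema_merge(merged, expert, alpha=0.9):
    # Hypothetical EMA merge: blend the running merged weights with a
    # newly trained expert's weights, key by key.
    return {k: alpha * merged[k] + (1 - alpha) * expert[k] for k in merged}

def temporal_merge(task_experts, alpha=0.9):
    # Sketch of the temporal loop: each task's expert is assumed to have
    # been initialized from (and trained on top of) the running merge,
    # then folded back in via EMA at deployment time.
    merged = dict(task_experts[0])
    for expert in task_experts[1:]:
        merged = ema_merge(merged, expert, alpha)
    return merged
```

For example, merging experts with a single weight `w` of 0.0, 1.0, and 1.0 in sequence yields 0.19, showing how later tasks nudge the merged model without overwriting earlier knowledge.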
-----
💡 Key Insights:
→ Standard offline merging techniques perform poorly in temporal settings
→ Complex merging techniques provide minimal benefits over simple weighted averaging
→ Initialization and deployment strategies matter more than specific merging techniques
→ Temporal merging scales well with larger models and compute budgets
-----
📊 Results:
→ Best-in-TIME outperforms standard sequential fine-tuning by 15% on knowledge retention
→ Scales effectively from 62.3M to 1.37B parameters
→ Matches or exceeds multitask training performance at higher compute budgets