Training-free sequential model merging improves multi-task learning accuracy and memory efficiency.
This paper introduces a training-free method for merging deep learning models sequentially.
-----
https://arxiv.org/abs/2501.09522
Original Problem 😫:
→ Existing model merging methods combine all fine-tuned models simultaneously, which requires keeping every checkpoint in memory and causes interference between tasks.
→ These methods are unsuitable for scenarios with sequentially available models.
-----
Solution in this Paper 😎:
→ The Orthogonal Projection-based Continual Merging (OPCM) method merges models sequentially.
→ It projects each newly arriving model's task vector (its update relative to the pretrained weights) onto the subspace orthogonal to the current merged model's update.
→ This minimizes interference between tasks.
→ An adaptive scaling mechanism keeps parameter distances stable across merging steps, preserving previously learned knowledge (see the sketch after this list).
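A minimal PyTorch sketch of what one such merging step could look like. This is an illustrative reconstruction, not the authors' code: the names `orthogonal_project` and `merge_step`, the SVD-based subspace construction, and the 1/sqrt(t) scaling schedule are assumptions made here for concreteness.

```python
import torch

def orthogonal_project(delta_new: torch.Tensor, delta_merged: torch.Tensor) -> torch.Tensor:
    """Remove from delta_new the component lying in the column space of
    delta_merged, so the new update is orthogonal to the accumulated one."""
    if delta_merged.ndim != 2:
        # Assumption: 1-D parameters (biases, norm scales) are left unprojected.
        return delta_new
    # Orthonormal basis of the merged update's column space via thin SVD.
    U, S, _ = torch.linalg.svd(delta_merged, full_matrices=False)
    keep = S > 1e-6 * S.max()          # drop near-zero singular directions
    U_r = U[:, keep]
    # Subtract the projection of delta_new onto that subspace.
    return delta_new - U_r @ (U_r.T @ delta_new)

def merge_step(theta_merged: dict, theta_new: dict, theta_pre: dict, t: int) -> dict:
    """One sequential merging step for the t-th fine-tuned model (t >= 1)."""
    # Hypothetical schedule: shrink each new contribution so the merged
    # model's distance from the pretrained weights stays roughly stable.
    scale = 1.0 / (t ** 0.5)
    merged = {}
    for name, w_pre in theta_pre.items():
        d_merged = theta_merged[name] - w_pre   # accumulated task vector
        d_new = theta_new[name] - w_pre         # new model's task vector
        merged[name] = theta_merged[name] + scale * orthogonal_project(d_new, d_merged)
    return merged
```

Under this sketch, a caller would start from the pretrained weights and apply `merge_step` once per arriving checkpoint, so only the current merged model and the newest fine-tuned model need to be held in memory at any time.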
-----
Key Insights from this Paper 🤔:
→ Orthogonal projections effectively minimize task interference while preserving model capabilities.
→ Adaptive scaling maintains stability and knowledge retention.
-----
Results 💯:
→ Achieves 5-8% average accuracy improvement over baseline methods on CLIP-ViT models.
→ Maintains robust performance across different task orderings.
→ Exhibits less negative transfer than the baseline methods.