Training-free sequential model merging improves multi-task learning accuracy and memory efficiency.
This paper introduces a training-free method for merging deep learning models sequentially.
-----
https://arxiv.org/abs/2501.09522
Original Problem 😫:
→ Existing model merging methods combine all fine-tuned models simultaneously, which requires keeping every checkpoint in memory and causes interference between tasks.
→ These methods are unsuitable for scenarios with sequentially available models.
-----
Solution in this Paper 😎:
→ The Orthogonal Projection-based Continual Merging (OPCM) method merges models sequentially.
→ It projects each newly arriving model's task vector (its update relative to the pretrained weights) onto the subspace orthogonal to the current merged model's update.
→ This minimizes interference between tasks.
→ An adaptive scaling mechanism keeps parameter distances stable across merging steps, preserving previously learned knowledge (see the sketch after this list).
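A minimal PyTorch sketch of what one such merging step could look like. This is an illustrative reconstruction, not the authors' code: the names `orthogonal_project` and `merge_step`, the SVD-based subspace construction, and the 1/sqrt(t) scaling schedule are assumptions made here for concreteness.

```python
import torch

def orthogonal_project(delta_new: torch.Tensor, delta_merged: torch.Tensor) -> torch.Tensor:
    """Remove from delta_new the component lying in the column space of
    delta_merged, so the new update is orthogonal to the accumulated one."""
    if delta_merged.ndim != 2:
        # Assumption: 1-D parameters (biases, norm scales) are left unprojected.
        return delta_new
    # Orthonormal basis of the merged update's column space via thin SVD.
    U, S, _ = torch.linalg.svd(delta_merged, full_matrices=False)
    keep = S > 1e-6 * S.max()          # drop near-zero singular directions
    U_r = U[:, keep]
    # Subtract the projection of delta_new onto that subspace.
    return delta_new - U_r @ (U_r.T @ delta_new)

def merge_step(theta_merged: dict, theta_new: dict, theta_pre: dict, t: int) -> dict:
    """One sequential merging step for the t-th fine-tuned model (t >= 1)."""
    # Hypothetical schedule: shrink each new contribution so the merged
    # model's distance from the pretrained weights stays roughly stable.
    scale = 1.0 / (t ** 0.5)
    merged = {}
    for name, w_pre in theta_pre.items():
        d_merged = theta_merged[name] - w_pre   # accumulated task vector
        d_new = theta_new[name] - w_pre         # new model's task vector
        merged[name] = theta_merged[name] + scale * orthogonal_project(d_new, d_merged)
    return merged
```

Under this sketch, a caller would start from the pretrained weights and apply `merge_step` once per arriving checkpoint, so only the current merged model and the newest fine-tuned model need to be held in memory at any time.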
-----
Key Insights from this Paper 🤔:
→ Orthogonal projections effectively minimize task interference while preserving model capabilities.
→ Adaptive scaling maintains stability and knowledge retention.
-----
Results 💯:
→ Achieves 5-8% average accuracy improvement over baseline methods on CLIP-ViT models.
→ Maintains robust performance across different task orderings.
→ Exhibits less negative transfer than the baseline methods.