"No Task Left Behind: Isotropic Model Merging with Common and Task-Specific Subspaces"
The podcast below on this paper was generated with Google's Illuminate.
https://arxiv.org/abs/2502.04959
The problem is that merging multiple task-specific models into a single multi-task model still incurs a performance drop compared to the individual task-specific models. This paper introduces a new merging method that narrows this gap by analyzing the spectral properties of task matrices.
This paper proposes "Isotropic Merging". It flattens the singular value spectrum of the merged task matrix to improve alignment between task-specific and merged subspaces. This approach enhances performance without additional training.
-----
Okay, here are my technical perspectives on the paper's solution:
📌 Isotropic Merging cleverly uses Singular Value Decomposition to flatten the singular value spectrum. This balances task representation in merged models, boosting multi-task performance without extra training.
📌 The paper shifts focus from cosine similarity to subspace alignment for effective merging. By aligning task and merged subspaces, Isotropic Merging achieves superior performance compared to Task Arithmetic.
📌 Iso-CTS method practically combines common and task-specific subspaces. This simple yet effective approach reaches state-of-the-art model merging results across diverse tasks and model scales.
----------
Methods Explored in this Paper 🔧:
→ This paper explores model merging through the lens of Singular Value Decomposition of task matrices. A task matrix is the weight-update matrix applied to a pre-trained model for a specific task.
→ Introduces the Subspace Alignment Ratio metric. It quantifies the similarity between the subspaces of task-specific matrices and the merged matrix.
→ Isotropic Merging in Common Subspace (Iso-C) is proposed. Iso-C flattens the singular value spectrum of the merged task matrix to make it uniform. This is done by scaling all singular directions to the average singular value.
→ Isotropic Merging in Common and Task-Specific Subspaces (Iso-CTS) is also proposed. Iso-CTS enhances Iso-C by incorporating task-specific directions. It retains top singular vectors from a common subspace and adds task-specific directions orthogonal to the common subspace.
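The two core operations above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: it assumes the merged task matrix is the plain sum of per-task update matrices (Task Arithmetic, before scaling by α), and the function names and the top-k subspace-overlap formula are my own simplifications.

```python
import numpy as np

def subspace_alignment_ratio(task_mat, merged_mat, k):
    """Overlap between the top-k left-singular subspaces of a task
    matrix and the merged matrix; returns a value in [0, 1]."""
    U_t, _, _ = np.linalg.svd(task_mat, full_matrices=False)
    U_m, _, _ = np.linalg.svd(merged_mat, full_matrices=False)
    # Squared Frobenius norm of the projection between the two
    # k-dimensional subspaces, normalized by k.
    proj = U_m[:, :k].T @ U_t[:, :k]
    return np.linalg.norm(proj, ord="fro") ** 2 / k

def iso_c(task_mats):
    """Iso-C sketch: sum the task matrices, then flatten the singular
    spectrum by replacing every singular value with their mean."""
    merged = np.sum(task_mats, axis=0)  # Task Arithmetic sum
    U, s, Vt = np.linalg.svd(merged, full_matrices=False)
    return U @ np.diag(np.full_like(s, s.mean())) @ Vt
```

Iso-CTS would additionally replace the lowest-energy directions of this common subspace with task-specific singular vectors orthogonalized against it before flattening the spectrum; that step is omitted here for brevity.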
-----
Key Insights 💡:
→ Alignment between singular components of task-specific and merged matrices strongly correlates with performance improvement.
→ Flattening the singular value spectrum of merged matrices improves subspace alignment. This leads to better multi-task performance.
→ Incorporating task-specific subspaces along with a common subspace further enhances merging performance, especially with a larger number of tasks.
-----
Results 📊:
→ Iso-CTS achieves state-of-the-art performance across various tasks and model scales (ViT-B/32, ViT-B/16, ViT-L/14).
→ Iso-CTS outperforms Task Arithmetic by a large margin, especially when merging 14 and 20 tasks, delivering up to a 2.8% absolute accuracy improvement.
→ Iso-C and Iso-CTS are more robust to the choice of scaling factor α compared to Task Arithmetic.