Not all adapters matter: freezing the right ones saves massive GPU memory.
This paper introduces SAFE (Selective Adapter FrEezing), which identifies and freezes less important adapters early in adapter-based fine-tuning, cutting memory usage substantially while maintaining or improving model performance.
-----
https://arxiv.org/abs/2412.03587
🔍 Original Problem:
→ Adapter-based fine-tuning is parameter-efficient, but it still incurs high memory usage because activations must be stored for backpropagation
→ Existing adapter methods cut trainable parameters by 99.37% yet reduce memory usage by only 22.19%, since activation memory accounts for 76% of total memory
-----
🛠️ Solution in this Paper:
→ SAFE monitors each adapter's importance during a warm-up phase, using Centered Kernel Alignment (CKA) to measure how much the adapter changes the layer's feature representations
→ It then gradually freezes the less important adapters as their importance falls below a moving threshold that rises on a cubic schedule (a rough sketch of both steps follows this list)
→ The method induces regularization effects by controlling weight norms, leading to flatter loss landscapes and better generalization
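As a rough illustration of the two mechanisms above, here is a minimal PyTorch sketch of CKA-based importance scoring and a cubic freezing threshold. The function names (`linear_cka`, `adapter_importance`, `cubic_threshold`, `maybe_freeze_adapters`) and the threshold value are illustrative assumptions, not the paper's reference implementation.

```python
import torch

def linear_cka(X, Y):
    """Linear Centered Kernel Alignment between two feature matrices
    of shape (n_samples, dim). Returns a similarity in [0, 1]."""
    X = X - X.mean(dim=0, keepdim=True)
    Y = Y - Y.mean(dim=0, keepdim=True)
    # ||X^T Y||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    xty = (X.T @ Y).norm(p="fro") ** 2
    xtx = (X.T @ X).norm(p="fro")
    yty = (Y.T @ Y).norm(p="fro")
    return (xty / (xtx * yty + 1e-12)).item()

def adapter_importance(features_without, features_with):
    """Importance = how much the adapter changes the layer's representation:
    1 - CKA(frozen-backbone features, features with the adapter active)."""
    return 1.0 - linear_cka(features_without, features_with)

def cubic_threshold(step, warmup_steps, total_steps, tau_final):
    """Moving threshold that grows from 0 to tau_final on a cubic schedule
    after the warm-up phase (similar to gradual-pruning schedules)."""
    if step < warmup_steps:
        return 0.0
    progress = min(1.0, (step - warmup_steps) / max(1, total_steps - warmup_steps))
    return tau_final * (1.0 - (1.0 - progress) ** 3)

def maybe_freeze_adapters(adapters, importances, step,
                          warmup_steps, total_steps, tau_final=0.3):
    """Freeze adapters whose importance is below the current threshold.
    `adapters` maps name -> nn.Module; `importances` maps name -> float."""
    tau = cubic_threshold(step, warmup_steps, total_steps, tau_final)
    for name, module in adapters.items():
        if importances[name] < tau:
            for p in module.parameters():
                # No gradients -> no optimizer state for this adapter, and the
                # activations feeding it no longer need to be kept for backprop.
                p.requires_grad_(False)
```

In this sketch, freezing is expressed simply as `requires_grad_(False)`; the activation-memory savings come from the framework no longer needing to retain inputs for the frozen adapters' backward pass.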
-----
💡 Key Insights:
→ Not all adapters contribute equally to model adaptation; some stop contributing early in training
→ Adapters closer to input layers generally learn basic features and can be frozen earlier
→ Early freezing creates regularization effects that improve model robustness
-----
📊 Results:
→ Reduces memory usage by 42.85%, computation by 34.59%, and training time by 11.82%
→ Achieves comparable or better performance than baseline methods
→ On RoBERTa-large, reduces memory by 79.92% while improving F1 score from 93.39 to 94.13