Not all adapters matter: freezing the right ones saves massive GPU memory.
This paper introduces SAFE (Selective Adapter FrEezing), which identifies and freezes less important adapters early in adapter-based fine-tuning, cutting memory usage substantially while maintaining or improving model performance.
-----
https://arxiv.org/abs/2412.03587
🔍 Original Problem:
→ Adapter-based fine-tuning is parameter-efficient, but it still incurs high memory usage because activations must be stored for backpropagation
→ Existing adapter methods cut trainable parameters by 99.37% yet reduce memory usage by only 22.19%, since activation memory accounts for 76% of total memory
-----
🛠️ Solution in this Paper:
→ SAFE monitors each adapter's importance during a warm-up phase, using Centered Kernel Alignment (CKA) to measure how much the adapter changes the layer's feature representations
→ It then gradually freezes the less important adapters as their importance falls below a moving threshold that rises on a cubic schedule (a rough sketch of both steps follows this list)
→ The method induces regularization effects by controlling weight norms, leading to flatter loss landscapes and better generalization
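As a rough illustration of the two mechanisms above, here is a minimal PyTorch sketch of CKA-based importance scoring and a cubic freezing threshold. The function names (`linear_cka`, `adapter_importance`, `cubic_threshold`, `maybe_freeze_adapters`) and the threshold value are illustrative assumptions, not the paper's reference implementation.

```python
import torch

def linear_cka(X, Y):
    """Linear Centered Kernel Alignment between two feature matrices
    of shape (n_samples, dim). Returns a similarity in [0, 1]."""
    X = X - X.mean(dim=0, keepdim=True)
    Y = Y - Y.mean(dim=0, keepdim=True)
    # ||X^T Y||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    xty = (X.T @ Y).norm(p="fro") ** 2
    xtx = (X.T @ X).norm(p="fro")
    yty = (Y.T @ Y).norm(p="fro")
    return (xty / (xtx * yty + 1e-12)).item()

def adapter_importance(features_without, features_with):
    """Importance = how much the adapter changes the layer's representation:
    1 - CKA(frozen-backbone features, features with the adapter active)."""
    return 1.0 - linear_cka(features_without, features_with)

def cubic_threshold(step, warmup_steps, total_steps, tau_final):
    """Moving threshold that grows from 0 to tau_final on a cubic schedule
    after the warm-up phase (similar to gradual-pruning schedules)."""
    if step < warmup_steps:
        return 0.0
    progress = min(1.0, (step - warmup_steps) / max(1, total_steps - warmup_steps))
    return tau_final * (1.0 - (1.0 - progress) ** 3)

def maybe_freeze_adapters(adapters, importances, step,
                          warmup_steps, total_steps, tau_final=0.3):
    """Freeze adapters whose importance is below the current threshold.
    `adapters` maps name -> nn.Module; `importances` maps name -> float."""
    tau = cubic_threshold(step, warmup_steps, total_steps, tau_final)
    for name, module in adapters.items():
        if importances[name] < tau:
            for p in module.parameters():
                # No gradients -> no optimizer state for this adapter, and the
                # activations feeding it no longer need to be kept for backprop.
                p.requires_grad_(False)
```

In this sketch, freezing is expressed simply as `requires_grad_(False)`; the activation-memory savings come from the framework no longer needing to retain inputs for the frozen adapters' backward pass.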
-----
💡 Key Insights:
→ Not all adapters contribute equally to model adaptation; some stop contributing early in training
→ Adapters closer to input layers generally learn basic features and can be frozen earlier
→ Early freezing creates regularization effects that improve model robustness
-----
📊 Results:
→ Reduces memory usage by 42.85%, computation by 34.59%, and training time by 11.82%
→ Achieves comparable or better performance than baseline methods
→ On RoBERTa-large, reduces memory by 79.92% while improving F1 score from 93.39 to 94.13