PEFT methods work surprisingly well on Mamba, often better than on Transformers.
Mamba's sequential nature enables more parameter-efficient tuning than Transformers allow.
A hybrid PEFT approach fine-tunes Mamba with roughly 24x fewer trainable parameters than full fine-tuning.
https://arxiv.org/abs/2411.03855
Original Problem 🤔:
Parameter-Efficient Fine-Tuning (PEFT) methods are crucial for adapting pre-trained models to downstream tasks. While PEFT has been studied extensively for Transformers, its application to Mamba (a State Space Model-based architecture) remains largely unexplored, which limits efficient adaptation of pre-trained Mamba models.
-----
Solution in this Paper 🛠️:
→ Adapted existing Transformer PEFT methods such as ParallelAdapter and LoRA to Mamba's architecture (a minimal LoRA sketch follows this list)
→ Developed Mamba-specific PEFT methods including Partial LoRA, Affix-tuning, and Additional-scan
→ Created a two-step hybrid approach to combine multiple PEFT methods optimally
→ Introduced a framework that searches for the best combination of PEFT methods with the Tree-structured Parzen Estimator (TPE) algorithm (see the search sketch below)
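The adapted LoRA variants boil down to adding low-rank trainable updates to selected projections inside each Mamba block while the pre-trained weights stay frozen. Below is a minimal PyTorch sketch of that idea, not the paper's code: the target module names (`in_proj`, `x_proj`) follow common Mamba implementations and are assumptions here.

```python
# Minimal LoRA sketch, not the paper's exact implementation. Module names such
# as `in_proj` and `x_proj` are assumptions based on common Mamba codebases.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen pre-trained linear layer plus a trainable low-rank update B @ A."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                # freeze pre-trained weights
        self.lora_A = nn.Linear(base.in_features, rank, bias=False)
        self.lora_B = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_B.weight)         # update starts at zero
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scaling * self.lora_B(self.lora_A(x))

def add_lora_to_mamba(model: nn.Module, targets=("in_proj", "x_proj")) -> nn.Module:
    """Wrap the chosen projections in every block with LoRA; everything else stays frozen."""
    for module in list(model.modules()):
        for name, child in list(module.named_children()):
            if name in targets and isinstance(child, nn.Linear):
                setattr(module, name, LoRALinear(child))
    return model
```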
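One way to realize the TPE-based combination search is with Optuna's TPE sampler, as in the hypothetical sketch below. The method list, per-method hyperparameters, and the `train_and_eval` stub are illustrative placeholders, not the paper's actual framework.

```python
# Hypothetical sketch of searching over PEFT combinations with Optuna's TPE sampler.
import optuna

PEFT_METHODS = ["lora", "partial_lora", "parallel_adapter", "affix_tuning", "additional_scan"]

def train_and_eval(config: dict) -> float:
    """Placeholder: fine-tune the Mamba model with `config` and return validation accuracy."""
    # A real framework would train on the downstream task here; this dummy
    # score only keeps the sketch runnable end to end.
    return float(len(config))

def objective(trial: optuna.Trial) -> float:
    config = {}
    for method in PEFT_METHODS:
        if trial.suggest_categorical(f"use_{method}", [True, False]):
            # One illustrative hyperparameter per method (e.g. LoRA rank,
            # adapter width, or number of affix tokens).
            config[method] = {"dim": trial.suggest_int(f"{method}_dim", 4, 64, log=True)}
    if not config:
        raise optuna.TrialPruned()                 # skip the empty combination
    return train_and_eval(config)

study = optuna.create_study(direction="maximize",
                            sampler=optuna.samplers.TPESampler(seed=0))
study.optimize(objective, n_trials=50)
print("Best PEFT combination:", study.best_params)
```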
-----
Key Insights 💡:
→ PEFT performs more effectively for Mamba than for Transformers
→ The position of prompt tokens significantly impacts performance because of Mamba's sequential scan (see the sketch after this list)
→ Combining multiple PEFT methods outperforms individual methods
→ The hybrid approach achieves better results with fewer parameters
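To make the token-position insight concrete, here is a generic PyTorch sketch of prepending learnable prompt tokens to the embedded input (not the paper's exact affix-tuning): because Mamba scans left to right, prepended tokens can influence every subsequent state, while tokens placed at the end influence none of the preceding ones. The shapes and defaults are illustrative assumptions.

```python
# Generic prompt-token sketch (not the paper's affix-tuning): learnable tokens
# are concatenated in front of the embedded sequence before the Mamba scan.
import torch
import torch.nn as nn

class PromptTokens(nn.Module):
    def __init__(self, num_tokens: int = 16, d_model: int = 768):
        super().__init__()
        self.prompt = nn.Parameter(torch.randn(num_tokens, d_model) * 0.02)

    def forward(self, embeds: torch.Tensor) -> torch.Tensor:
        # embeds: (batch, seq_len, d_model) -> (batch, num_tokens + seq_len, d_model)
        prompt = self.prompt.unsqueeze(0).expand(embeds.size(0), -1, -1)
        return torch.cat([prompt, embeds], dim=1)
```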
-----
Results 📊:
→ On VTAB-1k benchmark, Mamba with hybrid PEFT achieves 72.05% average accuracy
→ Outperforms individual PEFT methods by 1-2% across Natural, Specialized, and Structured tasks
→ Requires only 1,044K trainable parameters versus 25,450K for full fine-tuning, roughly 24x fewer