"MambaPEFT: Exploring Parameter-Efficient Fine-Tuning for Mamba"

The podcast on this paper is generated with Google's Illuminate.

PEFT methods work surprisingly well on Mamba, even better than on Transformers.

Mamba's sequential nature enables more efficient parameter tuning than is possible with Transformers.

A hybrid PEFT approach makes fine-tuning Mamba roughly 24x more parameter-efficient than full fine-tuning.

https://arxiv.org/abs/2411.03855

Original Problem 🤔:

Parameter-Efficient Fine-Tuning (PEFT) methods are crucial for adapting pre-trained models to downstream tasks. While PEFT has been extensively studied for Transformers, its application to Mamba (a State Space Model-based architecture) remains unexplored, limiting the efficient adaptation of pre-trained Mamba models.

-----

Solution in this Paper 🛠️:

→ Adapted existing Transformer PEFT methods like ParallelAdapter and LoRA to Mamba's architecture

→ Developed Mamba-specific PEFT methods including Partial LoRA, Affix-tuning, and Additional-scan (a minimal Partial LoRA sketch follows this list)

→ Created a two-step hybrid approach to combine multiple PEFT methods optimally

→ Introduced a framework to search for the best combination of PEFT methods using the Tree-structured Parzen Estimator (TPE) algorithm (see the search sketch below)
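
As referenced above, here is a minimal sketch of the Partial LoRA idea: the pretrained projection stays frozen and a trainable low-rank update is added to only part of its output. The attribute names in the usage comment (backbone.layers, mixer.in_proj, d_inner) follow common Mamba implementations and are assumptions, not the paper's code.

```python
# Minimal sketch (not the paper's implementation): a frozen linear projection
# plus a trainable low-rank update applied to only part of its output,
# in the spirit of Partial LoRA for Mamba blocks.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PartialLoRALinear(nn.Module):
    """Frozen pretrained nn.Linear plus a low-rank update added only to the
    first `partial_dim` output features."""
    def __init__(self, base: nn.Linear, rank: int = 8, partial_dim=None):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False              # keep pretrained weights frozen
        self.partial_dim = partial_dim or base.out_features
        self.lora_A = nn.Linear(base.in_features, rank, bias=False)
        self.lora_B = nn.Linear(rank, self.partial_dim, bias=False)
        nn.init.zeros_(self.lora_B.weight)       # update starts at zero

    def forward(self, x):
        out = self.base(x)
        delta = self.lora_B(self.lora_A(x))                           # (..., partial_dim)
        delta = F.pad(delta, (0, out.shape[-1] - self.partial_dim))   # leave remaining features untouched
        return out + delta

# Hypothetical usage: wrap each Mamba block's input projection and train only
# the LoRA parameters (attribute names are mamba_ssm-style assumptions).
# for layer in model.backbone.layers:
#     layer.mixer.in_proj = PartialLoRALinear(layer.mixer.in_proj, rank=8,
#                                             partial_dim=layer.mixer.d_inner)
```

The combination search in the last point can be sketched with Optuna, whose TPESampler implements the TPE algorithm. The paper describes a two-step hybrid search; the sketch below only shows the general shape of a TPE-driven search over which PEFT methods to enable, and train_and_evaluate is a hypothetical stand-in for fine-tuning with the chosen configuration and returning validation accuracy.

```python
# Minimal sketch (not the paper's exact two-step procedure): TPE-driven search
# over combinations of PEFT methods using Optuna's TPESampler.
import optuna

PEFT_METHODS = ["lora", "partial_lora", "parallel_adapter",
                "affix_tuning", "additional_scan"]

def train_and_evaluate(config):
    # Hypothetical stand-in: fine-tune the pretrained Mamba model with the
    # selected PEFT methods and return validation accuracy.
    return 0.0

def objective(trial):
    config = {}
    for method in PEFT_METHODS:
        if trial.suggest_categorical(f"use_{method}", [True, False]):
            # one illustrative hyperparameter per enabled method
            config[method] = {"lr": trial.suggest_float(f"{method}_lr",
                                                        1e-5, 1e-2, log=True)}
    return train_and_evaluate(config)

study = optuna.create_study(direction="maximize",
                            sampler=optuna.samplers.TPESampler(seed=0))
study.optimize(objective, n_trials=100)
print(study.best_params)
```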

-----

Key Insights 💡:

→ PEFT performs more effectively for Mamba than for Transformers

→ The position of prompt tokens significantly impacts performance because of Mamba's sequential scan (see the sketch after this list)

→ Combining multiple PEFT methods outperforms individual methods

→ The hybrid approach achieves better results with fewer parameters
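
As a hypothetical illustration of the prompt-position insight above, the sketch below splices learnable affix tokens into the embedded sequence at a chosen position. Because Mamba scans the sequence in order, tokens inserted near the start can influence every later position while tokens inserted near the end cannot; the class and usage here are assumptions for illustration, not the paper's Affix-tuning implementation.

```python
# Sketch only: learnable "affix" tokens spliced into the embedded sequence at
# a configurable position, to illustrate why position matters under Mamba's
# left-to-right scan.
import torch
import torch.nn as nn

class AffixTokens(nn.Module):
    def __init__(self, num_tokens: int, d_model: int):
        super().__init__()
        self.affix = nn.Parameter(torch.randn(num_tokens, d_model) * 0.02)

    def forward(self, x, position: int = 0):
        # x: (batch, seq_len, d_model) token embeddings
        affix = self.affix.unsqueeze(0).expand(x.shape[0], -1, -1)
        return torch.cat([x[:, :position], affix, x[:, position:]], dim=1)

# Hypothetical usage: insert 8 trainable tokens at the start of the sequence
# before the frozen Mamba backbone processes the embeddings.
# embeds = embedding(input_ids)              # (B, L, D)
# embeds = affix_module(embeds, position=0)  # (B, L + 8, D)
# logits = mamba_backbone(embeds)
```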

-----

Results 📊:

→ On the VTAB-1k benchmark, Mamba with the hybrid PEFT approach achieves 72.05% average accuracy

→ Outperforms individual PEFT methods by 1-2% across Natural, Specialized, and Structured tasks

→ Requires only 1,044K trainable parameters, compared with 25,450K for full fine-tuning
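
For scale, 1,044K trainable parameters is about 4% of 25,450K, i.e. roughly a 24x reduction, which is presumably where the 24x parameter-efficiency figure above comes from.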
