This paper shows that simple gradient-based sparse fine-tuning outperforms more complex parameter-efficient methods for adapting LLMs to new tasks.
https://arxiv.org/abs/2412.13488
🤖 Original Problem:
→ Current Parameter-Efficient Fine-Tuning (PEFT) methods like LoRA add design complexity that may be unnecessary
→ There is a need for simpler, equally effective ways to adapt LLMs with minimal computational resources
-----
🔧 Solution in this Paper:
→ Introduces sparsity-based PEFT (SPEFT) that updates only selected important weight parameters
→ Uses simple gradient-based metrics to identify which parameters to update
→ Implements a static masking strategy that fixes the set of trainable parameters before training starts (see the sketch after this list)
→ Evaluates 8 different salience metrics including first-order and second-order methods
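A minimal sketch of how such a static, gradient-based mask could be built, assuming a first-order salience score |w · ∇L| as one representative metric (the paper evaluates 8 metrics; the `density` parameter and calibration-batch setup here are illustrative assumptions, not the paper's exact recipe):

```python
import torch

@torch.no_grad()
def build_static_masks(model, density=0.01):
    """Score each weight once before training and keep the top `density`
    fraction trainable. Assumes .grad is already populated by a backward
    pass on a calibration batch."""
    # First-order salience: |w * grad|, one of the simple metrics.
    scores = {
        name: (p * p.grad).abs()
        for name, p in model.named_parameters()
        if p.grad is not None
    }
    # Global ranking: one threshold across all layers (the paper finds
    # global vs. local per-layer ranking makes little difference).
    flat = torch.cat([s.flatten() for s in scores.values()])
    k = max(1, int(density * flat.numel()))
    threshold = torch.topk(flat, k).values.min()
    return {name: (s >= threshold).float() for name, s in scores.items()}

# Usage: one backward pass on a calibration batch, then build the masks.
# loss = model(**calibration_batch).loss
# loss.backward()
# masks = build_static_masks(model, density=0.01)
# model.zero_grad()
```

Scoring happens exactly once, so the selection cost is a single forward/backward pass rather than a recurring per-step expense.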
-----
💡 Key Insights:
→ Simple gradient-based metrics perform better than complex second-order methods
→ Static parameter selection works as well as dynamic updates (see the masking sketch after this list)
→ Global vs local sparsity ranking shows no significant differences
→ Hardware trends favor sparse computation, making SPEFT increasingly practical
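Static selection means the mask is decided once up front and merely enforced during training. A minimal sketch of one common way to enforce it, via gradient hooks that zero updates outside the mask (a hypothetical helper, not necessarily the paper's exact mechanism):

```python
def apply_static_masks(model, masks):
    """Enforce the fixed sparsity pattern: gradients for weights outside
    the mask are zeroed, so only pre-selected entries ever get updated."""
    for name, p in model.named_parameters():
        if name in masks:
            # Bind each mask at definition time; the hook fires on backward.
            p.register_hook(lambda grad, m=masks[name]: grad * m)
```

Because the pattern never changes there is no per-step re-ranking cost; one caveat is that optimizers with decoupled weight decay (e.g., AdamW) would still shrink masked weights unless the decay is masked as well.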
-----
📊 Results:
→ On RoBERTa-base (MRPC): 0.98% higher accuracy than the baseline
→ On GSM8k with MetaMathQA training: 22.6% better than LoRA
→ Consistent outperformance across GLUE benchmark tasks
→ Comparable performance with lower computational overhead