LLMs can pick their own parameters based on what you ask them to do
Dynamic pruning lets LLMs select task-specific parameters on the fly, improving efficiency while staying close to the performance of larger dense models.
-----
https://arxiv.org/abs/2501.02086
🤔 Original Problem:
→ Current LLM pruning methods use fixed masks, so a single pruned model adapts poorly across diverse tasks like coding, math, and domain-specific work
-----
🔧 Solution in this Paper:
→ Introduces IFPRUNING (Instruction-Following Pruning), a method where pruning masks adapt to the user's instruction
→ Uses a sparse mask predictor that takes the prompt as input and selects the model parameters relevant to it
→ Focuses on structured pruning of feed-forward network (FFN) layers: dropping a hidden unit removes an entire row/column of the weight matrices
→ Employs a SoftTopK operator to turn importance scores into differentiable masks, so the predictor can be trained end-to-end
→ Jointly optimizes both the predictor and the LLM using instruction data and a pre-training corpus (see the sketch below)
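To make the moving parts concrete, here is a minimal PyTorch sketch of the idea: a predictor scores each FFN hidden unit from the prompt, a SoftTopK relaxation turns the scores into a near-binary mask, and the mask gates the FFN. Everything here (the bisection-based SoftTopK, the single-linear MaskPredictor, all dimensions) is an illustrative assumption, not the paper's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def soft_topk(scores: torch.Tensor, k: int, temp: float = 0.1,
              iters: int = 30) -> torch.Tensor:
    """Differentiable relaxation of top-k: returns a soft mask in (0, 1)
    whose entries sum to roughly k. The threshold tau is found by bisection
    under no_grad, so gradients flow only through the final sigmoid.
    (Assumed scheme; the paper only names the operator.)"""
    with torch.no_grad():
        lo, hi = scores.min() - 1.0, scores.max() + 1.0
        for _ in range(iters):
            tau = (lo + hi) / 2
            if torch.sigmoid((scores - tau) / temp).sum() > k:
                lo = tau   # mask too dense -> raise the threshold
            else:
                hi = tau
    return torch.sigmoid((scores - tau) / temp)

class MaskPredictor(nn.Module):
    """Maps a pooled prompt embedding to per-layer importance scores over
    FFN hidden units (placeholder for whatever encoder produces the scores)."""
    def __init__(self, prompt_dim: int, n_layers: int, ffn_dim: int):
        super().__init__()
        self.n_layers, self.ffn_dim = n_layers, ffn_dim
        self.proj = nn.Linear(prompt_dim, n_layers * ffn_dim)

    def forward(self, prompt_emb: torch.Tensor) -> torch.Tensor:
        # prompt_emb: (prompt_dim,) pooled representation of the instruction
        return self.proj(prompt_emb).view(self.n_layers, self.ffn_dim)

class MaskedFFN(nn.Module):
    """One FFN block whose hidden units are gated by a per-prompt mask.
    Zeroing unit i is equivalent to deleting row i of w_in and column i
    of w_out, i.e. structured pruning."""
    def __init__(self, d_model: int, ffn_dim: int):
        super().__init__()
        self.w_in = nn.Linear(d_model, ffn_dim)
        self.w_out = nn.Linear(ffn_dim, d_model)

    def forward(self, x: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        return self.w_out(F.relu(self.w_in(x)) * mask)

# Example: keep roughly 1/3 of each layer's FFN units for a given instruction.
predictor = MaskPredictor(prompt_dim=512, n_layers=4, ffn_dim=1024)
layers = [MaskedFFN(d_model=512, ffn_dim=1024) for _ in range(4)]
prompt_emb = torch.randn(512)                 # stand-in for a real prompt encoding
scores = predictor(prompt_emb)                # (4, 1024) importance scores
masks = torch.stack([soft_topk(s, k=341) for s in scores])
x = torch.randn(1, 512)
for layer, m in zip(layers, masks):
    x = x + layer(x, m)                       # residual FFN stack
```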
-----
🎯 Key Insights:
→ Parameter selection can be done per-input or per-task
→ Instructions requiring similar skills yield homogeneous pruning patterns
→ Selecting parameters once per prompt eliminates parameter reloading costs during decoding (see the decode-loop sketch below)
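A rough sketch of why per-task selection is cheap at decode time, reusing names from the code above: the mask is computed once from the instruction, hardened to 0/1, and then every decoding step runs with the same pruned weights, so nothing is swapped in or out between tokens. `generate_step` is a hypothetical helper, not an API from the paper.

```python
def generate_step(x, layers, masks):
    """Hypothetical single decode step: run the masked FFN stack once."""
    for layer, m in zip(layers, masks):
        x = x + layer(x, m)
    return x

# Per-task selection: one mask per instruction, reused for every token.
scores = predictor(prompt_emb)
masks = (torch.stack([soft_topk(s, k=341) for s in scores]) > 0.5).float()
x = torch.randn(1, 512)                  # illustrative hidden state
for _ in range(128):                     # decode loop
    x = generate_step(x, layers, masks)  # identical masks each step: no reloading
```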
-----
📊 Results:
→ When pruning a 9B model to 3B parameters:
 - 8% improvement over a dense 3B model on coding tasks
 - 5% better on math benchmarks
 - Performance close to the unpruned 9B model
-----
Are you into AI and LLMs❓ Join my daily AI newsletter. I will send you 7 emails a week analyzing the highest-signal AI developments. ↓↓
🎉 https://rohanpaul.substack.com/