"Instruction-Following Pruning for Large Language Models"

A podcast on this paper was generated with Google's Illuminate.

LLMs can smartly pick their own parameters based on what you ask them to do

Dynamic pruning lets LLMs select task-specific parameters on the fly, improving efficiency while approaching the performance of a larger dense model.

-----

https://arxiv.org/abs/2501.02086

🤔 Original Problem:

→ Current LLM pruning methods apply a single fixed mask, so the pruned model adapts poorly across diverse tasks such as coding, math, and domain-specific use cases

-----

🔧 Solution in this Paper:

→ Introduces IFPRUNING, a method in which pruning masks adapt to the user's instruction

→ Uses a sparse mask predictor that takes prompts as input and selects relevant model parameters

→ Focuses on structured pruning of the feed-forward networks, where entire rows/columns of the weight matrices are pruned

→ Employs a SoftTopK operator to transform importance scores into differentiable masks (see the sketch after this list)

→ Jointly optimizes the mask predictor and the LLM on instruction data and a pre-training corpus

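To make the mechanism concrete, here is a minimal sketch of how such a pipeline can be wired up, based only on the description above. It is not the paper's implementation: the names (MaskPredictor, PrunableFFN, soft_topk), the straight-through relaxation, and all sizes are illustrative assumptions.

```python
# Minimal sketch (not the authors' code) of instruction-following pruning
# for one feed-forward (FFN) block.

import torch
import torch.nn as nn
import torch.nn.functional as F


def soft_topk(scores: torch.Tensor, k: int, temperature: float = 0.1) -> torch.Tensor:
    """Differentiable approximation of a top-k selection mask.

    Returns a (batch, n) tensor that is ~1 for the k highest-scoring units.
    The forward pass uses a hard 0/1 mask; gradients flow through the soft
    sigmoid relaxation (straight-through estimator).
    """
    # k-th largest score per row, used as a soft threshold (detached so the
    # threshold itself is not differentiated through).
    kth = scores.topk(k, dim=-1).values[..., -1:].detach()
    soft = torch.sigmoid((scores - kth) / temperature)
    hard = (scores >= kth).float()
    return hard + (soft - soft.detach())


class MaskPredictor(nn.Module):
    """Maps a pooled representation of the instruction/prompt to one
    importance score per FFN intermediate unit."""

    def __init__(self, d_model: int, d_ffn: int):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(d_model, d_model), nn.SiLU(), nn.Linear(d_model, d_ffn)
        )

    def forward(self, prompt_hidden: torch.Tensor, k: int) -> torch.Tensor:
        # prompt_hidden: (batch, seq, d_model) hidden states of the instruction.
        pooled = prompt_hidden.mean(dim=1)          # (batch, d_model)
        scores = self.proj(pooled)                  # (batch, d_ffn)
        return soft_topk(scores, k)                 # (batch, d_ffn), values in {0,1} forward


class PrunableFFN(nn.Module):
    """Standard FFN whose intermediate units can be masked out; zeroing unit j
    effectively removes row j of w_in and column j of w_out (structured pruning)."""

    def __init__(self, d_model: int, d_ffn: int):
        super().__init__()
        self.w_in = nn.Linear(d_model, d_ffn)
        self.w_out = nn.Linear(d_ffn, d_model)

    def forward(self, x: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        h = F.silu(self.w_in(x))                    # (batch, seq, d_ffn)
        h = h * mask.unsqueeze(1)                   # broadcast the mask over the sequence
        return self.w_out(h)
```

Because zeroing an intermediate unit suppresses an entire row of the input projection and the matching column of the output projection, the masked units can be dropped outright at inference time, which is what yields a genuinely smaller activated model.
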
-----

🎯 Key Insights:

→ Parameter selection can be done per-input or per-task

→ Instructions requiring similar skills yield homogeneous pruning patterns

→ Selecting parameters once per prompt eliminates parameter-reloading costs during decoding (see the usage sketch after this list)

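A hypothetical usage example, continuing the sketch above: the mask is predicted once from the instruction and then reused unchanged at every decoding step, which is why no parameters need to be reloaded during generation. All sizes and tensors below are placeholders.

```python
import torch

# Illustrative sizes: keep roughly a third of the FFN intermediate units.
d_model, d_ffn, keep = 512, 2048, 640
predictor = MaskPredictor(d_model, d_ffn)
ffn = PrunableFFN(d_model, d_ffn)

prompt_hidden = torch.randn(1, 32, d_model)      # encoded instruction (placeholder)
mask = predictor(prompt_hidden, k=keep)          # parameters selected once per task/prompt

for _ in range(16):                              # decoding loop
    step_hidden = torch.randn(1, 1, d_model)     # hidden state of the new token (placeholder)
    _ = ffn(step_hidden, mask)                   # the same mask is reused at every step
```
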
-----

📊 Results:

→ When pruning a 9B model down to 3B parameters:

- 8% improvement over a dense 3B model on coding tasks

- 5% better on math benchmarks

- Performance close to the unpruned 9B model

------

Are you into AI and LLMs❓ Join my daily AI newsletter. I will send you 7 emails a week analyzing the highest signal AI developments. ↓↓

🎉 https://rohanpaul.substack.com/