
"FlexiGPT: Pruning and Extending Large Language Models with Low-Rank Weight Sharing"

The podcast below was generated with Google's Illuminate.

FlexiGPT: Scale your LLM without the parameter bloat using low-rank magic. Prune and extend LLMs effortlessly.

The paper introduces FlexiGPT to address the challenge of efficiently adapting Large Language Models. It proposes a method to prune and extend LLMs via low-rank weight sharing, maintaining performance while reducing the parameter count.

-----

Paper - https://arxiv.org/abs/2501.14713

Original Problem 🧐:

→ Large Language Models are computationally expensive.

→ Deploying and adapting LLMs for specific tasks or resource constraints is challenging.

-----

Solution in this Paper 💡:

→ FlexiGPT is introduced. It prunes and extends LLMs via low-rank weight sharing.

→ Singular Value Decomposition is used to decompose weight matrices into low-rank components.

→ Shared low-rank bases are maintained. Task-specific adapters are added.

→ During pruning, less important singular vectors are removed. During extension, new vectors are added to the shared bases.

→ This allows parameter-efficient adaptation and scaling of LLMs.
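The SVD-based pruning step described above can be sketched in a few lines of numpy. This is a generic low-rank truncation illustration, not the paper's exact procedure; the matrix size and rank are arbitrary choices for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical weight matrix standing in for one LLM layer.
W = rng.standard_normal((512, 512))

# SVD: W = U @ diag(S) @ Vt, with singular values sorted descending.
U, S, Vt = np.linalg.svd(W, full_matrices=False)

# "Prune" by keeping only the top-r singular directions
# (the less important singular vectors are dropped).
r = 64
W_r = U[:, :r] @ np.diag(S[:r]) @ Vt[:r, :]

# Parameter count: storing the full matrix vs. the low-rank factors.
full_params = W.size
low_rank_params = U[:, :r].size + r + Vt[:r, :].size
print(full_params, low_rank_params)
```

Extending the model would amount to appending new columns/rows to the shared factors, so pruning and extension operate on the same low-rank representation.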

-----

Key Insights from this Paper 🧠:

→ Low-rank weight sharing effectively reduces redundancy in LLM parameters.

→ Task-specific adapters enable efficient fine-tuning without modifying core model weights.

→ Pruning and extension can be achieved within a unified framework using this approach.
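The adapter insight above can be illustrated with a LoRA-style sketch: a frozen shared weight plus a small trainable low-rank correction. The names and initialization are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(1)
d, r = 256, 8

# Frozen shared base weight: the core model weights are never modified.
W_shared = rng.standard_normal((d, d))

# Task-specific low-rank adapter factors A (d x r) and B (r x d).
# Only these small matrices are trained per task.
A = rng.standard_normal((d, r)) * 0.01
B = np.zeros((r, d))  # zero init, so the adapter starts as a no-op

def forward(x):
    # Effective weight is W_shared + A @ B; W_shared stays intact.
    return x @ (W_shared + A @ B)

x = rng.standard_normal((1, d))
y = forward(x)
```

Because only `A` and `B` (2·d·r values) are updated per task, fine-tuning touches a tiny fraction of the parameters while the shared base is reused across tasks.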

-----

Results 📊:

→ FlexiGPT achieves comparable performance to full fine-tuning with significantly fewer parameters.

→ On the GLUE benchmark, pruned FlexiGPT with 10% of the parameters achieves 97% of the performance of a fully fine-tuned model.

→ Extended FlexiGPT shows strong performance in continual learning scenarios, mitigating catastrophic forgetting.
