The LoRS (Low-Rank Adaptation for Sparse Large Language Models) paper introduces a method to efficiently fine-tune sparse Large Language Models (LLMs), reducing memory and computation demands while preserving model sparsity and improving performance.
-----
https://arxiv.org/abs/2501.08582v1
Original Problem: 😟
→ Existing Low-Rank Adaptation (LoRA) methods struggle when applied to sparse LLMs because they fail to maintain sparsity: merging the dense adapter back into the pruned weights destroys the sparsity pattern.
→ Losing sparsity brings back the memory usage and computational overhead that pruning was meant to remove.
→ Sparsity-preserving LoRA methods such as Sparsity Preserved Parameter-efficient Fine-tuning (SPP) and Structured Sparsity Quantization-aware Fine-tuning (SQFT) attempt to solve this, but they still suffer from high memory and time overheads, undermining the efficiency that LoRA is supposed to provide.
-----
Solution in this Paper: 🤓
→ The paper proposes LoRS, a novel fine-tuning method for sparse LLMs that preserves sparsity while minimizing computation and memory overhead.
→ LoRS combines two techniques: weight recompute and computational graph rearrangement.
→ Weight recompute discards the merged sparse weights during each forward pass and recalculates them during the backward pass, reducing memory overhead.
→ Computational graph rearrangement reorders the gradient computation in the backward pass, further reducing computational overhead (both techniques are sketched in the code below).
→ LoRS also uses an improved adapter initialization for better fine-tuning effectiveness.
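A minimal PyTorch-style sketch of how weight recompute and a rearranged backward pass could be wired together for a sparse-LoRA linear layer. This is an illustration under assumptions, not the authors' implementation: `W` is the frozen sparse weight, `M` its binary sparsity mask, and `A`/`B` the low-rank adapters; the merged weight is rebuilt in the backward pass instead of being cached, and adapter gradients are taken from the masked weight gradient.

```python
import torch

class SparseLoRALinearFn(torch.autograd.Function):
    """Sketch of a sparse-LoRA linear op with weight recompute: the merged
    weight (W + B @ A) * M is built on the fly in the forward pass, discarded,
    and rebuilt in the backward pass instead of being kept in memory."""

    @staticmethod
    def forward(ctx, x, W, A, B, M):
        # x: (batch, in); W, M: (out, in); A: (r, in); B: (out, r)
        W_merged = (W + B @ A) * M            # sparsity-preserving merge
        ctx.save_for_backward(x, W, A, B, M)  # the merged weight is NOT saved
        return x @ W_merged.t()

    @staticmethod
    def backward(ctx, grad_out):
        x, W, A, B, M = ctx.saved_tensors
        W_merged = (W + B @ A) * M            # recompute instead of caching
        grad_x = grad_out @ W_merged
        # Gradient w.r.t. the merged weight, masked to respect sparsity,
        # then projected onto the small adapter factors.
        grad_W_masked = (grad_out.t() @ x) * M
        grad_A = B.t() @ grad_W_masked        # shape (r, in)
        grad_B = grad_W_masked @ A.t()        # shape (out, r)
        # W and M are frozen, so they receive no gradients.
        return grad_x, None, grad_A, grad_B, None

# usage: y = SparseLoRALinearFn.apply(x, W, A, B, M)
```

The sketch still materializes the full masked weight gradient for clarity; the paper's actual rearrangement may order the matrix products differently to avoid even that intermediate.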
-----
Key Insights from this Paper: 🤔
→ Maintaining sparsity in LLMs during LoRA fine-tuning is crucial for efficiency.
→ Weight recompute and computational graph rearrangement can significantly reduce the memory and computational overhead of fine-tuning sparse LLMs.
→ Better adapter initialization enhances the performance of LoRA methods on sparse LLMs (see the sketch after this list).
→ Adapters can be efficiently incorporated into all sparse weight matrices within the model.
→ LoRA and sparsity-preserving LoRA methods should be assessed on two axes: efficiency (memory consumption and computation time) and performance (accuracy on downstream tasks).
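For illustration only, a hedged sketch of the last two points: a plausible SVD-based adapter initialization (assuming access to the pre-pruning dense weights; the paper's exact scheme may differ) and a merge step that folds the trained adapter into the weight matrix while keeping its sparsity pattern.

```python
import torch

def init_adapters_from_pruning_error(W_dense, W_sparse, rank=16):
    """Illustrative initialization (not necessarily the paper's scheme):
    seed A and B with a truncated SVD of the pruning residual so that
    B @ A starts out compensating the weights removed by pruning."""
    residual = W_dense - W_sparse
    U, S, Vh = torch.linalg.svd(residual, full_matrices=False)
    B = U[:, :rank] * S[:rank].sqrt()                # (out, r)
    A = S[:rank].sqrt().unsqueeze(1) * Vh[:rank, :]  # (r, in)
    return A, B

def merge_adapter_preserving_sparsity(W_sparse, A, B, M):
    """Fold the trained adapter into the weight and re-apply the mask M,
    so the deployed matrix keeps the original sparsity pattern."""
    return (W_sparse + B @ A) * M
```

Because the mask is re-applied after the merge, the deployed weights stay exactly as sparse as the pruned model.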
-----
Results: 💯
→ LoRS outperforms existing sparsity-preserving LoRA methods in performance, memory usage, and computational efficiency.
→ Achieves a 7 to 25 percent performance improvement over models obtained through post-training pruning.
→ Achieves a 1 to 2 percent performance improvement over SPP and SQFT when fine-tuning on the Alpaca dataset.