Original Problem 🔍:
Aggregating LoRA matrices A and B in federated learning introduces errors: averaging A and B separately on the server yields a product of averages that differs from the average of the clients' actual updates (the products B·A). Directly combining both matrices on the server and broadcasting them to clients therefore leads to suboptimal performance.
-----
📚 https://arxiv.org/abs/2410.01463
Solution in this Paper 🛠️:
• Introduces Federated Share-A Low-Rank Adaptation (FedSA-LoRA)
• Uses two low-rank trainable matrices A and B for weight updates
• Only A matrices shared with server for aggregation
• B matrices kept locally to preserve client-specific knowledge
• Extends approach to other LoRA variants: FedSA-rsLoRA and FedSA-VeRA
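The sharing scheme above can be sketched in a few lines. This is a minimal toy simulation, not the paper's implementation: it assumes plain FedAvg-style mean aggregation of the A matrices, and the function name `federated_round` is illustrative.

```python
import numpy as np

def federated_round(clients):
    """One FedSA-LoRA communication round (sketch).

    Each client holds a LoRA pair (A, B); only the A matrices
    are sent to the server, averaged, and broadcast back.
    The B matrices never leave the clients.
    """
    # Server: aggregate only the A matrices (simple mean, as in FedAvg).
    A_global = np.mean([c["A"] for c in clients], axis=0)
    # Broadcast: every client adopts the global A, keeps its local B.
    for c in clients:
        c["A"] = A_global
    return clients

# Toy setup: 3 clients, LoRA weight delta = B @ A with d=16, rank r=4.
rng = np.random.default_rng(0)
d, r = 16, 4
clients = [{"A": rng.normal(size=(r, d)), "B": rng.normal(size=(d, r))}
           for _ in range(3)]

clients = federated_round(clients)

# All clients now share the same A (general knowledge) ...
assert all(np.allclose(c["A"], clients[0]["A"]) for c in clients)
# ... but keep distinct B matrices (client-specific knowledge),
# so each client's effective update B @ A still differs.
assert not np.allclose(clients[0]["B"], clients[1]["B"])
```

The same round structure applies to the FedSA-rsLoRA and FedSA-VeRA variants; only the parameterization of the local update changes.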
-----
Key Insights from this Paper 💡:
• A matrices learn general knowledge, B matrices capture client-specific knowledge
• Sharing only A matrices enhances learning abilities of LoRA in federated settings
• Approach generalizes across different LoRA variants
• Effective in non-IID scenarios and scales well as the number of clients grows
-----
Results 📊:
• FedSA-LoRA outperforms baselines across GLUE benchmark tasks
• Improves accuracy by 1.84% on QNLI and 1.4% on MNLI-m in severe non-IID scenarios
• Demonstrates superior performance with 10 to 100 clients
• Achieves 46.63% accuracy on GSM8K dataset, surpassing LoRA (46.24%) and FFA-LoRA (46.32%)
FedSA-LoRA enhances federated fine-tuning of LLMs by selectively aggregating LoRA matrices for improved performance.