A LoRA optimizer that doesn't care how you split your matrices
LoRA-RITE, proposed in this paper, makes fine-tuning updates consistent no matter how the LoRA factors are scaled or rotated
This paper introduces LoRA-RITE, a novel optimizer for LoRA fine-tuning that achieves transformation invariance while remaining computationally efficient. It addresses a key limitation of current LoRA optimizers, which produce inconsistent updates depending on how the LoRA factors are scaled or rotated, leading to inefficient learning.
-----
https://arxiv.org/abs/2410.20625v1
🤔 Original Problem:
Current LoRA optimizers lack transformation invariance: their weight updates depend on how the two LoRA factors are scaled or rotated. This leads to inefficient learning, where one factor dominates the updates while the other remains nearly static.
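To see the problem concretely, here is a minimal sketch (my own PyTorch illustration, not code from the paper): two LoRA parameterizations that represent exactly the same adapter produce different effective weight updates under an element-wise, Adam-style step. The toy dimensions and the sign-of-gradient step are illustrative assumptions.

```python
import torch

torch.manual_seed(0)
m, n, r = 8, 6, 2                              # layer dims and LoRA rank (toy values)
W0 = torch.randn(m, n)                         # frozen base weight
B, A = torch.randn(m, r), torch.randn(r, n)    # LoRA factors: W = W0 + B @ A

c = 10.0
B2, A2 = B / c, A * c                          # same product B @ A, different parameterization

X, Y = torch.randn(4, n), torch.randn(4, m)    # dummy batch

def lora_grads(B, A):
    """Gradients of a squared-error loss w.r.t. the LoRA factors."""
    B, A = B.clone().requires_grad_(), A.clone().requires_grad_()
    loss = ((X @ (W0 + B @ A).T - Y) ** 2).mean()
    loss.backward()
    return B.grad, A.grad

gB1, gA1 = lora_grads(B, A)
gB2, gA2 = lora_grads(B2, A2)

# Element-wise "sign of gradient" step (roughly Adam's behavior once moments saturate).
# First-order change to the merged weight: dW ≈ dB @ A + B @ dA.
dW1 = (-gB1.sign()) @ A  + B  @ (-gA1.sign())
dW2 = (-gB2.sign()) @ A2 + B2 @ (-gA2.sign())

print(torch.allclose(B @ A, B2 @ A2))          # True: identical adapter
print(torch.allclose(dW1, dW2))                # False: different effective update
```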
-----
🛠️ Solution in this Paper:
→ LoRA-RITE employs a transformation-invariant preconditioner specifically designed for LoRA optimization
→ It achieves invariance by using unmagnified gradients that depend only on the column spaces of LoRA factors
→ The method incorporates first and second moments, which are crucial for adaptive optimization, while preserving transformation invariance
→ It uses matrix preconditioners but keeps computational overhead low by applying preconditioning only on the shorter (rank) dimension (a simplified sketch follows this list)
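The bullets above compress a lot, so here is a hedged sketch of how an unmagnified gradient and a shorter-dimension matrix preconditioner can fit together for one factor. This is my simplification (second moment only, the other factor B held fixed and assumed full rank), not LoRA-RITE's exact update rule; the function and variable names are mine.

```python
import torch

def invariant_step_A(A, grad_A, B, V, lr=1e-3, beta2=0.999, eps=1e-6):
    """Toy transformation-invariant step for LoRA factor A (r x n), holding B (m x r) fixed.

    Assumes B has full column rank r. grad_A is the ordinary gradient dLoss/dA.
    """
    r = A.shape[0]
    Q, R = torch.linalg.qr(B)                                    # B = Q @ R, Q orthonormal (m x r)
    # grad_A = B.T @ grad_W = R.T @ (Q.T @ grad_W); stripping R.T leaves a gradient
    # that depends only on col(B) -- an "unmagnified" gradient.
    g = torch.linalg.solve_triangular(R.T, grad_A, upper=False)  # r x n
    # Second moment kept as an r x r matrix on the shorter (rank) dimension.
    V = beta2 * V + (1 - beta2) * (g @ g.T)
    evals, evecs = torch.linalg.eigh(V + eps * torch.eye(r))
    P = evecs @ torch.diag(evals.rsqrt()) @ evecs.T              # (V + eps*I)^{-1/2}
    # Map the step back through R so the effective weight change is
    # B @ dA = -lr * Q @ P @ g, which depends only on col(B): any rescaling or
    # rotation of (B, A) that keeps B @ A fixed yields the same change to W.
    dA = -lr * torch.linalg.solve_triangular(R, P @ g, upper=True)
    return A + dA, V
```

Here V would start as torch.zeros(r, r) and carry over between steps; first-moment tracking and the symmetric update for B are omitted to keep the sketch short.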
-----
💡 Key Insights:
→ Diagonal preconditioning cannot achieve transformation invariance for LoRA optimization
→ Using unmagnified gradients is key to maintaining invariance across different LoRA parameterizations
→ Matrix preconditioning on the shorter (rank) dimension provides a good balance between effectiveness and efficiency (a rough cost comparison follows this list)
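To make the efficiency point concrete, a back-of-the-envelope comparison with example dimensions (a 4096-wide layer and rank 16 are my own illustrative numbers, not figures from the paper):

```python
# Preconditioning on the rank dimension keeps both the extra optimizer state and
# the cost of the matrix inverse square root tiny compared with the long dimension.
d, r = 4096, 16                        # hidden size and LoRA rank (example values)

print(f"d x d preconditioner: {d * d:,} entries")       # 16,777,216
print(f"r x r preconditioner: {r * r:,} entries")       # 256

# Matrix inverse square root is roughly O(k^3) for a k x k matrix.
print(f"rough cost ratio: {(d ** 3) / (r ** 3):,.0f}x")  # ~16.8 million x
```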
-----
📊 Results:
→ 4.6% accuracy gain on Super-Natural Instructions when replacing Adam with LoRA-RITE for Gemma-2B
→ 3.5% average accuracy improvement across HellaSwag, ARC-Challenge, GSM8K, and OpenBookQA
→ 55.5% accuracy on GSM8K with Gemma-7B, surpassing Adam (48.37%) and LAMB (50.64%)