A model that doesn't get memory loss while learning.
Online-LoRA enables continuous model adaptation without task boundaries while using minimal memory.
A smart way to make Vision Transformers learn continuously without forgetting previous knowledge.
https://arxiv.org/abs/2411.05663
🎯 Original Problem:
Catastrophic forgetting in online continual learning, where nonstationary data streams arrive without clear task boundaries and real-time applications impose memory and privacy constraints.
-----
🛠️ Solution in this Paper:
→ Online-LoRA introduces a novel framework that fine-tunes pre-trained Vision Transformer models in real-time
→ Uses loss plateaus to automatically detect data distribution shifts and trigger model expansion with new LoRA parameters (see the sketch after this list)
→ Implements an online weight regularization strategy focusing only on LoRA parameters instead of entire model parameters
→ Previous LoRA parameters get frozen and merged into pre-trained ViT model weights when new parameters are added
→ Employs a minimal hard buffer (4 samples) for parameter importance estimation
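Below is a minimal PyTorch sketch of that adaptation loop. It is an illustration under stated assumptions, not the authors' code: `LoRALinear`, `PlateauDetector`, the rank/alpha values, and the plateau thresholds are all hypothetical, and the paper's actual loss-surface plateau test may differ.

```python
import torch
import torch.nn as nn
from collections import deque


class LoRALinear(nn.Module):
    """Frozen pre-trained linear layer plus a trainable low-rank update W + (B @ A) * scale."""

    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 8.0):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)           # pre-trained ViT weights stay frozen
        self.scale = alpha / rank
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))

    def forward(self, x):
        return self.base(x) + (x @ self.A.t() @ self.B.t()) * self.scale

    def merge_and_reset(self):
        """Fold the current LoRA update into the frozen base weight, then start fresh adapters."""
        with torch.no_grad():
            self.base.weight += (self.B @ self.A) * self.scale
            self.A.normal_(std=0.01)
            self.B.zero_()


class PlateauDetector:
    """Flags when the recent loss curve has flattened (heuristic stand-in for the paper's test)."""

    def __init__(self, window: int = 10, var_tol: float = 1e-3, slope_tol: float = 1e-3):
        self.losses = deque(maxlen=window)
        self.var_tol, self.slope_tol = var_tol, slope_tol

    def update(self, loss_value: float) -> bool:
        self.losses.append(loss_value)
        if len(self.losses) < self.losses.maxlen:
            return False
        vals = torch.tensor(list(self.losses))
        slope = (vals[-1] - vals[0]).abs() / len(vals)    # roughly flat over the window?
        return vals.var().item() < self.var_tol and slope.item() < self.slope_tol


# Usage inside the streaming loop (hypothetical variable names):
#   if detector.update(loss.item()):
#       for m in model.modules():
#           if isinstance(m, LoRALinear):
#               m.merge_and_reset()                       # freeze/merge old LoRA, add new parameters
```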
-----
💡 Key Insights:
→ Loss surface plateaus effectively indicate data distribution shifts in continuous learning
→ Focusing regularization on LoRA parameters reduces memory overhead to ~0.17% of total model parameters
→ Automatic detection of distribution shifts eliminates need for explicit task boundaries
→ Hard buffer with highest-loss samples improves parameter importance estimation (sketched after this list)
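A rough sketch of how the hard buffer and LoRA-only regularization could fit together, assuming an EWC-style quadratic penalty with squared gradients as the importance signal. `HardBuffer`, `estimate_importance`, `lora_regularization`, and the `lam` value are illustrative names and choices, not the paper's API.

```python
import torch
import torch.nn.functional as F


class HardBuffer:
    """Keeps the `capacity` samples with the highest observed loss (the paper uses 4)."""

    def __init__(self, capacity: int = 4):
        self.capacity = capacity
        self.items = []                                   # list of (loss_value, input, label)

    def add(self, per_sample_losses, xs, ys):
        for l, x, y in zip(per_sample_losses.tolist(), xs, ys):
            self.items.append((l, x.detach(), y.detach()))
        self.items.sort(key=lambda t: t[0], reverse=True) # retain only the hardest samples
        self.items = self.items[: self.capacity]

    def batch(self):
        xs = torch.stack([x for _, x, _ in self.items])
        ys = torch.stack([y for _, _, y in self.items])
        return xs, ys


def estimate_importance(model, buffer):
    """Squared gradients on the hard buffer approximate per-parameter importance (Fisher-style)."""
    xs, ys = buffer.batch()
    model.zero_grad()
    F.cross_entropy(model(xs), ys).backward()
    return {
        name: p.grad.detach() ** 2
        for name, p in model.named_parameters()
        if p.requires_grad and p.grad is not None         # trainable LoRA parameters only
    }


def lora_regularization(model, importance, anchors, lam=100.0):
    """Quadratic penalty keeping important LoRA parameters near their anchored values."""
    penalty = 0.0
    for name, p in model.named_parameters():
        if name in importance:                            # ~0.17% of all model parameters
            penalty = penalty + (importance[name] * (p - anchors[name]) ** 2).sum()
    return lam * penalty
```

In this sketch, the streaming training objective would be the online cross-entropy loss plus `lora_regularization(...)`, with `anchors` snapshotted whenever old LoRA parameters are merged into the backbone and new ones are initialized.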
-----
📊 Results:
→ Outperforms SOTA methods across CIFAR-100, ImageNet-R, ImageNet-S, CUB-200 and CORe50 benchmarks
→ Shows robust performance across ViT architectures (ViT-B/16 and ViT-S/16)
→ Achieves 49.40% accuracy on Split-CIFAR-100 compared to SOTA's 48.48%