Forward passes alone can prevent AI models from forgetting - no backpropagation needed.
ZeroFlow introduces a method to prevent AI models from forgetting old knowledge when learning new tasks, using only forward passes without backpropagation.
https://arxiv.org/abs/2501.01045
🤖 Original Problem:
→ Current approaches to preventing catastrophic forgetting rely on gradient information from backpropagation, but backpropagation isn't feasible with black-box APIs, memory-constrained hardware, or non-differentiable components.
📝 Solution in this Paper:
→ ZeroFlow uses zeroth-order (ZO) optimization, which estimates gradients using only forward passes.
→ It perturbs the model in symmetric pairs of random directions and turns the resulting loss differences into an approximate gradient, with no backpropagation required (a minimal sketch follows this list).
→ The method introduces historical gradient reweighting to stabilize learning across tasks.
→ It implements sparsity-induced estimation to reduce variance in gradient updates.
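Here is a minimal sketch of the core forward-only idea, a symmetric (two-point) zeroth-order gradient estimate. It is not the paper's code: the function name `zo_grad_estimate`, the toy quadratic loss, and all hyperparameters are illustrative assumptions.

```python
import numpy as np

def zo_grad_estimate(loss_fn, params, eps=1e-3, n_queries=4, rng=None):
    """Symmetric (two-point) zeroth-order gradient estimate.

    Each query samples a random direction u, runs two forward passes at
    params + eps*u and params - eps*u, and converts the loss difference
    into an estimate of the gradient along u. No backpropagation involved.
    """
    rng = rng if rng is not None else np.random.default_rng()
    grad = np.zeros_like(params)
    for _ in range(n_queries):
        u = rng.standard_normal(params.shape)        # random perturbation direction
        loss_plus = loss_fn(params + eps * u)        # forward pass 1
        loss_minus = loss_fn(params - eps * u)       # forward pass 2
        grad += (loss_plus - loss_minus) / (2 * eps) * u
    return grad / n_queries

# Toy check: quadratic loss with a known minimum at w_star.
w_star = np.array([1.0, -2.0, 0.5])
loss_fn = lambda w: np.sum((w - w_star) ** 2)

rng = np.random.default_rng(0)
w = np.zeros(3)
for step in range(300):
    g = zo_grad_estimate(loss_fn, w, n_queries=4, rng=rng)
    w -= 0.05 * g                                    # plain ZO-SGD update
print(w)                                             # close to w_star, forward passes only
```

More queries per step average away estimation noise at the cost of extra forward passes, which is why query count matters so much for ZO methods.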
💡 Key Insights:
→ Forward passes alone can effectively prevent catastrophic forgetting
→ ZO methods reduce memory usage by ~5x compared to backpropagation-based training
→ The number of forward-pass queries per step significantly impacts optimization performance
→ Sparsity and historical gradient reweighting help stabilize learning (illustrated in the second sketch below)
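The paper's exact sparsity and reweighting rules are its own; the sketch below only illustrates one common way to realize those two stabilizers: a random coordinate mask on the perturbation plus an exponential moving average over past gradient estimates. The function `sparse_zo_update` and its defaults are assumptions, not the paper's implementation.

```python
import numpy as np

def sparse_zo_update(loss_fn, params, hist_grad, eps=1e-3, lr=0.05,
                     sparsity=0.9, beta=0.9, rng=None):
    """One illustrative forward-only update combining two stabilizers.

    sparsity: fraction of coordinates left unperturbed, which lowers the
              variance of the estimate (only ~(1 - sparsity) of the
              weights move per step).
    beta:     weight on the running history of past gradient estimates,
              so a single noisy estimate cannot swing the update.
    """
    rng = rng if rng is not None else np.random.default_rng()
    u = rng.standard_normal(params.shape)
    mask = (rng.random(params.shape) >= sparsity).astype(params.dtype)
    u *= mask                                         # sparse perturbation direction
    loss_plus = loss_fn(params + eps * u)             # forward pass 1
    loss_minus = loss_fn(params - eps * u)            # forward pass 2
    grad = (loss_plus - loss_minus) / (2 * eps) * u   # sparse two-point estimate
    hist_grad = beta * hist_grad + (1 - beta) * grad  # reweight with gradient history
    return params - lr * hist_grad, hist_grad
```

Calling this in a loop (initializing `hist_grad` to zeros and threading it through each step) gives a forward-only optimizer whose updates change only a small subset of weights per step while smoothing over noisy estimates across tasks.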
📊 Results:
→ Achieves comparable or better performance than backpropagation methods across multiple datasets
→ Reduces memory cost from 12.08GB to 2.41GB
→ Cuts runtime by roughly 50% compared to backpropagation-based training
→ Maintains stable performance across different sparsity ratios (10-90%)
------
Are you into AI and LLMs❓ Join my daily AI newsletter. I will send you 7 emails a week analyzing the highest-signal AI developments. ↓↓
🎉 https://rohanpaul.substack.com/