This paper introduces ZOQO, a method for training quantized models with zero-order optimization, reducing computational and memory needs. It achieves this by using quantized noise for gradient sign estimation and a learning rate scaled to the quantization step, keeping all operations within a quantized format.
-----
https://arxiv.org/abs/2501.06736
Original Problem 🤔:
→ Training large models like LLMs requires high computational and memory resources.
→ Existing quantization methods often still rely on full-precision gradient calculations.
-----
Solution in this Paper 💡:
→ ZOQO combines zero-order optimization with quantized training.
→ It uses quantized noise for gradient sign estimation.
→ It scales the learning rate so that parameter updates stay on the quantization grid.
→ All operations, including parameter updates, are performed in quantized format (see the sketch after this list).
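A minimal NumPy sketch of the idea (not the authors' code): the perturbation is quantized ±δ noise, only the sign of the two-point loss difference is used, and the step size is an integer multiple of the quantization step, so parameters never leave the quantized grid. Names such as `zoqo_step`, `quantize`, and the toy objective are illustrative assumptions.

```python
import numpy as np

def quantize(x, delta):
    """Round values to the nearest multiple of the quantization step delta."""
    return np.round(x / delta) * delta

def zoqo_step(params, loss_fn, delta, lr_steps=1, rng=None):
    """One ZOQO-style zero-order update that stays on the quantization grid.

    - Perturbation is quantized noise: random signs times the step size delta.
    - Only the sign of the two-point loss difference is used.
    - The effective learning rate is an integer number of quantization steps.
    """
    rng = rng or np.random.default_rng()
    # Quantized noise: each coordinate is +delta or -delta.
    z = rng.choice([-1.0, 1.0], size=params.shape) * delta
    # Two-point zero-order estimate; only its sign is needed.
    diff = loss_fn(params + z) - loss_fn(params - z)
    grad_sign = np.sign(diff) * np.sign(z)
    # Move by an integer multiple of delta in the estimated descent direction.
    new_params = params - lr_steps * delta * grad_sign
    return quantize(new_params, delta)  # guard against floating-point drift

# Toy usage: minimize a simple quadratic with a coarse quantization step.
if __name__ == "__main__":
    delta = 1.0 / 8.0                        # illustrative quantization step
    w = quantize(np.array([0.9, -0.6]), delta)
    loss = lambda p: float(np.sum(p ** 2))   # simple convex objective
    for _ in range(50):
        w = zoqo_step(w, loss, delta)
    print(w, loss(w))
```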
-----
Key Insights from this Paper 👨‍🎓:
→ ZOQO eliminates the need for full-precision calculations in both gradients and parameter updates.
→ This reduces the computational burden and memory footprint, making it suitable for resource-constrained environments.
→ Despite the inherent limitations of quantized and zero-order training, ZOQO achieves competitive performance.
-----
Results ✨:
→ On black-box adversarial attacks, ZOQO's failure rates are comparable to those of full-precision attack methods applied to quantized models.
→ For LLM fine-tuning with LoRA on SST2, ZOQO maintains non-trivial performance even under aggressive quantization (64.34% accuracy at 4 bits).
→ Peak memory usage during a single model update was 371.21 MB for ZOQO, versus 903.71 MB for full-precision training and 583.83 MB for quantization-aware training.