"ZOQO: Zero-Order Quantized Optimization"

A podcast on this paper was generated with Google's Illuminate.

This paper introduces ZOQO, a method for training quantized models with zero-order optimization, which reduces both computational and memory requirements. It relies on quantized noise and a scaled learning rate so that all operations stay within the quantized format.
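
To make the idea concrete, a sign-based zero-order update of the kind described above can be written roughly as follows. The notation is a hedged sketch of mine, not necessarily the paper's exact formulation: Δ is the quantization step, z_q is quantized noise whose entries are multiples of Δ, f is the loss queried only through forward passes, and η_q is the learning rate.

```latex
% Illustrative sign-based zero-order step on a uniform grid with step \Delta.
\hat{g}_t = \operatorname{sign}\!\bigl(f(\theta_t + z_q) - f(\theta_t - z_q)\bigr)\,\operatorname{sign}(z_q),
\qquad
\theta_{t+1} = \theta_t - \eta_q\,\hat{g}_t,
\quad \eta_q = k\,\Delta,\ k \in \mathbb{N}.
```

Because the noise is quantized, the two perturbed copies of θ_t stay representable, and because η_q is an integer multiple of Δ, the update lands back on the grid without any full-precision intermediate.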

-----

https://arxiv.org/abs/2501.06736

Original Problem 🤔:

→ Training large models like LLMs requires high computational and memory resources.

→ Existing quantization methods often still rely on full-precision gradient calculations.

-----

Solution in this Paper 💡:

→ ZOQO combines zero-order optimization with quantized training.

→ It uses quantized noise for gradient sign estimation.

→ It adjusts the learning rate to maintain quantized parameters.

→ All operations, including updates, are performed in a quantized format (see the sketch below).
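
Putting the bullets above together, here is a minimal sketch of what one such update could look like, assuming symmetric uniform quantization with step `delta`, Rademacher-style quantized noise, and a sign-based update; the names (`zoqo_like_step`, `lr_steps`) are mine for illustration, not an API from the paper.

```python
import numpy as np

def zoqo_like_step(theta, loss_fn, delta, lr_steps=1, rng=None):
    """One illustrative zero-order step that keeps parameters on a uniform grid.

    Assumptions (for illustration only, not the paper's exact algorithm):
      - `theta` already lies on a uniform quantization grid with step `delta`
      - the noise is quantized (entries are +/- delta), so both loss queries
        only ever see grid-aligned weights
      - the learning rate `lr_steps * delta` is an integer multiple of `delta`,
        so every coordinate moves by a whole number of quantization steps
    """
    rng = np.random.default_rng() if rng is None else rng

    # Quantized Rademacher-style noise: every entry is exactly +delta or -delta.
    z = delta * rng.choice([-1.0, 1.0], size=theta.shape)

    # Two forward passes only: no backpropagation, no full-precision gradients.
    loss_plus = loss_fn(theta + z)
    loss_minus = loss_fn(theta - z)

    # Sign-based gradient estimate: per-coordinate direction, no magnitude.
    g_sign = np.sign(loss_plus - loss_minus) * np.sign(z)

    # The update is a multiple of delta, so theta stays on the quantization grid.
    return theta - (lr_steps * delta) * g_sign

# Toy usage: nudge four grid-aligned parameters toward 0.5 on a 0.25 grid.
delta = 0.25
theta = np.array([1.25, -0.75, 2.0, 0.5])
loss_fn = lambda w: float(np.sum((w - 0.5) ** 2))
for _ in range(100):
    theta = zoqo_like_step(theta, loss_fn, delta)
print(theta, loss_fn(theta))
```

The two loss evaluations are plain forward passes, so no backward pass or full-precision optimizer state is ever held in memory, which is consistent with the memory reduction reported in the Results below.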

-----

Key Insights from this Paper 👨‍🎓:

→ ZOQO eliminates the need for full-precision calculations in both gradients and parameter updates.

→ This reduces the computational burden and memory footprint, making ZOQO suitable for resource-constrained environments.

→ Despite the inherent limitations of quantized and zero-order training, ZOQO achieves competitive performance.

-----

Results ✨:

→ On black-box adversarial attacks, ZOQO's failure rates are comparable to those of full-precision methods on quantized models.

→ For LLM fine-tuning with LoRA on SST2, ZOQO maintains non-trivial performance even with aggressive quantization (64.34% accuracy with 4-bit quantization).

→ Memory peak usage during a single model update was 371.21 MB for ZOQO compared to 903.71 MB for full-precision and 583.83 MB for quantization-aware training.
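
For context, the peak memory of a single update step is commonly measured with PyTorch's CUDA allocator statistics. The sketch below shows one way such a number could be obtained; it is a measurement sketch of mine, not the paper's script, and `update_step` is a hypothetical callable standing in for one ZOQO, full-precision, or quantization-aware update.

```python
import torch

def peak_memory_mb(update_step, device="cuda"):
    """Peak GPU memory (in MB) allocated while running one update step.

    `update_step` is a placeholder callable that performs a single model
    update (e.g. one forward-only zero-order step or one backprop step).
    """
    torch.cuda.empty_cache()                      # drop cached blocks from earlier work
    torch.cuda.reset_peak_memory_stats(device)    # start peak tracking from zero
    update_step()                                 # run exactly one update
    torch.cuda.synchronize(device)                # wait for all kernels to finish
    return torch.cuda.max_memory_allocated(device) / 2**20
```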
