This paper examines how model quantization affects mathematical reasoning abilities in LLMs and proposes solutions to minimize the performance loss.
Mathematical reasoning survives aggressive model compression with smart quantization
-----
https://arxiv.org/abs/2501.03035
🤔 Original Problem:
LLMs excel at complex mathematical reasoning but require massive computational resources. While quantization reduces resource needs, its impact on mathematical reasoning capabilities remains unclear.
-----
🔧 Solution in this Paper:
→ Introduced a multidimensional evaluation framework to assess quantization's impact on mathematical reasoning capabilities
→ Implemented format alignment and knowledge infilling via LoRA fine-tuning on top of the quantized models (see the sketch after this list)
→ Used the PRM800K dataset with 8,000 problem-solution pairs to train and evaluate models
→ Applied different quantization techniques: GPTQ, AWQ (4-bit weights), and SmoothQuant (8-bit weights/activations)
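
A minimal sketch of the quantize-then-LoRA workflow, not the paper's exact pipeline: it loads a 4-bit quantized base model and attaches LoRA adapters so only a small set of parameters is trained during format alignment and knowledge infilling. The model name, bitsandbytes NF4 quantization (a stand-in for GPTQ/AWQ W4A16), and all hyperparameters are illustrative assumptions.

```python
# Sketch: attach LoRA adapters to a 4-bit quantized LLM for fine-tuning.
# Assumptions: model name, NF4 quantization, and LoRA hyperparameters are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_name = "meta-llama/Llama-2-7b-hf"  # placeholder base model

# 4-bit weight quantization (NF4 via bitsandbytes, standing in for W4A16 schemes)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, quantization_config=bnb_config, device_map="auto"
)

# Freeze the quantized weights and train only low-rank adapters
model = prepare_model_for_kbit_training(model)
lora_config = LoraConfig(
    r=16,                                  # adapter rank (assumed value)
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],   # assumed attention projections
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()         # only a small fraction is trainable
```

The adapters would then be trained on the problem-solution pairs formatted as step-by-step solutions, which is what "format alignment" targets.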
-----
💡 Key Insights:
→ Quantization significantly impacts step-by-step reasoning abilities
→ Computation errors and step omissions are major failure modes
→ Low-bit precision causes overflow issues in multi-step calculations (see the toy illustration after this list)
→ SmoothQuant (W8A8) shows better robustness than AWQ/GPTQ
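
A toy numerical illustration, not an experiment from the paper: it simulates symmetric per-tensor weight quantization at 8-bit vs 4-bit and measures how quantization error compounds across a chain of matrix multiplications, loosely mimicking multi-step computations. It demonstrates rounding-error accumulation rather than the paper's specific overflow failures; all shapes and values are arbitrary.

```python
# Toy demo: error accumulation from low-bit weight quantization over many steps.
import numpy as np

def fake_quant(x, n_bits):
    """Quantize-dequantize x with symmetric uniform quantization."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.abs(x).max() / qmax
    return np.round(x / scale).clip(-qmax, qmax) * scale

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 64))
weights = [rng.normal(size=(64, 64)) / np.sqrt(64) for _ in range(8)]

for bits in (8, 4):
    ref, out = x.copy(), x.copy()
    for w in weights:                             # 8 sequential "reasoning steps"
        ref = np.tanh(ref @ w)                    # full-precision reference
        out = np.tanh(out @ fake_quant(w, bits))  # quantized weights
    rel_err = np.linalg.norm(out - ref) / np.linalg.norm(ref)
    print(f"{bits}-bit weights: relative error after 8 steps = {rel_err:.3f}")
```

The 4-bit run drifts noticeably further from the full-precision reference than the 8-bit run, which is consistent with W8A8 holding up better than W4A16 on multi-step math.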
-----
📊 Results:
→ SmoothQuant (W8A8) showed the smallest performance drop: 0.84 points
→ AWQ (W4A16) degraded performance by 2.58 points
→ Fine-tuning improved scores across all quantization methods