NPFT (Noise Perturbation Fine-tuning) makes LLM weights less sensitive to quantization by teaching them to tolerate noise.
It reduces the sensitivity of outlier weights in LLMs through random perturbations during fine-tuning, enabling uniform quantization without performance loss.
-----
https://arxiv.org/abs/2412.06858
🔍 Original Problem:
Existing LLM quantization methods preserve outlier weights in higher precision to maintain performance, but this mixed-precision approach reduces hardware efficiency and GPU utilization.
-----
🛠️ Solution in this Paper:
→ NPFT identifies outlier weights using the Fisher Information Matrix and applies random perturbations to them during training.
→ The method reduces the Hessian trace of the outliers through parameter-efficient fine-tuning with LoRA adapters.
→ Noise is sampled per channel and added at outlier locations, mimicking quantization effects.
→ A balanced loss function maintains base-model performance while optimizing for robustness to the perturbations (a minimal sketch follows this list).
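To make the mechanics concrete, here is a minimal PyTorch sketch of the core idea: sample one noise value per output channel, inject it only at outlier positions, and balance a clean-data loss against a perturbed one. The names (`outlier_mask`, `noise_scale`, `lam`) and the exact form of the balanced loss are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def perturb_outliers(weight, outlier_mask, noise_scale):
    """Add per-channel random noise at outlier weight positions only.

    weight:       (out_features, in_features) frozen base-layer weight
    outlier_mask: boolean tensor of the same shape marking outlier weights
                  (e.g. selected via a Fisher-information sensitivity score)
    noise_scale:  (out_features, 1) per-channel noise magnitude; an
                  illustrative stand-in for a quantization-step-sized bound
    """
    # One uniform sample per output channel, broadcast across the row,
    # then masked so only outlier locations are perturbed.
    noise = (torch.rand(weight.shape[0], 1, device=weight.device,
                        dtype=weight.dtype) - 0.5) * noise_scale
    return weight + noise * outlier_mask

def balanced_loss(clean_logits, noisy_logits, labels, lam=0.5):
    """Hypothetical balanced objective: keep base-model behaviour on the
    clean forward pass while rewarding robustness on the perturbed one."""
    return (1 - lam) * F.cross_entropy(clean_logits, labels) + \
           lam * F.cross_entropy(noisy_logits, labels)
```

In a full training loop, the base weight stays frozen and only the LoRA adapters receive gradients; the perturbed forward pass simply swaps in `perturb_outliers(weight, outlier_mask, noise_scale)` at each step.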
-----
💡 Key Insights:
→ Outlier sensitivity can be reduced without keeping outliers in higher precision
→ Per-channel noise sampling is more effective than per-weight sampling
→ Random perturbations efficiently estimate the Hessian trace without expensive second-order computation (see the sketch after this list)
→ Single fine-tuning pass works for multiple bit-width quantization targets
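The Hessian-trace insight is the standard Hutchinson trick: tr(H) ≈ E[vᵀHv] for random ±1 vectors v, which needs only Hessian-vector products from autograd. The sketch below shows that well-known estimator in PyTorch; it illustrates why random perturbations suffice to probe curvature and is not necessarily the paper's exact procedure.

```python
import torch

def hutchinson_hessian_trace(loss_fn, params, n_samples=8):
    """Hutchinson estimator: tr(H) ~= E[v^T H v] for random +/-1 vectors v,
    computed with Hessian-vector products from autograd instead of ever
    forming H. loss_fn() must return a scalar loss built from `params`."""
    trace_est = 0.0
    for _ in range(n_samples):
        loss = loss_fn()
        grads = torch.autograd.grad(loss, params, create_graph=True)
        # Rademacher probe vectors (+1 / -1 with equal probability).
        vs = [torch.empty_like(p).bernoulli_(0.5) * 2 - 1 for p in params]
        # Hessian-vector product: differentiate g.v with respect to params.
        gv = sum((g * v).sum() for g, v in zip(grads, vs))
        hvs = torch.autograd.grad(gv, params)
        trace_est += sum((v * hv).sum().item() for v, hv in zip(vs, hvs))
    return trace_est / n_samples
```

A lower trace means flatter curvature around the outlier weights, so rounding them to a uniform grid costs less in loss.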
-----
📊 Results:
→ 3.69 PPL improvement on OPT-1.3B-4bits with RTN quantizer
→ 10% latency reduction on RTX 4090 GPU compared to mixed-precision
→ Matches GPTQ performance on LLaMA2-7B-4bits using simple RTN (round-to-nearest; a minimal quantizer sketch follows this list)
→ 4x faster training than EfficientQAT with lower resource requirements
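For reference, RTN is plain per-channel round-to-nearest uniform quantization with no calibration. Below is a minimal asymmetric, per-output-channel sketch; the exact granularity and symmetry used in the paper's experiments may differ.

```python
import torch

def rtn_quantize_per_channel(weight, n_bits=4):
    """Minimal round-to-nearest (RTN) uniform quantizer with one scale and
    zero-point per output channel (asymmetric). Returns the dequantized
    weights, i.e. what the model would actually use after quantization."""
    qmax = 2 ** n_bits - 1
    w_min = weight.min(dim=1, keepdim=True).values
    w_max = weight.max(dim=1, keepdim=True).values
    scale = (w_max - w_min).clamp(min=1e-8) / qmax
    zero_point = torch.round(-w_min / scale)
    q = torch.clamp(torch.round(weight / scale) + zero_point, 0, qmax)
    return (q - zero_point) * scale
```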