"Taming Sensitive Weights : Noise Perturbation Fine-tuning for Robust LLM Quantization"

Podcast on this paper generated with Google's Illuminate.

NPFT (Noise Perturbation Fine-tuning) makes LLM weights less sensitive to quantization by teaching them to handle noise.

It reduces the sensitivity of outlier weights in LLMs through random perturbations during fine-tuning, enabling uniform quantization without performance loss.
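To make the target setting concrete, below is a minimal sketch of simple per-channel round-to-nearest (RTN) uniform quantization, the kind of quantizer NPFT aims to make safe for outlier weights. The 4-bit symmetric range and the function name `rtn_quantize` are illustrative assumptions, not the paper's code.

```python
import torch

def rtn_quantize(weight: torch.Tensor, bits: int = 4) -> torch.Tensor:
    """Fake-quantize a 2D weight matrix per output channel with uniform RTN."""
    qmax = 2 ** (bits - 1) - 1                                  # e.g. 7 for 4-bit symmetric
    scale = weight.abs().amax(dim=1, keepdim=True).clamp_min(1e-8) / qmax
    q = torch.clamp(torch.round(weight / scale), -qmax - 1, qmax)
    return q * scale                                            # dequantized weights

w = torch.randn(8, 16)
w_q = rtn_quantize(w, bits=4)
print((w - w_q).abs().max())  # per-element error; outlier channels inflate the scale and hence the error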

-----

https://arxiv.org/abs/2412.06858

🔍 Original Problem:

Existing LLM quantization methods preserve outlier weights in higher precision to maintain performance, but this mixed-precision approach reduces hardware efficiency and GPU utilization.

-----

🛠️ Solution in this Paper:

→ NPFT identifies outlier weights using the Fisher Information Matrix and applies random perturbations to them during training (see the first sketch after this list).

→ The method reduces the Hessian trace of the outlier weights through parameter-efficient fine-tuning with LoRA adapters.

→ Noise is sampled per-channel and added at the outlier locations, mimicking quantization effects (see the training-step sketch after this list).

→ A balanced loss function maintains base model performance while optimizing for perturbation robustness.
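Below is a minimal sketch of the outlier-identification step using the empirical Fisher diagonal (squared gradients accumulated over calibration data) as the sensitivity score. It assumes a HuggingFace-style causal LM and a small calibration loader; the names `get_outlier_mask`, `calib_loader`, and `topk_ratio` are illustrative, not the paper's code.

```python
import torch

@torch.enable_grad()
def get_outlier_mask(model, calib_loader, topk_ratio=0.005, device="cuda"):
    """Return {param_name: bool mask} flagging the most sensitive (outlier) weights."""
    model.to(device).train()
    fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters()
              if p.requires_grad and p.dim() == 2}  # score weight matrices only

    for batch in calib_loader:
        input_ids = batch["input_ids"].to(device)
        out = model(input_ids=input_ids, labels=input_ids)
        model.zero_grad()
        out.loss.backward()
        for n, p in model.named_parameters():
            if n in fisher and p.grad is not None:
                fisher[n] += p.grad.detach() ** 2  # empirical Fisher diagonal

    masks = {}
    for n, score in fisher.items():
        k = max(1, int(topk_ratio * score.numel()))
        thresh = score.flatten().topk(k).values.min()
        masks[n] = score >= thresh  # True = outlier / sensitive weight
    return masks
```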
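And here is a sketch of one NPFT-style training step: per-output-channel noise is added at the outlier positions of the frozen base weights, and the LoRA adapters are updated with a loss that balances clean and perturbed behavior. The noise scale `delta`, the loss weight `lam`, and the assumption that `masks` comes from the sketch above (with only LoRA parameters in the optimizer) are illustrative choices, not the paper's exact formulation.

```python
import torch

def npft_step(model, batch, masks, optimizer, delta=1e-3, lam=0.5, device="cuda"):
    input_ids = batch["input_ids"].to(device)
    optimizer.zero_grad()

    # 1) Clean pass: keep the LoRA-adapted model close to the base model.
    clean_loss = model(input_ids=input_ids, labels=input_ids).loss
    (lam * clean_loss).backward()

    # 2) Add per-(output-)channel noise at the outlier locations only,
    #    mimicking the error a uniform quantizer would introduce.
    noise_cache = {}
    with torch.no_grad():
        for n, p in model.named_parameters():
            if n in masks:
                channel_noise = delta * torch.randn(p.shape[0], 1, device=p.device)
                noise_cache[n] = channel_noise * masks[n].to(p.dtype)
                p.add_(noise_cache[n])

    # 3) Perturbed pass: train the adapters to be robust to that noise.
    noisy_loss = model(input_ids=input_ids, labels=input_ids).loss
    ((1.0 - lam) * noisy_loss).backward()

    # 4) Restore the frozen base weights and update only the LoRA parameters.
    with torch.no_grad():
        for n, p in model.named_parameters():
            if n in noise_cache:
                p.sub_(noise_cache[n])
    optimizer.step()

    return lam * clean_loss.item() + (1.0 - lam) * noisy_loss.item()
```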

-----

💡 Key Insights:

→ Outlier sensitivity can be reduced without keeping outlier weights in higher precision

→ Per-channel noise sampling is more effective than per-weight sampling

→ Random perturbations efficiently estimate the Hessian trace without expensive second-order computations (see the sketch after this list)

→ A single fine-tuning pass works for multiple bit-width quantization targets
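On why random perturbations relate to the Hessian trace: Hutchinson's estimator gives tr(H) ≈ E[vᵀHv] for zero-mean, unit-variance random v, so the expected loss increase under small random noise is approximately (ε²/2)·tr(H). The toy quadratic below only illustrates the estimator; it is not the paper's procedure.

```python
import torch

def hutchinson_trace(loss_fn, w, num_samples=64):
    """Estimate tr(H) of loss_fn at w with Rademacher probes and Hessian-vector products."""
    estimates = []
    for _ in range(num_samples):
        v = (torch.rand_like(w) < 0.5).to(w.dtype) * 2 - 1   # +/-1 probe vector
        loss = loss_fn(w)
        (grad,) = torch.autograd.grad(loss, w, create_graph=True)
        (hv,) = torch.autograd.grad(torch.dot(grad, v), w)   # Hessian-vector product H v
        estimates.append(torch.dot(v, hv))                   # v^T H v
    return torch.stack(estimates).mean()

# Toy check on a quadratic with known Hessian diag(1..5), so tr(H) = 15.
H = torch.diag(torch.arange(1.0, 6.0))
w = torch.randn(5, requires_grad=True)
loss_fn = lambda x: 0.5 * x @ H @ x
print(hutchinson_trace(loss_fn, w))  # ≈ 15
```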

-----

📊 Results:

→ 3.69 PPL improvement on OPT-1.3B-4bits with RTN quantizer

→ 10% latency reduction on an RTX 4090 GPU compared to mixed-precision quantization

→ Matches GPTQ performance on LLaMA2-7B-4bits using simple RTN

→ 4x faster training than EfficientQAT with lower resource requirements
