Smart noise filtering helps LLMs think better at any temperature
Statistical filtering of logits beats probability-based sampling in LLMs
This paper introduces top-n-sigma, a novel token sampling method that filters pre-softmax logits using a statistical threshold. The method separates logits into a noisy region and an informative region, and its sampling space stays stable across temperature settings, unlike traditional probability-based methods.
-----
https://arxiv.org/abs/2411.07641
🤔 Original Problem:
Traditional sampling methods like top-k and nucleus sampling struggle with reasoning tasks at higher temperatures, forcing the use of greedy decoding or low temperatures. This limits the model's ability to generate diverse yet accurate responses.
-----
🔧 Solution in this Paper:
→ The method analyzes pre-softmax logit distributions, which naturally separate into a Gaussian-distributed noisy region and a small informative region.
→ It applies a statistical threshold directly to the logits: a token is kept only if its logit lies within n standard deviations of the maximum logit, with no complex probability manipulations (see the sketch after this list).
→ Because the threshold tracks the logit statistics, the sampling space stays stable under temperature scaling, whereas existing methods admit more noise tokens at higher temperatures.
→ Implementation is computationally efficient: it operates directly on logits and needs no sorting and no extra softmax transformation.
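
Here's a minimal PyTorch sketch of that thresholding rule. The function name, the default n=1.0, and the stand-in vocabulary size are illustrative choices, not the paper's reference code:

```python
import torch

def top_n_sigma_filter(logits: torch.Tensor, n: float = 1.0) -> torch.Tensor:
    """Keep tokens whose logit is within n standard deviations of the max."""
    # Statistics come straight from the raw pre-softmax logits:
    # one max, one std, one comparison. No sorting, no softmax.
    threshold = logits.max() - n * logits.std()
    # Everything below the threshold is treated as Gaussian noise.
    return logits.masked_fill(logits < threshold, float("-inf"))

# Usage: filter the raw logits, then temperature-scale and sample as usual.
logits = torch.randn(32_000)                    # stand-in for next-token logits
filtered = top_n_sigma_filter(logits, n=1.0)
probs = torch.softmax(filtered / 1.5, dim=-1)   # T = 1.5; kept set is unchanged
token = torch.multinomial(probs, num_samples=1)
```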
-----
💡 Key Insights:
→ Logits naturally form two distinct regions: a Gaussian noise bulk and a handful of informative outliers
→ Larger sigma-distances correlate with smaller nucleus sizes, indicating stronger model confidence
→ Temperature-invariant sampling is possible by thresholding raw logits, since temperature scaling rescales the max and the standard deviation by the same factor (demonstrated below)
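
A quick sanity check of that last point (my own demonstration, not from the paper): dividing the logits by T divides both the max and the standard deviation by T, so the set of tokens passing the max − n·σ test never changes:

```python
import torch

torch.manual_seed(0)
logits = torch.randn(1_000) * 3.0
n = 1.0
base_keep = logits >= logits.max() - n * logits.std()

for T in (0.5, 1.0, 1.5):
    scaled = logits / T
    keep = scaled >= scaled.max() - n * scaled.std()
    # Identical mask at every temperature: the sampling space is stable.
    assert torch.equal(keep, base_keep)
```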
-----
📊 Results:
→ Outperforms existing sampling approaches across four reasoning-focused datasets
→ Maintains consistent performance even at high temperatures (T=1.5)
→ Achieves better results than greedy decoding while preserving sampling diversity