"Reward-Guided Speculative Decoding for Efficient LLM Reasoning"
The podcast below on this paper was generated with Google's Illuminate.
https://arxiv.org/abs/2501.19324
Scaling LLMs to larger sizes improves performance but dramatically raises inference cost. This paper tackles the challenge of making inference efficient for complex reasoning tasks.
It proposes Reward-Guided Speculative Decoding (RSD), which pairs a smaller draft model with a larger target model. A process reward model decides, step by step, whose output to keep, balancing efficiency against accuracy.
-----
📌 Reward-Guided Speculative Decoding uses a reward model to guide decoding, enabling efficient step-wise refinement instead of the strict token-level matching of standard speculative decoding.
📌 RSD's adaptive weighting function is key. It dynamically mixes draft and target models based on reward signals. This mixture optimizes compute use by favoring the draft model for high-quality steps.
📌 The binary threshold weighting strategy in RSD offers a practical advantage. It simplifies the decision process using a threshold parameter. This allows for easy tuning between efficiency and accuracy.
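To make that concrete, here is a minimal sketch of the binary step weighting, using my own naming rather than anything from the paper:

```python
# Binary step weighting: a hedged sketch, not the paper's code.
# omega(r) = 1 keeps the draft model's step; omega(r) = 0 defers
# the step to the larger target model.
def omega(reward: float, delta: float) -> float:
    return 1.0 if reward >= delta else 0.0
```

A lower delta keeps more draft steps (cheaper), while a higher delta routes more steps to the target model (more accurate).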
----------
Methods Explored in this Paper 🔧:
→ Reward-Guided Speculative Decoding (RSD) is introduced. RSD uses a lightweight draft model and a powerful target model.
→ A process reward model evaluates each decoding step, determining whether to accept the draft model's output or hand the step to the target model.
→ RSD employs a dynamic mixture of draft and target model distributions. The mixture weight is based on a reward function.
→ A binary step function is proposed as an effective weighting function. This function uses a threshold to decide between draft and target models.
→ Rejection sampling is used to refine draft outputs with the target model when the reward falls below the threshold (a simplified version of the loop is sketched below).
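Putting the pieces above together, here is a minimal sketch of the RSD loop under binary-threshold weighting. All interfaces here (the step-generation callables, the reward signature, the stopping marker) are my own simplifications, not the paper's code, and the general rejection-sampling refinement is reduced to a direct fallback to the target model:

```python
from typing import Callable, List

# A hedged sketch of reward-guided speculative decoding with a
# binary acceptance threshold. One "step" is a reasoning step
# (e.g., one line of a solution), scored by a process reward model (PRM).
def rsd_decode(
    prompt: str,
    draft_step: Callable[[str], str],     # cheap draft model: context -> next step
    target_step: Callable[[str], str],    # large target model: context -> next step
    reward: Callable[[str, str], float],  # PRM: (context, candidate step) -> score
    delta: float = 0.7,                   # acceptance threshold (tunable)
    max_steps: int = 64,
) -> str:
    steps: List[str] = []
    for _ in range(max_steps):
        context = prompt + "".join(steps)
        candidate = draft_step(context)          # 1. draft model proposes a step
        if reward(context, candidate) < delta:   # 2. PRM scores the proposal
            candidate = target_step(context)     # 3. below threshold: target takes over
        steps.append(candidate)
        if candidate.strip().endswith("[EOS]"):  # toy stopping criterion (assumption)
            break
    return "".join(steps)
```

Setting delta near 0 keeps almost every draft step (cheapest), while delta near 1 routes almost every step to the target model; the efficiency gains come from tuning delta between these extremes.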
-----
Key Insights 💡:
→ RSD adaptively balances computational cost and output quality. It prioritizes high-reward outputs.
→ RSD addresses a limitation of standard speculative decoding, which strictly enforces unbiasedness (exact agreement with the target model's distribution) and can therefore sacrifice efficiency.
→ Using a reward function allows RSD to accept high-quality draft tokens even if they don't perfectly match the target model's distribution. This improves efficiency.
→ Theoretically, a binary threshold weighting strategy is optimal for maximizing reward under computational constraints.
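In rough notation (mine, not necessarily the paper's exact setup), that optimality claim can be written as a constrained objective over weighting functions ω, where P_m is the draft model, P_M the target model, r the process reward, and c a per-step budget on target-model calls:

```latex
% Schematic only; notation is mine, not necessarily the paper's exact setup.
\begin{aligned}
&\max_{\omega:\,\mathbb{R}\to[0,1]}\;
  \mathbb{E}_{y \sim P_{\mathrm{RSD}}}[\,r(y)\,]
  \quad \text{s.t.} \quad
  \mathbb{E}_{y \sim P_m}\bigl[\,1-\omega(r(y))\,\bigr] \le c, \\[4pt]
&\text{where} \quad
  P_{\mathrm{RSD}}(y) = \omega(r(y))\,P_m(y)
  + \Bigl(1-\mathbb{E}_{y' \sim P_m}\bigl[\omega(r(y'))\bigr]\Bigr)\,P_M(y).
\end{aligned}
```

Under this kind of formulation, the maximizer is a step function, ω*(r) = 1[r ≥ δ], with the threshold δ determined by the budget c.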
-----
Results 📊:
→ RSD achieves up to 4.4× fewer FLOPs than using the target model alone on the MATH500 benchmark.
→ RSD improves reasoning accuracy by up to 3.5 points on average compared to standard speculative decoding across multiple benchmarks.
→ RSD outperforms search-based methods such as Beam Search and Best-of-N in accuracy on the MATH500, GSM8K, and Minerva Math datasets.