Instead of rewarding text word-by-word or only as whole responses, this paper teaches RLHF to work with natural text segments.
A novel approach that improves RLHF by assigning rewards to meaningful text segments rather than to individual tokens or entire sequences, making reward learning more effective and semantically coherent.
-----
https://arxiv.org/abs/2501.02790
🤔 Original Problem:
→ Current RLHF methods either use coarse rewards for entire sequences (sparse feedback) or very fine-grained token-level rewards (too granular for meaningful evaluation)
→ Neither approach captures semantic completeness effectively
-----
💡 Solution in this Paper:
→ The paper introduces a segment-level reward model that dynamically breaks text into meaningful semantic units using entropy-based thresholding (sketched below)
→ It identifies segment boundaries by analyzing the entropy of the language model's predictive distribution
→ Higher entropy signals the start of a new semantic segment; segments typically span 3-7 words
→ The reward model assigns rewards to these segments rather than to individual tokens
→ Uses location-aware reward normalization during policy optimization
→ Implements segment reward interpolation for denser training signals (both sketched after the Key Insights)
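Here is a minimal sketch of the entropy-based segmentation idea: compute the entropy of the language model's next-token distribution at each position and open a new segment whenever it exceeds a threshold (1.75 falls in the 1.75-2.0 range the paper reports as best). The model choice, function name, and decoding of segments back to text are illustrative assumptions, not the paper's actual code.

```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

def entropy_segment(text: str, model_name: str = "gpt2", threshold: float = 1.75):
    """Split text into segments, opening a new segment whenever the LM's
    predictive entropy at a token exceeds `threshold` (illustrative sketch)."""
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    model.eval()

    input_ids = tokenizer(text, return_tensors="pt")["input_ids"]   # (1, T)
    with torch.no_grad():
        logits = model(input_ids).logits                            # (1, T, V)

    # Entropy of the distribution that predicts each token t = 1..T-1.
    log_probs = F.log_softmax(logits[:, :-1, :], dim=-1)
    entropies = -(log_probs.exp() * log_probs).sum(-1).squeeze(0)   # (T-1,)

    tokens = input_ids.squeeze(0).tolist()
    segments, current = [], [tokens[0]]      # first token always starts a segment
    for tok, ent in zip(tokens[1:], entropies.tolist()):
        if ent > threshold:                  # high entropy -> new semantic unit
            segments.append(tokenizer.decode(current))
            current = []
        current.append(tok)
    segments.append(tokenizer.decode(current))
    return segments

# Each returned segment (typically a few words) is what the segment-level
# reward model would score, instead of every token or the whole reply.
print(entropy_segment("The quick brown fox jumps over the lazy dog."))
```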
-----
🔑 Key Insights:
→ Semantic completeness matters for reward assignment
→ Dynamic segmentation better captures meaningful text units
→ Balance needed between sparse and overly dense rewards
→ Location-aware normalization helps handle varying segment positions
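To make the last two Solution bullets concrete, here is a rough, hypothetical sketch of location-aware normalization plus segment reward interpolation: each segment's reward is standardized with statistics tied to its position in the response, then spread across that segment's tokens so the policy update sees a denser per-token signal. The position-bucketed statistics, the even split across tokens, and all names are assumptions for illustration; the paper's exact formulation may differ.

```python
import numpy as np

def densify_segment_rewards(segment_rewards, segment_lengths, loc_mean, loc_std):
    """Hypothetical sketch: normalize each segment reward with statistics tied
    to the segment's position (location-aware normalization), then spread it
    over the segment's tokens for a denser training signal (interpolation)."""
    token_rewards = []
    for pos, (r, length) in enumerate(zip(segment_rewards, segment_lengths)):
        # Location-aware normalization: each segment position has its own stats.
        r_norm = (r - loc_mean[pos]) / (loc_std[pos] + 1e-8)
        # Interpolation (one simple choice): split the reward evenly over tokens.
        token_rewards.extend([r_norm / length] * length)
    return np.array(token_rewards)

# Three segments scored by the segment-level reward model, with per-position
# statistics assumed to be tracked over the course of training.
dense = densify_segment_rewards(
    segment_rewards=[0.8, -0.2, 0.5],
    segment_lengths=[4, 6, 3],
    loc_mean=[0.1, 0.0, 0.2],
    loc_std=[0.5, 0.4, 0.6],
)
print(dense)  # one reward value per response token
```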
-----
📊 Results:
→ Outperforms both bandit (single sequence-level reward) and token-level approaches on AlpacaEval 2.0, Arena-Hard, and MT-Bench
→ Entropy threshold of 1.75-2.0 works best for segmentation
→ Shows consistent improvements across different model sizes