GenARM: Reward Guided Generation with Autoregressive Reward Model for Test-time Alignment
GenARM guides LLMs using token-level rewards without retraining the base model.
ARM predicts next-token rewards in a single forward pass, making test-time LLM alignment fast.
Original Problem 🔍:
Test-time alignment methods for LLMs rely on reward models trained to score full responses, so predicting next-token rewards during autoregressive generation is either inaccurate or incurs high inference costs.
Solution in this Paper 🧠:
• Introduces GenARM: a test-time alignment approach built on an Autoregressive Reward Model (ARM)
• ARM parameterizes rewards as next-token log probabilities, enabling efficient token-level factorization
• GenARM adds ARM's next-token rewards to the frozen LLM's logits for guided generation (see the sketch after this list)
• Supports weak-to-strong guidance and multi-objective alignment without retraining
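A minimal PyTorch sketch of the guided decoding step, assuming HuggingFace-style causal LMs where the base LLM and the ARM share a tokenizer and vocabulary; `alpha`, the function names, and the sampling loop are illustrative, not the authors' code (the paper's KL-regularized formulation writes the guidance coefficient as 1/β):

```python
import torch
import torch.nn.functional as F

def genarm_next_token_logits(base_logits, arm_logits, alpha=1.0):
    # ARM rewards are parameterized as log-probabilities, so the guided
    # next-token distribution is softmax(base_logits + alpha * arm_log_probs).
    return base_logits + alpha * F.log_softmax(arm_logits, dim=-1)

@torch.no_grad()
def guided_generate(base_model, arm_model, input_ids, max_new_tokens=64, alpha=1.0):
    # Autoregressive decoding: each step costs one extra forward pass
    # through the ARM; the base model itself is never updated.
    for _ in range(max_new_tokens):
        base_logits = base_model(input_ids).logits[:, -1, :]
        arm_logits = arm_model(input_ids).logits[:, -1, :]
        logits = genarm_next_token_logits(base_logits, arm_logits, alpha)
        probs = F.softmax(logits, dim=-1)
        next_token = torch.multinomial(probs, num_samples=1)
        input_ids = torch.cat([input_ids, next_token], dim=-1)
    return input_ids
```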
Key Insights from this Paper 💡:
• ARM preserves the full expressiveness of the reward function class within the KL-regularized RL framework
• GenARM enables efficient weak-to-strong guidance, aligning larger LLMs with smaller RMs
• Supports real-time trade-offs between preference dimensions without retraining (sketched after this list)
• Matches the performance of training-time methods like DPO without fine-tuning the base LLM
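To make the real-time trade-off concrete, here is a hedged extension of the sketch above: one ARM per preference dimension, each weighted by a user-chosen coefficient at decode time. The dimension names and weights are illustrative assumptions, not values from the paper:

```python
import torch.nn.functional as F

def multi_objective_logits(base_logits, arm_logits_per_dim, weights):
    # Each ARM contributes its next-token log-probabilities scaled by a
    # user-chosen weight; changing `weights` shifts the preference
    # trade-off on the fly, with no retraining of any model.
    logits = base_logits
    for arm_logits, w in zip(arm_logits_per_dim, weights):
        logits = logits + w * F.log_softmax(arm_logits, dim=-1)
    return logits

# e.g., weight helpfulness over harmlessness for a single generation:
# logits = multi_objective_logits(base_logits, [help_logits, harm_logits], [0.7, 0.3])
```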
Results 📊:
• Outperforms test-time baselines ARGS and Transfer-Q in human preference alignment
• Matches or exceeds performance of training-time method DPO
• 7B ARM successfully guides 70B LLM, recovering >80% of the performance gap vs fine-tuned 70B
• Enables more effective multi-objective alignment compared to baselines like Rewarded Soups
• Maintains efficiency advantages over methods like Best-of-N sampling
📌 How does GenARM enable weak-to-strong guidance and multi-objective alignment?
Weak-to-strong guidance:
• Uses a smaller Autoregressive RM (e.g., 7B) to guide a larger frozen LLM (e.g., 70B); a usage sketch follows below
• Avoids the need to train or fine-tune the larger model
• Recovers a significant portion of the performance gap between the base and fine-tuned large models
Multi-objective alignment:
• Trains one ARM per preference dimension and sums their token-level rewards with user-chosen weights at decode time
• Lets users adjust the trade-off between objectives on the fly, without retraining
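A usage sketch for the weak-to-strong setup, reusing `guided_generate` from the first sketch; the checkpoint names are placeholders rather than released models, and both checkpoints are assumed to share a tokenizer:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder identifiers: a large frozen base LLM plus a small ARM
# trained on preference data (assumed to share the base tokenizer).
tok = AutoTokenizer.from_pretrained("org/base-llm-70b")
base = AutoModelForCausalLM.from_pretrained("org/base-llm-70b")  # stays frozen
arm = AutoModelForCausalLM.from_pretrained("org/arm-7b")         # small guide model

prompt_ids = tok("Explain test-time alignment briefly.", return_tensors="pt").input_ids
out = guided_generate(base, arm, prompt_ids, max_new_tokens=64, alpha=1.0)
print(tok.decode(out[0], skip_special_tokens=True))
```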



