LLMs are often used with inference-time procedures like Best-of-N, but standard alignment doesn't account for this. This paper introduces Inference-Aware Alignment (InfAlign) to address this gap.
https://arxiv.org/abs/2412.19792
Original Problem 🤔:
→ Standard LLM alignment maximizes reward for single samples, ignoring inference-time procedures like Best-of-N.
→ This mismatch leads to suboptimal performance when such procedures are actually used at inference time (a minimal Best-of-N sketch follows this list).
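A minimal sketch of the Best-of-N procedure that standard alignment ignores, assuming placeholder interfaces `policy.generate` and `reward_model.score` (illustrative names, not APIs from the paper or any specific library):

```python
def best_of_n(prompt, policy, reward_model, n=4):
    """Sample n candidate responses and return the one the reward model scores highest."""
    candidates = [policy.generate(prompt) for _ in range(n)]
    scores = [reward_model.score(prompt, c) for c in candidates]
    return max(zip(scores, candidates), key=lambda sc: sc[0])[1]
```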
Solution in this Paper 💡:
→ The InfAlign framework optimizes the inference-time win rate, i.e., how often the model's output beats a reference model's after the chosen inference procedure is applied.
→ InfAlign uses a transformed reward in KL-regularized RL, capturing the inference process.
→ For Best-of-N and Worst-of-N, InfAlign provides near-optimal reward transformations, including exponential tilting.
→ A practical solver, Calibrate-and-Transform RL (CTRL), first calibrates the reward model and then applies the transformation before standard KL-regularized RL (sketched after this list).
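A minimal sketch of the CTRL idea under a few assumptions: calibration is approximated here by an empirical CDF over rewards of samples drawn from the reference policy for the same prompt, and the Best-of-N transform is shown as a generic exponential tilt with an illustrative parameter `lam` (not the paper's exact recipe). The transformed reward then replaces the raw reward in the usual KL-regularized objective, max_π E[R(x, y)] − β KL(π ‖ π_ref):

```python
import numpy as np

def calibrate_reward(raw_reward, reference_rewards):
    """Calibration step: map a raw reward to its quantile under the
    reference policy, approximated by the empirical CDF of rewards of
    reference-policy samples for the same prompt."""
    reference_rewards = np.asarray(reference_rewards)
    return float(np.mean(reference_rewards <= raw_reward))

def tilt_for_best_of_n(calibrated_reward, lam=4.0):
    """Transform step (illustrative): exponentially tilt the calibrated
    reward in [0, 1] so high quantiles dominate, mimicking the selection
    pressure of Best-of-N; a larger lam roughly corresponds to a larger N."""
    return float(np.exp(lam * calibrated_reward))

# Toy usage: calibrate one response's score, then tilt it before KL-RL.
reference_rewards = [0.1, 0.4, 0.6, 0.9]      # rewards of reference samples (toy values)
q = calibrate_reward(0.7, reference_rewards)   # quantile = 0.75
transformed = tilt_for_best_of_n(q, lam=4.0)   # value fed to the KL-regularized RL trainer
```

The design point: calibration makes the reward scale-free (a quantile in [0, 1]), so a single transformation can capture the inference procedure regardless of the raw reward model's range.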
Key Insights from this Paper 🔑:
→ Alignment should consider the full inference pipeline.
→ Reward transformations can effectively capture inference procedures.
→ Calibrating rewards improves robustness and performance.
Results 💯:
→ CTRL improves inference-time win rates by 8-12% for Best-of-N helpfulness and 4-9% for Worst-of-N harmlessness on Anthropic's helpfulness and harmlessness dialogue benchmarks.
→ Calibration alone improves standard win rates compared to baselines.
→ Higher N in Best-of-N and Worst-of-N leads to further gains.