"General Preference Modeling with Preference Representations for Aligning Language Models"

The podcast on this paper is generated with Google's Illuminate.

Vector embeddings capture complex preferences better than simple reward scores.

Preference representation learning maps LLM responses to vectors for better alignment with human values.

📚 https://arxiv.org/abs/2410.02197

🤖 Original Problem:

Traditional preference modeling methods for LLM alignment either lack expressiveness (Bradley-Terry models, which cannot represent intransitive preferences) or carry high computational cost (supervised pairwise preference models, which need quadratic complexity to compare multiple responses).
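
To make the cost gap concrete, here is a back-of-the-envelope sketch (the value K = 8 is illustrative, not from the paper): a pairwise preference model must run a forward pass for every pair of candidate responses, while an embedding-based model encodes each response once and then compares vectors cheaply.

```python
# Model forward passes needed to rank K candidate responses.
K = 8
pairwise_calls = K * (K - 1) // 2   # O(K^2): 28 passes, one per response pair
embedding_calls = K                 # O(K): 8 passes, then cheap vector comparisons
print(pairwise_calls, embedding_calls)  # 28 8
```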

-----

🔧 Solution in this Paper:

→ Introduces preference representation learning - embeds responses into latent space with linear query complexity

→ Uses eigenvalue scale gate to compute context-dependent scaling factors from prompts

→ Implements eigenvector embedding head to generate response embeddings

→ Applies skew-symmetric preference operator for computing preference scores between embeddings (sketched after this list)

→ Proposes General Preference Optimization (GPO) method that generalizes reward-based RLHF
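
Below is a minimal sketch of how a preference score between two response embeddings can be computed with a block-diagonal skew-symmetric operator. The names (skew_symmetric_operator, preference_score) and the toy scale values are illustrative assumptions, not the paper's implementation; the eigenvalue scale gate and eigenvector embedding head are replaced here by fixed toy inputs.

```python
import numpy as np

def skew_symmetric_operator(dim, scales):
    """Block-diagonal skew-symmetric matrix built from 2x2 blocks.

    `scales` stands in for the context-dependent factors that the paper's
    eigenvalue scale gate would produce from the prompt (assumed values here).
    """
    R = np.zeros((dim, dim))
    for i, s in enumerate(scales):
        R[2 * i, 2 * i + 1] = -s
        R[2 * i + 1, 2 * i] = s
    return R

def preference_score(v_a, v_b, R):
    """s(a > b) = v_a^T R v_b. Skew-symmetry gives s(b > a) = -s(a > b),
    so sigmoid(s) behaves as a consistent preference probability."""
    return float(v_a @ R @ v_b)

# Toy usage: 4-d embeddings for two responses (in the actual model these
# would come from the eigenvector embedding head).
rng = np.random.default_rng(0)
v_a, v_b = rng.normal(size=4), rng.normal(size=4)
R = skew_symmetric_operator(4, scales=[1.0, 0.5])
print(preference_score(v_a, v_b, R))   # s(a > b)
print(preference_score(v_b, v_a, R))   # exactly -s(a > b)
```

Because scoring any pair reduces to a small bilinear form over precomputed embeddings, ranking K responses needs only K embedding passes, which is where the linear query complexity comes from.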

-----

💡 Key Insights:

→ Complex preferences can be modeled through vector representations instead of scalar rewards

→ Linear complexity (O(K)) achieved while maintaining expressiveness

→ Can effectively capture intransitive/cyclic preferences where traditional models fail (see the toy example after this list)

→ Architecture combines efficient computation with rich preference modeling
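
A toy example, assuming 2-d embeddings placed at illustrative angles (not values from the paper): three responses A, B, C arranged rock-paper-scissors style yield the cycle A > B > C > A under a skew-symmetric score, which no single scalar reward can reproduce because scalar rewards force transitivity.

```python
import numpy as np

# Three responses embedded at angles 0, -120, -240 degrees (illustrative values).
angles = np.deg2rad([0.0, -120.0, -240.0])
V = np.stack([np.cos(angles), np.sin(angles)], axis=1)  # rows: A, B, C
R = np.array([[0.0, -1.0],
              [1.0,  0.0]])                             # 2x2 skew-symmetric operator

def score(i, j):
    # Positive score means response i is preferred over response j.
    return V[i] @ R @ V[j]

names = "ABC"
for i, j in [(0, 1), (1, 2), (2, 0)]:
    print(f"s({names[i]} > {names[j]}) = {score(i, j):+.3f}")
# All three scores are positive: A > B, B > C, and C > A, a cycle that no
# scalar reward r(A), r(B), r(C) can represent.
```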

-----

📊 Results:

→ 100% accuracy on cyclic preference datasets vs random-guess performance of BT models

→ Up to 5.6% improvement over BT models on RewardBench benchmark

→ Up to 9.3% gains on AlpacaEval 2.0 and MT-Bench for LLM alignment

→ Maintains linear O(K) complexity vs O(K²) for traditional methods
