Vector embeddings capture complex preferences better than simple reward scores.
Preference representation learning maps LLM responses to latent vectors for better alignment with human values.
📚 https://arxiv.org/abs/2410.02197
🤖 Original Problem:
Traditional preference modeling methods for LLM alignment either lack expressiveness or are computationally expensive: Bradley-Terry (BT) reward models cannot represent intransitive (cyclic) preferences, while supervised pairwise preference models require quadratic complexity to compare multiple responses.
-----
🔧 Solution in this Paper:
→ Introduces preference representation learning: embeds each response into a latent space, so comparing multiple responses needs only linear query complexity
→ Uses an eigenvalue scale gate to compute context-dependent scaling factors from the prompt
→ Implements an eigenvector embedding head to generate response embeddings
→ Applies a skew-symmetric preference operator to compute preference scores between pairs of embeddings (minimal sketch below)
→ Proposes General Preference Optimization (GPO), which generalizes reward-based RLHF to general preference scores
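A minimal PyTorch-style sketch of how such a scoring head could look, assuming hidden states for the prompt and each response are already available; the module names, layer choices (Softplus gate, 2k-dimensional embedding head), and sizes are illustrative assumptions, not the paper's exact implementation:

```python
import torch
import torch.nn as nn

class PreferenceRepresentationHead(nn.Module):
    """Sketch: embed responses into R^{2k} and score pairs with a
    block-diagonal skew-symmetric operator, so the score is
    antisymmetric by construction: score(y1, y2) = -score(y2, y1)."""

    def __init__(self, hidden_dim: int, k: int = 4):
        super().__init__()
        self.k = k
        # Eigenvector embedding head: response hidden state -> R^{2k}
        self.embed_head = nn.Linear(hidden_dim, 2 * k)
        # Eigenvalue scale gate: prompt hidden state -> k positive scales
        self.scale_gate = nn.Sequential(nn.Linear(hidden_dim, k), nn.Softplus())

    def forward(self, h_prompt, h_resp1, h_resp2):
        scale = self.scale_gate(h_prompt)                  # (batch, k)
        v1 = self.embed_head(h_resp1).view(-1, self.k, 2)  # (batch, k, 2)
        v2 = self.embed_head(h_resp2).view(-1, self.k, 2)
        # Skew-symmetric form per 2D block: v1_x*v2_y - v1_y*v2_x,
        # i.e. v1^T J v2 with J = [[0, 1], [-1, 0]].
        cross = v1[..., 0] * v2[..., 1] - v1[..., 1] * v2[..., 0]
        score = (scale * cross).sum(dim=-1)                # (batch,)
        return torch.sigmoid(score)  # P(response 1 preferred | prompt)
```

Because each response is embedded once and pairwise scores are cheap inner products, comparing K responses costs K embedding passes rather than K² pairwise model calls.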
-----
💡 Key Insights:
→ Complex preferences can be modeled through vector representations instead of scalar rewards
→ Linear query complexity (O(K) for K responses) is achieved while preserving expressiveness
→ Effectively captures intransitive/cyclic preferences where scalar reward models fail (toy illustration below)
→ The architecture combines efficient computation with rich preference modeling
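A toy NumPy illustration (not from the paper) of why an antisymmetric vector score can encode a preference cycle that no single scalar reward can:

```python
import numpy as np

# Three responses embedded 120 degrees apart on the unit circle.
angles = np.array([0.0, 2 * np.pi / 3, 4 * np.pi / 3])
V = np.stack([np.cos(angles), np.sin(angles)], axis=1)  # shape (3, 2)

def score(a, b):
    # 2D skew-symmetric form: positive means "a preferred over b",
    # and score(a, b) == -score(b, a) by construction.
    return a[0] * b[1] - a[1] * b[0]

# A > B, B > C, C > A: a cycle that scalar rewards cannot represent,
# since r(A) > r(B) > r(C) > r(A) is impossible.
print(score(V[0], V[1]) > 0)  # True: A preferred over B
print(score(V[1], V[2]) > 0)  # True: B preferred over C
print(score(V[2], V[0]) > 0)  # True: C preferred over A
```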
-----
📊 Results:
→ 100% accuracy on cyclic preference datasets, where BT models perform at random-guess level
→ Up to 5.6% improvement over BT models on the RewardBench benchmark
→ Up to 9.3% gains on AlpacaEval 2.0 and MT-Bench for LLM alignment
→ Maintains linear O(K) complexity vs O(K²) for supervised pairwise preference models