Weighted Preference Optimization (WPO), proposed in this paper, improves preference-based LLM training by reweighting previously collected (off-policy) preference data so that it behaves more like data sampled from the current model.
WPO: Enhancing RLHF with Weighted Preference…