Weighted Preference Optimization (WPO), proposed in this paper, improves LLM preference training by reweighting older, off-policy preference data so that it behaves more like fresh, on-policy data.
WPO: Enhancing RLHF with Weighted Preference…