The DPO-trained generative judge (DJPO) achieves the best performance on 10 of 13 benchmarks, outperforming strong baselines such as GPT-4o and specialized judge models.
DIRECT JUDGEMENT PREFERENCE OPTIMIZATION