Great paper from @Meta
Synthetic data and iterative self-improvement are all you need.
No humans needed in the evaluation loop.
This paper introduces a self-improving evaluator that learns to assess LLM outputs without human feedback, using synthetic data and iterative self-training to match top human-supervised models.
-----
📚 https://arxiv.org/pdf/2408.02666
Original Problem 🤔:
Building strong LLM evaluators typically requires extensive human preference data, which is costly and becomes outdated as models improve. Current approaches rely heavily on human annotations, limiting scalability and adaptability.
-----
Solution in this Paper 🔧:
→ The method starts with unlabeled instructions and uses a seed LLM to generate contrasting response pairs, where one is intentionally inferior.
→ It then uses an LLM-as-Judge approach to generate reasoning traces and final judgments for these synthetic pairs.
→ The system filters correct judgments and uses them to train an improved evaluator model.
→ This process repeats iteratively, with each improved evaluator producing better judgments on the synthetic pairs to train the next round (see the sketch after this list).
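Here's a minimal Python sketch of that loop, under stated assumptions: the callables generate_bad_response, judge_pair, and finetune are hypothetical placeholders for the prompting and fine-tuning infrastructure (not from the paper), and the sampling and filtering details are simplified.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Judgment:
    reasoning: str   # chain-of-thought reasoning trace from the judge
    preferred: str   # "A" (the known-better response) or "B"

def self_taught_evaluator(
    prompts: List[str],
    good_responses: List[str],
    seed_model,
    generate_bad_response: Callable,  # hypothetical: prompts the model for a worse answer
    judge_pair: Callable,             # hypothetical: LLM-as-Judge call returning a Judgment
    finetune: Callable,               # hypothetical: fine-tunes on the filtered judgments
    num_iterations: int = 3,
    samples_per_pair: int = 8,
):
    """Iteratively trains an evaluator from synthetic preference pairs, with no human labels."""
    model = seed_model
    for _ in range(num_iterations):
        training_examples = []
        for prompt, good in zip(prompts, good_responses):
            # 1. Build a contrasting pair: ask the current model for a
            #    deliberately inferior response to the same prompt.
            bad = generate_bad_response(model, prompt, good)

            # 2. LLM-as-Judge: sample several reasoning traces + verdicts for the pair.
            judgments = [judge_pair(model, prompt, good, bad)
                         for _ in range(samples_per_pair)]

            # 3. Keep only judgments that pick the known-better response ("A");
            #    these become the synthetic supervised training data.
            training_examples += [(prompt, good, bad, j)
                                  for j in judgments if j.preferred == "A"]

        # 4. Fine-tune on the filtered judgments; the improved model becomes
        #    the judge (and data generator) for the next iteration.
        model = finetune(model, training_examples)
    return model
```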
-----
Key Insights from this Paper 💡:
→ Human preference data isn't necessary for training strong LLM evaluators
→ Synthetic data generation with iterative self-improvement can match human-supervised approaches
→ Different data sources (safety, math, coding) improve performance in their respective domains
-----
Results 📊:
→ Improved RewardBench accuracy from 75.4 to 88.3 (88.7 with majority voting)
→ Outperformed GPT-4 (84.3) and matched top reward models trained with human data
→ Achieved 79.5% agreement with human judgments on MT-Bench using majority voting