LLMs can learn to critique themselves by studying correct solutions and validating their own feedback.
SCRIT (Self-evolving CRITic) enables LLMs to improve their own critique abilities without relying on human feedback or stronger models, using mathematical reasoning as a testbed.
https://arxiv.org/abs/2501.05727
🤖 Original Problem:
→ Current LLMs need human feedback or stronger models to improve their critique abilities, creating a bottleneck in scalable oversight.
→ The bottleneck becomes particularly acute once LLMs outperform humans on complex tasks, leaving no stronger supervisor to learn from.
-----
🔍 Solution in this Paper:
→ SCRIT introduces a contrastive critique technique: the model first studies a reference solution to ground itself in the core concepts, then critiques the candidate solution against it.
→ A self-validation mechanism keeps a critique only if the correction it produces actually reaches the correct answer.
→ Validated critiques then become self-training data, closing the feedback loop without human annotation (a minimal sketch of this pipeline follows this list).
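For concreteness, here is a minimal sketch of that loop in Python. The `generate` and `answers_match` callables are hypothetical placeholders for an LLM call and a final-answer checker; the prompt wording and data handling are illustrative assumptions, not the paper's exact implementation.

```python
# Sketch of the SCRIT-style loop: contrastive critique -> self-validation -> self-training data.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class CritiqueSample:
    problem: str
    solution: str       # candidate solution to critique (may be wrong)
    reference: str      # known-correct reference solution
    critique: str = ""  # filled in by the pipeline

def contrastive_critique(generate: Callable[[str], str], s: CritiqueSample) -> str:
    # Step 1: condition on the reference solution before judging the candidate.
    # Seeing a correct solution first is what discourages "rubber-stamping".
    prompt = (
        f"Problem:\n{s.problem}\n\n"
        f"Reference solution (study this first):\n{s.reference}\n\n"
        f"Candidate solution to critique:\n{s.solution}\n\n"
        "Compare the candidate against the reference step by step, point out any "
        "errors, and then write a corrected solution with a final answer."
    )
    return generate(prompt)

def self_validate(critique: str, reference: str,
                  answers_match: Callable[[str, str], bool]) -> bool:
    # Step 2: keep a critique only if the correction it produces
    # arrives at the reference answer.
    return answers_match(critique, reference)

def build_self_training_set(generate: Callable[[str], str],
                            answers_match: Callable[[str, str], bool],
                            samples: List[CritiqueSample]) -> List[CritiqueSample]:
    # Step 3: validated critiques become fine-tuning data for the next round,
    # closing the loop without human feedback or a stronger teacher model.
    kept = []
    for s in samples:
        s.critique = contrastive_critique(generate, s)
        if self_validate(s.critique, s.reference, answers_match):
            kept.append(s)
    return kept
```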
-----
💡 Key Insights:
→ Mathematical reasoning provides an ideal testbed due to well-defined reference solutions
→ Contrastive critiquing (studying a reference solution before judging) helps avoid "rubber-stamping" behavior in critique generation
→ Models perform better with simpler problems first, suggesting potential for curriculum learning
-----
📊 Results:
→ Improved critique-correction accuracy by 10.3% on benchmarks using Qwen2.5-72B-Instruct
→ Achieved 50.0% accuracy on deliberately incorrect solutions, up from 39.7%
→ Performance scales positively with data size (53.0% to 58.3%) and model size (41.7% to 58.3%)