LLMs can learn to critique themselves by studying correct solutions and validating their own feedback.
SCRIT (Self-evolving CRITic) enables LLMs to improve their own critique abilities without relying on human feedback or stronger models, using mathematical reasoning as a testbed.
https://arxiv.org/abs/2501.05727
🤖 Original Problem:
→ Current LLMs need human feedback or stronger models to improve their critique abilities, creating a bottleneck in scalable oversight.
→ The bottleneck becomes particularly acute once LLMs outperform humans on complex tasks, leaving no stronger supervisor to learn from.
-----
🔍 Solution in this Paper:
→ SCRIT introduces a contrastive critique technique: the model first studies a reference solution to ground itself in the core concepts, then critiques the candidate solution against it.
→ A self-validation mechanism keeps a critique only if the correction it produces actually reaches the correct answer.
→ Validated critiques then become self-training data, closing the feedback loop without human annotation (a minimal sketch of this pipeline follows this list).
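For concreteness, here is a minimal sketch of that loop in Python. The `generate` and `answers_match` callables are hypothetical placeholders for an LLM call and a final-answer checker; the prompt wording and data handling are illustrative assumptions, not the paper's exact implementation.

```python
# Sketch of the SCRIT-style loop: contrastive critique -> self-validation -> self-training data.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class CritiqueSample:
    problem: str
    solution: str       # candidate solution to critique (may be wrong)
    reference: str      # known-correct reference solution
    critique: str = ""  # filled in by the pipeline

def contrastive_critique(generate: Callable[[str], str], s: CritiqueSample) -> str:
    # Step 1: condition on the reference solution before judging the candidate.
    # Seeing a correct solution first is what discourages "rubber-stamping".
    prompt = (
        f"Problem:\n{s.problem}\n\n"
        f"Reference solution (study this first):\n{s.reference}\n\n"
        f"Candidate solution to critique:\n{s.solution}\n\n"
        "Compare the candidate against the reference step by step, point out any "
        "errors, and then write a corrected solution with a final answer."
    )
    return generate(prompt)

def self_validate(critique: str, reference: str,
                  answers_match: Callable[[str, str], bool]) -> bool:
    # Step 2: keep a critique only if the correction it produces
    # arrives at the reference answer.
    return answers_match(critique, reference)

def build_self_training_set(generate: Callable[[str], str],
                            answers_match: Callable[[str, str], bool],
                            samples: List[CritiqueSample]) -> List[CritiqueSample]:
    # Step 3: validated critiques become fine-tuning data for the next round,
    # closing the loop without human feedback or a stronger teacher model.
    kept = []
    for s in samples:
        s.critique = contrastive_critique(generate, s)
        if self_validate(s.critique, s.reference, answers_match):
            kept.append(s)
    return kept
```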
-----
💡 Key Insights:
→ Mathematical reasoning provides an ideal testbed due to well-defined reference solutions
→ Contrastive critiquing (studying a reference solution before judging) helps avoid "rubber-stamping" behavior in critique generation
→ Models perform better with simpler problems first, suggesting potential for curriculum learning
-----
📊 Results:
→ Improved critique-correction accuracy by 10.3% on benchmarks using Qwen2.5-72B-Instruct
→ Achieved 50.0% accuracy on deliberately incorrect solutions, up from 39.7%
→ Performance scales positively with data size (53.0% to 58.3%) and model size (41.7% to 58.3%)