Back-translating constraints from responses: A smarter way to train instruction-following models
Instead of generating new responses, this paper discovers constraints in existing high-quality responses.
📚 https://arxiv.org/abs/2410.24175
Original Problem 🤔:
LLMs struggle to follow complex instructions that impose multiple constraints, such as format and length requirements. Even advanced LLMs produce noisy outputs when asked to satisfy such constraints, which limits the quality of training data generated by conventional synthesis methods.
-----
Solution in this Paper 🛠️:
→ Introduces "constraint back-translation" - identifies constraints already satisfied by existing high-quality responses instead of generating new ones
→ Created CRAB dataset with 13,500 instruction-response pairs from existing datasets
→ Used Llama3-70B-Instruct to back-translate constraints from responses
→ Applied both forward training (standard supervised fine-tuning on constrained instructions) and reverse training (the model learns to generate the constraints given an instruction-response pair)
→ Employed Direct Preference Optimization (DPO) for further improvements
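The pipeline above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: `back_translate` is a stub standing in for the call to Llama3-70B-Instruct, and the prompt templates are hypothetical.

```python
def back_translate(instruction: str, response: str) -> list[str]:
    """Stub for the LLM back-translation step: list constraints the
    existing response already satisfies (here, crude length/format checks)."""
    constraints = []
    word_count = len(response.split())
    constraints.append(f"Answer in under {word_count + 10} words.")
    if response.lstrip().startswith("-"):
        constraints.append("Format the answer as a bulleted list.")
    return constraints

def make_training_pairs(instruction: str, response: str):
    constraints = back_translate(instruction, response)
    constraint_text = " ".join(constraints)
    # Forward training: (instruction + back-translated constraints) -> response
    forward = {"prompt": f"{instruction} {constraint_text}",
               "completion": response}
    # Reverse training: (instruction + response) -> constraints
    reverse = {"prompt": (f"Instruction: {instruction}\nResponse: {response}\n"
                          "List the constraints the response satisfies."),
               "completion": constraint_text}
    return forward, reverse

fwd, rev = make_training_pairs(
    "List two benefits of exercise.",
    "- Improves cardiovascular health\n- Boosts mood",
)
```

Because the response is kept as-is and only the instruction is augmented, no new (potentially noisy) response generation is needed.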
-----
Key Insights from this Paper 💡:
→ Existing datasets inherently contain implicit complex constraints that can be leveraged
→ Back-translating constraints from existing responses is more efficient than generating new responses
→ Reverse training helps models better understand constraints
→ The approach reduces costs and data noise significantly
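For the DPO stage mentioned above, the objective is the standard DPO loss, which rewards the policy for preferring the constraint-satisfying response over the rejected one. A minimal scalar sketch (per-sequence log-probabilities are assumed to be computed elsewhere):

```python
import math

def dpo_loss(logp_chosen: float, logp_rejected: float,
             ref_chosen: float, ref_rejected: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss on one preference pair.

    logp_* are the policy's sequence log-probs, ref_* the frozen
    reference model's; beta scales the implicit reward."""
    # Implicit reward margin: how much more the policy (vs. the reference)
    # prefers the chosen response over the rejected one.
    margin = (logp_chosen - ref_chosen) - (logp_rejected - ref_rejected)
    # -log(sigmoid(beta * margin))
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))
```

A larger margin (policy favoring the chosen response more strongly than the reference does) drives the loss toward zero; a zero margin gives the neutral value log 2.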
-----
Results 📊:
→ Outperformed base models and other open-source models on complex instruction following benchmarks
→ Achieved better performance than previous state-of-the-art models like Conifer on IFEval
→ Showed superior general instruction-following capabilities on AlpacaEval
→ Mistral_Crab+DPO achieved 59.3% on IFEval and 49.4% on FollowBench