Small LLMs can now match GPT-4's coding skills by studying their own failed attempts.
CodeLutra achieves this with an iterative preference-guided refinement mechanism that compares correct and incorrect solutions while maximizing the likelihood of correct code.
https://arxiv.org/abs/2411.05199
🤖 Original Problem:
Code generation with large LLMs often requires substantial resources and tends to over-generalize. Fine-tuning smaller open-source LLMs is a viable alternative, but they typically underperform because supervised fine-tuning relies only on correct code examples, limiting the model's ability to learn from its own mistakes.
-----
🛠️ Solution in this Paper:
→ CodeLutra enhances low-performing LLMs by learning from both successful and failed code generation attempts
→ It employs an iterative preference-guided refinement mechanism that compares correct and incorrect solutions
→ The framework uses a dual-loss objective combining Direct Preference Optimization (DPO) with Supervised Fine-Tuning (SFT), as sketched after this list
→ It constructs self-generated comparative data from both successful and failed attempts for continuous model improvement
→ The training process bypasses the standalone SFT stage typically required before preference optimization
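A minimal PyTorch sketch of what such a dual-loss objective could look like is below. It assumes per-sequence log-probabilities of the correct (chosen) and incorrect (rejected) code have already been computed under the policy and a frozen reference model; the function name, `beta`, and `sft_weight` are illustrative, not values from the paper.

```python
import torch
import torch.nn.functional as F

def dual_loss(policy_chosen_logps, policy_rejected_logps,
              ref_chosen_logps, ref_rejected_logps,
              beta=0.1, sft_weight=1.0):
    """Combine a DPO preference term with an SFT likelihood term.

    Each *_logps tensor holds summed log-probabilities of the correct
    (chosen) or incorrect (rejected) code for a batch of prompts.
    """
    # DPO term: widen the margin by which the policy prefers correct
    # code over incorrect code, relative to the frozen reference model.
    margins = beta * ((policy_chosen_logps - ref_chosen_logps)
                      - (policy_rejected_logps - ref_rejected_logps))
    dpo_loss = -F.logsigmoid(margins).mean()

    # SFT term: keep the absolute likelihood of correct code high, so the
    # preference term cannot be satisfied by lowering both likelihoods.
    sft_loss = -policy_chosen_logps.mean()

    return dpo_loss + sft_weight * sft_loss


# Toy usage with random log-probabilities for a batch of 4 examples.
if __name__ == "__main__":
    b = 4
    loss = dual_loss(torch.randn(b), torch.randn(b),
                     torch.randn(b), torch.randn(b))
    print(loss.item())
```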
-----
💡 Key Insights:
→ Learning from failed attempts significantly improves model performance
→ The dual-loss mechanism prevents the likelihood of correct solutions from decreasing during preference optimization
→ The framework works effectively with limited initial data (a few hundred samples)
→ Direct execution-based verification provides clear preference labels
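To illustrate execution-based labeling, here is a hedged sketch of how self-generated candidates might be split into chosen/rejected pairs by running them and checking the output; the function names and the exact-match check are assumptions for this example, not the paper's evaluation harness.

```python
import os
import subprocess
import sys
import tempfile

def runs_correctly(code: str, expected_output: str, timeout: float = 5.0) -> bool:
    """Execute a candidate Python program and compare its stdout to the expected output."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        result = subprocess.run([sys.executable, path],
                                capture_output=True, text=True, timeout=timeout)
        return result.returncode == 0 and result.stdout.strip() == expected_output.strip()
    except subprocess.TimeoutExpired:
        return False
    finally:
        os.unlink(path)

def build_preference_pairs(prompt: str, candidates: list[str], expected_output: str):
    """Label self-generated candidates by execution outcome and pair every
    passing solution (chosen) with every failing one (rejected)."""
    passed, failed = [], []
    for cand in candidates:
        (passed if runs_correctly(cand, expected_output) else failed).append(cand)
    return [{"prompt": prompt, "chosen": good, "rejected": bad}
            for good in passed for bad in failed]
```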
-----
📊 Results:
→ On data query tasks, improved Llama-3-8B's execution accuracy from 59.3% to 76.6%, exceeding GPT-4's 74.4%
→ For data science tasks, using just 500 samples improved Llama-3-8B's accuracy from 28.2% to 48.6%
→ Consistent performance gains observed across different base models including Gemma-7B and StarCoder-7B