"CodeLutra: Boosting LLM Code Generation via Preference-Guided Refinement"

The podcast on this paper is generated with Google's Illuminate.

Small LLMs now match GPT-4's coding skills by studying their own failed attempts.

Uses an iterative preference-guided refinement mechanism that compares correct and incorrect solutions while maximizing the likelihood of correct code.

https://arxiv.org/abs/2411.05199

🤖 Original Problem:

Code generation with LLMs often requires substantial resources, and large general-purpose models tend to over-generalize. Fine-tuning smaller open-source LLMs is a viable alternative, but it typically underperforms because supervised fine-tuning relies only on correct code examples, limiting the model's ability to learn from its mistakes.

-----

🛠️ Solution in this Paper:

→ CodeLutra enhances low-performing LLMs by learning from both successful and failed code generation attempts

→ It employs an iterative preference-guided refinement mechanism that compares correct and incorrect solutions

→ The framework uses a dual-loss function combining Direct Preference Optimization (DPO) with Supervised Fine-Tuning (SFT); a sketch follows this list

→ It constructs self-generated comparative data from both successful and failed attempts for continuous model improvement

→ The training process bypasses the standalone SFT stage typically required before preference optimization
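
A minimal sketch of the dual-loss idea is below, assuming we already have summed per-sequence log-probabilities for each correct/incorrect code pair from the policy and a frozen reference model. The hyperparameter names (`beta`, `sft_weight`) and the equal default weighting are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

def dual_loss(policy_chosen_logp, policy_rejected_logp,
              ref_chosen_logp, ref_rejected_logp,
              beta=0.1, sft_weight=1.0):
    # DPO term: push correct code above incorrect code, measured relative
    # to a frozen reference model.
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    dpo_loss = -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()
    # SFT term: negative log-likelihood of the correct code, so its
    # likelihood is not pushed down during preference optimization.
    sft_loss = -policy_chosen_logp.mean()
    return dpo_loss + sft_weight * sft_loss

# Toy usage with a batch of 4 preference pairs (summed sequence log-probs).
torch.manual_seed(0)
policy_chosen = torch.randn(4) - 20    # log p_theta(correct code | prompt)
policy_rejected = torch.randn(4) - 25  # log p_theta(incorrect code | prompt)
ref_chosen = torch.randn(4) - 22
ref_rejected = torch.randn(4) - 22
print(dual_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected))
```

Keeping the SFT term on the chosen code is what lets training skip the standalone SFT stage mentioned above: the preference comparison and the likelihood maximization of correct code happen in a single loss.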

-----

💡 Key Insights:

→ Learning from failed attempts significantly improves model performance

→ The dual-loss mechanism prevents the likelihood of correct solutions from decreasing during preference optimization

→ The framework works effectively with limited initial data (a few hundred samples)

→ Direct execution-based verification provides clear preference labels (see the sketch below)
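
As a concrete illustration of the last insight, below is a minimal sketch of how execution-based verification can turn self-generated candidates into preference labels: a candidate whose output matches the task's reference result is marked "chosen", and a candidate that errors out, times out, or produces the wrong output is marked "rejected". The helper names (`run_candidate`, `label_candidates`) and the exact-match check are assumptions for illustration, not the paper's actual evaluation harness.

```python
import subprocess
import sys
from typing import List, Optional, Tuple

def run_candidate(code: str, timeout: float = 5.0) -> Optional[str]:
    """Run a generated Python snippet in a subprocess and return its stdout,
    or None if it fails or times out (treated as a failed attempt)."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True, text=True, timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return None
    return proc.stdout.strip() if proc.returncode == 0 else None

def label_candidates(candidates: List[str],
                     reference_output: str) -> Tuple[List[str], List[str]]:
    """Split self-generated candidates into chosen (output matches the
    reference result) and rejected (wrong output or execution failure)."""
    chosen, rejected = [], []
    for code in candidates:
        output = run_candidate(code)
        (chosen if output == reference_output else rejected).append(code)
    return chosen, rejected

# Toy usage: two candidates for "print the sum of 1..10".
candidates = ["print(sum(range(1, 11)))", "print(sum(range(10)))"]
print(label_candidates(candidates, "55"))
```

Pairs of chosen and rejected code for the same prompt are exactly the comparative data the dual loss above consumes, and regenerating them with the updated model gives the iterative refinement loop.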

-----

📊 Results:

→ On data query tasks, CodeLutra improved Llama-3-8B's execution accuracy from 59.3% to 76.6%, exceeding GPT-4's 74.4%

→ For data science tasks, using just 500 samples improved Llama-3-8B's accuracy from 28.2% to 48.6%

→ Consistent performance gains were observed across different base models, including Gemma-7B and StarCoder-7B
