Runtime feedback helps LLMs write code that's not just correct, but blazingly fast
PerfCodeGen is a framework that improves LLM-generated code by feeding execution feedback back into the model at inference time. It achieves state-of-the-art efficiency on benchmarks like HumanEval and MBPP, enabling smaller open models to perform comparably to larger closed models like GPT-4.
-----
https://arxiv.org/abs/2412.03578
🤔 Original Problem:
LLMs excel at generating functionally correct code but often overlook runtime efficiency, which impacts user experience, serving costs, and carbon footprint. Current solutions require expensive parallel training data or don't leverage execution feedback.
-----
🛠️ Solution in this Paper:
→ PerfCodeGen first uses unit test execution feedback to refine code for correctness, increasing the pool of correct solutions
→ For performance optimization, it identifies the most time-consuming unit test through multiple executions
→ The framework then prompts the LLM with this performance feedback to generate more efficient code while maintaining functionality
→ If the optimized version breaks correctness, it falls back to the fastest correct version from earlier iterations (the full loop is sketched below)
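Here is a minimal Python sketch of that two-phase loop, under stated assumptions: `llm_refine`, the test tuple format, and the feedback wording are hypothetical stand-ins for the paper's actual prompts and execution harness, not its implementation.

```python
# Minimal sketch of a PerfCodeGen-style two-phase refinement loop.
# `llm_refine`, the test format, and the feedback strings are hypothetical
# placeholders, not the paper's actual prompts or harness.
import time
from typing import Callable, List, Tuple

Test = Tuple[tuple, object]  # (input args, expected output)

def run_tests(candidate: Callable, tests: List[Test]) -> bool:
    """Phase 1 signal: does the candidate pass every unit test?"""
    return all(candidate(*args) == expected for args, expected in tests)

def slowest_test(candidate: Callable, tests: List[Test], repeats: int = 5):
    """Phase 2 signal: time each unit test several times, return the costliest one."""
    worst_args, worst_time = None, -1.0
    for args, _expected in tests:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            candidate(*args)
            best = min(best, time.perf_counter() - start)
        if best > worst_time:
            worst_args, worst_time = args, best
    return worst_args, worst_time

def perfcodegen_loop(problem: str, tests: List[Test],
                     llm_refine: Callable, max_rounds: int = 3):
    """Correctness refinement first, then performance refinement with fallback."""
    candidate = llm_refine(problem, feedback=None)  # initial generation
    correct_pool = []

    # Phase 1: refine with unit-test feedback until the candidate is correct.
    for _ in range(max_rounds):
        if run_tests(candidate, tests):
            correct_pool.append(candidate)
            break
        candidate = llm_refine(problem, feedback="failed unit tests")
    if not correct_pool:
        return candidate  # never reached correctness; return best effort

    # Phase 2: point the LLM at the most expensive test and ask for a faster version.
    for _ in range(max_rounds):
        args, cost = slowest_test(correct_pool[-1], tests)
        hint = f"Input {args} took {cost:.6f}s; generate a faster solution."
        faster = llm_refine(problem, feedback=hint)
        if run_tests(faster, tests):  # only keep optimizations that stay correct
            correct_pool.append(faster)

    # Fallback: return the fastest version that is still correct.
    return min(correct_pool, key=lambda c: slowest_test(c, tests)[1])

if __name__ == "__main__":
    # Toy demo with a dummy "LLM" that always returns the same sum function.
    tests = [((list(range(1000)),), 499500), (([1, 2, 3],), 6)]
    dummy_llm = lambda problem, feedback=None: (lambda xs: sum(xs))
    best = perfcodegen_loop("sum a list", tests, dummy_llm)
    print(run_tests(best, tests))  # True
```

The key design choice the sketch mirrors is that correctness is always the gate: performance feedback only ever replaces a solution if the optimized candidate still passes the full test suite.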
-----
💡 Key Insights:
→ Execution feedback during self-refinement significantly improves code efficiency
→ Open models like Phi-3-mini can match GPT-4's performance using PerfCodeGen
→ Planning phase in correctness refinement improves success rates
-----
📊 Results:
→ Achieved 40.85% optimization rate with Phi-3-mini on HumanEval (comparable to GPT-4)
→ Improved correctness rates by 5-14% across all models
→ Significant speedups on MBPP benchmark, often surpassing ground truth solutions