OPENIA is a new white-box (open-box) framework that assesses the correctness of LLM-generated code during generation, using the model's own internal representations.
By tapping these internal states instead of running resource-intensive post-hoc checks, it is substantially more efficient than existing approaches.
https://arxiv.org/abs/2501.12934
Original Problem 🤔:
→ Ensuring correctness of LLM-generated code is crucial but challenging.
→ Existing methods mainly rely on resource-intensive post-hoc analyses.
Solution in this Paper 💡:
→ OPENIA leverages LLMs' internal representations during code generation to assess correctness.
→ It analyzes intermediate states of open-source code LLMs (DeepSeek-Coder, CodeLlama, MagicCoder).
→ OPENIA extracts these internal states and feeds them into a lightweight probing classifier that predicts code correctness.
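The probing setup above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the synthetic feature vectors stand in for hidden states that would really be extracted from a code LLM (e.g. via Hugging Face `transformers` with `output_hidden_states=True`), and the layer choice, pooling, and hyperparameters are assumptions.

```python
import numpy as np

# In OPENIA, the feature vector for a generated code unit is an internal
# (hidden-layer) representation of the code LLM. Real extraction would look
# roughly like (illustrative, not the paper's exact code):
#   outputs = model(input_ids, output_hidden_states=True)
#   feats = outputs.hidden_states[mid_layer][:, -1, :]  # a middle layer
# Here we use synthetic vectors so the sketch is self-contained.

rng = np.random.default_rng(0)
dim = 64   # toy hidden size; real code LLMs use 4096+
n = 400    # number of labeled code units

# Two noisy linearly-structured classes standing in for the latent
# "correct" vs "incorrect" signal the paper reports in hidden states.
X = rng.normal(size=(n, dim))
w_true = rng.normal(size=dim)
y = (X @ w_true + rng.normal(scale=0.5, size=n) > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Simple logistic-regression probe trained with full-batch gradient descent.
w = np.zeros(dim)
b = 0.0
lr = 0.1
for _ in range(500):
    p = sigmoid(X @ w + b)
    w -= lr * (X.T @ (p - y) / n)
    b -= lr * float(np.mean(p - y))

train_acc = float(np.mean((sigmoid(X @ w + b) > 0.5) == y))
print(f"probe training accuracy: {train_acc:.2f}")
```

Because the probe is a single linear layer over frozen representations, inference is just one matrix-vector product per code unit, which is consistent with the sub-millisecond latency reported below.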
Key Insights from this Paper 🤯:
→ Internal LLM representations encode latent information correlating with code correctness.
→ Middle layers of LLMs offer the most informative representations for correctness assessment.
→ OPENIA's performance varies with code length and task difficulty.
Results 📊:
→ OPENIA outperforms baselines by up to 2x in accuracy for standalone code generation.
→ Achieves up to 46% improvement in F1-score for repository-level code generation.
→ OPENIA's inference takes only 0.6ms per code unit.