
"Correctness Assessment of Code Generated by Large Language Models Using Internal Representations"

The podcast below was generated with Google's Illuminate.

OPENIA, a new white-box (open-box) framework, checks LLM-generated code for correctness as it is being written, using the model's own internal states.

OPENIA assesses code correctness during generation by analyzing LLMs' internal representations, improving efficiency over post-hoc methods.

https://arxiv.org/abs/2501.12934

Original Problem 🤔:

→ Ensuring correctness of LLM-generated code is crucial but challenging.

→ Existing methods mainly rely on resource-intensive post-hoc analyses.

Solution in this Paper 💡:

→ OPENIA leverages LLMs' internal representations during code generation to assess correctness.

→ It analyzes intermediate states of open-source code LLMs (DeepSeek-Coder, CodeLlama, Magicoder).

→ OPENIA extracts internal states and feeds them to a probing classifier predicting code correctness.
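The probing idea above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: synthetic vectors stand in for the hidden states that OPENIA would extract from a code LLM, and the labels and class separation are assumptions made purely for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for internal representations. In OPENIA these would be
# hidden-state vectors taken from a code LLM during generation; here we use
# random vectors whose mean shifts with the (assumed) correctness label.
dim, n = 64, 200
correct = rng.normal(0.5, 1.0, size=(n, dim))    # states of "correct" code
incorrect = rng.normal(-0.5, 1.0, size=(n, dim))  # states of "incorrect" code
X = np.vstack([correct, incorrect])
y = np.concatenate([np.ones(n), np.zeros(n)])

# A simple logistic-regression probe trained by gradient descent: it maps an
# internal-state vector to a probability that the generated code is correct.
w, b, lr = np.zeros(dim), 0.0, 0.1
for _ in range(500):
    z = np.clip(X @ w + b, -30, 30)
    p = 1.0 / (1.0 + np.exp(-z))
    w -= lr * (X.T @ (p - y)) / len(y)
    b -= lr * np.mean(p - y)

preds = (X @ w + b) > 0.0
acc = float(np.mean(preds == y))
print(f"probe training accuracy: {acc:.2f}")
```

Because the probe is a tiny linear model on top of already-computed states, running it adds almost no latency, which is what makes in-generation assessment cheap compared with post-hoc analyses.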

Key Insights from this Paper 🤯:

→ Internal LLM representations encode latent information correlating with code correctness.

→ Middle layers of LLMs offer the most informative representations for correctness assessment.

→ OPENIA's performance varies with code length and task difficulty.
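The middle-layer finding can be illustrated by training one probe per layer and comparing accuracies. Again a hedged sketch: the per-layer "informativeness" values below are invented to mimic the reported pattern (weak early layers, a mid-depth peak, a slight drop at the top), and a nearest-centroid probe replaces the paper's classifier for brevity.

```python
import numpy as np

rng = np.random.default_rng(1)
dim, n = 32, 300

def probe_accuracy(separation):
    """Train/evaluate a nearest-centroid probe on synthetic states whose
    class separation stands in for how informative a layer's representation
    is. Accuracy is measured on the training pool (enough for illustration)."""
    pos = rng.normal(separation, 1.0, size=(n, dim))
    neg = rng.normal(-separation, 1.0, size=(n, dim))
    X = np.vstack([pos, neg])
    y = np.concatenate([np.ones(n), np.zeros(n)])
    c_pos, c_neg = pos.mean(axis=0), neg.mean(axis=0)
    preds = np.linalg.norm(X - c_pos, axis=1) < np.linalg.norm(X - c_neg, axis=1)
    return float(np.mean(preds == y))

# Hypothetical per-layer signal strengths, peaking at mid-depth (index 3).
separations = [0.02, 0.05, 0.1, 0.2, 0.1, 0.05]
accs = [probe_accuracy(s) for s in separations]
best = int(np.argmax(accs))
print(f"most informative (synthetic) layer: {best}")
```

In practice one would extract hidden states from each transformer layer, fit a probe per layer on held-out labeled code, and pick the layer whose probe generalizes best.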

Results 📊:

→ OPENIA outperforms baselines by up to 2x in accuracy for standalone code generation.

→ Achieves up to 46% improvement in F1-score for repository-level code generation.

→ OPENIA's inference takes only 0.6ms per code unit.
