
"PERC: Plan-As-Query Example Retrieval for Underrepresented Code Generation"

A podcast on this paper was generated with Google's Illuminate.

PERC uses pseudocode plans to find similar code examples across programming languages, like a universal code translator.

PERC introduces a novel approach that converts code to pseudocode plans for retrieving relevant examples, improving code generation accuracy across different programming languages.

-----

https://arxiv.org/abs/2412.12447

🤔 Original Problem:

Existing code generation systems struggle to find effective examples when the target programming language has limited data. Current retrieval methods focus on code syntax, making it hard to identify algorithmically similar examples across different languages.

-----

🔧 Solution in this Paper:

→ PERC converts source code into pseudocode plans that capture algorithmic logic while removing language-specific syntax

→ It uses these plans both for retrieving similar examples and as intermediate reasoning steps during code generation

→ The system generates pseudocode for both the query and retrieval pool, enabling matching based on algorithmic similarity

→ When source and target languages differ, PERC converts retrieved examples to the target language

-----

💡 Key Insights:

→ Algorithmic plans in pseudocode effectively bridge the gap between different programming languages

→ Converting code to pseudocode reduces noise from syntax differences

→ Using plans in the reasoning chain improves generation accuracy
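The noise-reduction insight can be made concrete with a toy measurement. The snippets and the Jaccard metric below are my own illustrative choices, not from the paper: two implementations of the same summation, one in Python and one in Ruby (held as a string, just data here), share almost no surface tokens yet map to the identical plan.

```python
# Same algorithm, two syntaxes. The Ruby source is only a string operand here.
py = "def total(xs):\n    s = 0\n    for x in xs:\n        s += x\n    return s"
rb = "def total(xs)\n  s = 0\n  xs.each { |x| s += x }\n  s\nend"

# Both snippets reduce to the same pseudocode plan.
plan = ("initialize accumulator to zero; iterate over elements; "
        "add each to accumulator; return accumulator")

def jaccard(a: str, b: str) -> float:
    # Token-set overlap: crude, but enough to show the gap.
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb)

code_sim = jaccard(py, rb)      # low: braces, colons, `each`, `end` all differ
plan_sim = jaccard(plan, plan)  # 1.0: identical plans
print(f"code similarity: {code_sim:.2f}, plan similarity: {plan_sim:.2f}")
```

Syntax-level retrieval would score this pair as weakly related; plan-level retrieval sees them as the same algorithm, which is exactly the noise reduction the insight describes.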

-----

📊 Results:

→ Achieved 76.04% Pass@1 on HumanEval with GPT-3.5

→ Improved code generation for underrepresented languages: 69.81% for Ruby, 64.10% for Lua

→ Maintained performance when mixing multiple source languages in the retrieval pool
