
"Multi-Programming Language Sandbox for LLMs"

The podcast on this paper was generated with Google's Illuminate.

A unified sandbox that executes and analyzes LLM-generated code across 8 programming languages

📚 https://arxiv.org/abs/2410.23074

🎯 Original Problem:

Existing sandbox tools for code generation each support only a single programming language and lack comprehensive code-analysis capabilities, making it hard to improve LLM performance on multi-language coding tasks.

-----

🛠️ Solution in this Paper:

→ MPLSandbox: A unified multi-language sandbox with three core modules:

→ Multi-Programming Language Sandbox Environment: Isolated sub-sandboxes for 8 programming languages (Python, Java, C++, C#, Go, JavaScript, TypeScript, Bash); see the interface sketch after this list

→ Code Analysis Module: Integrates tools for code-smell detection, bug detection, efficiency profiling, unit testing, and basic code-structure analysis

→ Information Integration Module: Combines compiler feedback and analysis results to enhance LLM performance
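
To make the interface concrete, here is a minimal sketch of what a unified multi-language runner can look like. The names (RUNNERS, run_snippet) and the plain-subprocess execution are illustrative assumptions, not MPLSandbox's actual API; the real tool isolates each language in its own sub-sandbox rather than running snippets directly on the host.

```python
import os
import subprocess
import tempfile

# Hypothetical mapping from language name to (file extension, run command).
# Only interpreted languages are shown for brevity; compiled languages would
# need a compile step, and a real sandbox would add isolation (e.g. containers).
RUNNERS = {
    "python": (".py", ["python3"]),
    "javascript": (".js", ["node"]),
    "bash": (".sh", ["bash"]),
}

def run_snippet(lang: str, code: str, timeout: float = 10.0) -> dict:
    """Execute one code snippet and return the raw feedback an LLM loop
    would consume: stdout, stderr, and the exit code."""
    ext, cmd = RUNNERS[lang]
    with tempfile.NamedTemporaryFile("w", suffix=ext, delete=False) as f:
        f.write(code)
        path = f.name
    try:
        proc = subprocess.run(cmd + [path], capture_output=True,
                              text=True, timeout=timeout)
        return {"stdout": proc.stdout, "stderr": proc.stderr,
                "returncode": proc.returncode}
    finally:
        os.unlink(path)

print(run_snippet("python", "print(sum(range(10)))"))  # stdout '45\n', returncode 0
```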

-----

💡 Key Insights:

→ First sandbox tool to support multiple programming languages simultaneously

→ Provides comprehensive code analysis beyond just compilation

→ Can be deployed as a standalone or distributed system

→ Easy integration with LLM training pipelines (see the feedback-loop sketch after this list)

→ Extensible architecture for adding new languages and tools
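
As a rough illustration of that integration, the sketch below folds sandbox feedback into a generate-execute-refine loop. Here `generate` stands in for any LLM call (prompt in, code string out) and `run_snippet` is the toy runner sketched earlier; neither name comes from the paper.

```python
# Hedged sketch of a generate-execute-refine loop using sandbox feedback.
def refine_with_feedback(generate, prompt: str, lang: str, max_rounds: int = 3) -> str:
    """Regenerate code with compiler/runtime feedback folded back into the
    prompt until the snippet runs cleanly or the round budget is spent."""
    code = generate(prompt)
    for _ in range(max_rounds):
        result = run_snippet(lang, code)
        if result["returncode"] == 0:
            break  # ran cleanly; static-analysis results could also be checked here
        # Append the error output so the next generation can correct it.
        prompt = (f"{prompt}\n\nPrevious attempt:\n{code}\n\n"
                  f"Error output:\n{result['stderr']}\n\nPlease fix the code.")
        code = generate(prompt)
    return code
```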

-----

📊 Results:

→ Tested with multiple LLMs including DeepSeek-Coder (6.7B) and Qwen2.5-Coder (7B)

→ Best performance with DeepSeek-Coder-V2-Lite (16B) (Pass@k computation sketched below):

- Python: Pass@1: 29.6%, Pass@10: 50.2%

- Java: Pass@1: 26.8%, Pass@10: 47.7%

- C++: Pass@1: 25.1%, Pass@10: 44.6%
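
For reference, Pass@k scores like those above are conventionally computed with the unbiased estimator from Chen et al. (2021): if n samples are generated per problem and c of them pass the unit tests, Pass@k is the probability that a random draw of k samples contains at least one passing sample; benchmark-level scores average this over all problems. A small sketch:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: probability that at least one of k
    samples drawn from n generations (c of them correct) passes its tests."""
    if n - c < k:
        return 1.0  # fewer than k failing samples, so every k-subset passes
    return 1.0 - comb(n - c, k) / comb(n, k)

# Per-problem example: 10 generations, 1 correct.
print(pass_at_k(10, 1, 1))   # 0.1
print(pass_at_k(10, 1, 10))  # 1.0 -- all 10 samples drawn, correct one included
```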
