MPLSandbox: a unified sandbox that executes and comprehensively analyzes LLM-generated code in 8 programming languages
📚 https://arxiv.org/abs/2410.23074
🎯 Original Problem:
Existing sandbox tools for code generation each support only a single programming language and offer little analysis beyond compilation feedback, making it hard to improve LLM performance on multi-language code tasks.
-----
🛠️ Solution in this Paper:
→ MPLSandbox: A unified multi-language sandbox with three core modules:
→ Multi-Programming Language Sandbox Environment: Isolated sub-sandboxes for 8 programming languages (Python, Java, C++, C#, Go, JavaScript, TypeScript, Bash)
→ Code Analysis Module: Integrates tools for detecting code smells and bugs, measuring efficiency, running unit tests, and analyzing basic code structure
→ Information Integration Module: Combines compiler feedback with the analysis results to enhance LLM performance (a usage sketch follows this list)
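To make the execution flow concrete, here is a minimal sketch of what a unified multi-language runner could look like. The `run_in_sandbox` helper, `SandboxResult` fields, and `RUNNERS` table are hypothetical illustrations, not MPLSandbox's actual API; the real tool isolates each language in its own sub-sandbox, whereas this sketch approximates isolation with plain subprocesses.

```python
import os
import subprocess
import tempfile
from dataclasses import dataclass

# Hypothetical mapping from language name to (file extension, run command).
# MPLSandbox supports 8 languages; three are shown here for brevity.
RUNNERS = {
    "python": (".py", ["python3"]),
    "bash": (".sh", ["bash"]),
    "javascript": (".js", ["node"]),
}

@dataclass
class SandboxResult:
    language: str
    stdout: str
    stderr: str
    exit_code: int

def run_in_sandbox(language: str, code: str, timeout: int = 10) -> SandboxResult:
    """Write LLM-generated code to a temp file and execute it in isolation."""
    ext, cmd = RUNNERS[language]
    with tempfile.NamedTemporaryFile("w", suffix=ext, delete=False) as f:
        f.write(code)
        path = f.name
    try:
        proc = subprocess.run(cmd + [path], capture_output=True,
                              text=True, timeout=timeout)
        return SandboxResult(language, proc.stdout, proc.stderr, proc.returncode)
    except subprocess.TimeoutExpired:
        return SandboxResult(language, "", "timed out", -1)
    finally:
        os.unlink(path)

result = run_in_sandbox("python", "print(sum(range(10)))")
print(result.exit_code, result.stdout.strip())  # 0 45
```

In the real tool, each per-language runner would also invoke the analysis tools (linters, profilers, unit-test harnesses) whose output the Information Integration Module merges into a single report.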
-----
💡 Key Insights:
→ First sandbox tool to support multiple programming languages simultaneously
→ Provides comprehensive code analysis beyond just compilation
→ Can be deployed as a standalone or distributed system
→ Integrates easily with LLM training pipelines (a feedback-loop sketch follows this list)
→ Extensible architecture for adding new languages and tools
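Because execution feedback is programmatic, wiring it into a training or self-repair loop is straightforward. Below is a hedged sketch of such a loop, reusing the hypothetical `run_in_sandbox` helper above; `llm_generate` is a placeholder for any code LLM call (e.g., DeepSeek-Coder, Qwen2.5-Coder), not part of MPLSandbox.

```python
def llm_generate(prompt: str) -> str:
    """Placeholder for a call to a code LLM; not part of MPLSandbox."""
    raise NotImplementedError

def refine_with_feedback(task: str, language: str, max_rounds: int = 3) -> str:
    """Iteratively repair generated code using sandbox feedback,
    mirroring how unified compiler/analysis feedback can be fed
    back to the model on multi-language code tasks."""
    prompt = task
    code = llm_generate(prompt)
    for _ in range(max_rounds):
        result = run_in_sandbox(language, code)  # sketch defined earlier
        if result.exit_code == 0:
            return code  # code ran cleanly; stop refining
        # Fold runtime/compiler feedback into the next prompt.
        prompt = (f"{task}\n\nPrevious attempt:\n{code}\n\n"
                  f"Error output:\n{result.stderr}\n\nFix the code.")
        code = llm_generate(prompt)
    return code
```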
-----
📊 Results:
→ Tested with multiple LLMs including DeepSeek-Coder (6.7B) and Qwen2.5-Coder (7B)
→ Best performance with DeepSeek-Coder-V2-Lite (16B), measured by Pass@k (estimator sketched after this list):
- Python: Pass@1: 29.6%, Pass@10: 50.2%
- Java: Pass@1: 26.8%, Pass@10: 47.7%
- C++: Pass@1: 25.1%, Pass@10: 44.6%
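For reference, Pass@k scores like those above are conventionally computed with the unbiased estimator from Chen et al. (2021); this summary does not state MPLSandbox's exact evaluation harness, so the sketch below assumes n generated samples per problem, of which c pass the unit tests.

```python
import math

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples,
    drawn without replacement from n generations with c correct,
    passes. Equals 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        return 1.0  # every size-k draw must include a correct sample
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)

# Example: 30 of 100 samples correct.
print(round(pass_at_k(100, 30, 1), 3))   # 0.3
print(round(pass_at_k(100, 30, 10), 3))  # ≈0.977
```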