
"Training Software Engineering Agents and Verifiers with SWE-Gym"

Generated the podcast below on this paper with Google's Illuminate.

SWE-Gym lets AI agents practice coding on real GitHub issues.

SWE-Gym introduces a training environment for software engineering agents, combining real GitHub tasks with executable test verification to improve automated code fixes.

-----

https://arxiv.org/abs/2412.21139

Original Problem 🤔:

→ Current software engineering automation relies heavily on proprietary models and lacks proper training environments with real-world tasks and verification mechanisms

→ Existing datasets either lack executable environments or use synthetic tasks, making it difficult to train effective agents

-----

Solution in this Paper 🛠️:

→ SWE-Gym provides 2,438 Python tasks from 11 popular repositories, each with complete codebase, runtime environment, and unit tests

→ Uses rejection sampling fine-tuning to train LLM agents on successful task completions

→ Implements verifier models trained on agent trajectories to enable better solution selection

→ Combines both general-purpose prompting and specialized workflow approaches

-----

Key Insights 💡:

→ Training environment quality matters more than quantity for real-world tasks

→ Performance scales consistently with more compute in both training and inference

→ Verifier models enable effective trajectory selection for better results

→ Combined approach of fine-tuned agents and verifiers achieves state-of-the-art performance
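The verifier-based trajectory selection can be sketched as best-of-n sampling: generate several candidate trajectories and let a learned verifier pick the most promising one. The scorer below is a toy heuristic standing in for the paper's verifier model, which is trained on agent trajectories; the names are illustrative.

```python
# Hypothetical best-of-n selection with a learned verifier.
# `verifier_score` stands in for a model trained to predict whether
# a trajectory's final patch will pass the tests; here it is a toy heuristic.
def verifier_score(trajectory):
    """Toy scorer: prefers short trajectories that end by running tests."""
    score = 1.0 / (1 + len(trajectory["actions"]))
    if trajectory["actions"] and trajectory["actions"][-1] == "run_tests":
        score += 0.5
    return score

def select_best(trajectories):
    """Pick the candidate trajectory the verifier ranks highest."""
    return max(trajectories, key=verifier_score)

candidates = [
    {"id": "a", "actions": ["edit", "edit", "edit"]},
    {"id": "b", "actions": ["edit", "run_tests"]},
]
best = select_best(candidates)
print(best["id"])  # 'b': fewer steps and ends with a test run
```

This is what lets inference-time compute scale: sampling more candidates and selecting with the verifier trades extra rollouts for a higher resolve rate.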

-----

Results 📊:

→ Achieved 19% absolute gains in resolve rate on SWE-Bench test sets

→ Reached 32% success rate on SWE-Bench Verified

→ Improved to 26% on SWE-Bench Lite

→ Demonstrated continuous scaling benefits with increased compute

------

Are you into AI and LLMs❓ Join my daily AI newsletter. I will send you 7 emails a week analyzing the highest signal AI developments. ↓↓

🎉 https://rohanpaul.substack.com/
