DISCOVERYWORLD: A Virtual Environment for Developing and Evaluating Automated Scientific Discovery Agents
DiscoveryWorld provides a virtual environment for developing and evaluating AI agents' ability to perform end-to-end scientific discovery across diverse domains.
Original Problem 🔍:
Developing and evaluating AI agents for end-to-end scientific discovery is challenging due to the prohibitive cost and complexity of real-world experiments.
Solution in this Paper 💡:
• Creates DiscoveryWorld: virtual environment for scientific discovery tasks
• Text-based simulation with optional 2D visual overlay
• 120 tasks across 8 diverse topics (e.g. proteomics, rocket science)
• 3 difficulty levels per topic, each with several parametric variations
• Tasks require hypothesis formation, experimentation, analysis, and action
• Automatic evaluation metrics: task completion, relevant actions, explanatory knowledge
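
To make the task layout and the three metrics above concrete, here is a minimal Python sketch. It is not the DiscoveryWorld API: every name in it (TaskResult, task_grid, summarize) is hypothetical, the six unnamed topics are placeholders, the difficulty labels are assumed, and the figure of 5 variations per topic and difficulty is an assumption chosen so that 8 x 3 x 5 matches the stated 120 tasks.

```python
from dataclasses import dataclass
from itertools import product
from statistics import mean

# Hypothetical placeholders: this summary names only two of the eight topics.
TOPICS = ["proteomics", "rocket science"] + [f"topic_{i}" for i in range(3, 9)]
DIFFICULTIES = ["easy", "normal", "challenge"]  # assumed labels
VARIATIONS_PER_CELL = 5  # assumption: chosen so that 8 * 3 * 5 = 120


@dataclass(frozen=True)
class TaskResult:
    """One agent run on one task, scored on the three metrics listed above."""
    topic: str
    difficulty: str
    variation: int
    completed: bool          # task completion
    relevant_actions: float  # share of task-relevant actions taken, in [0, 1]
    knowledge: bool          # whether the explanatory knowledge was stated


def task_grid():
    """Enumerate every (topic, difficulty, variation) combination."""
    return list(product(TOPICS, DIFFICULTIES, range(VARIATIONS_PER_CELL)))


def summarize(results):
    """Average each metric over a non-empty list of TaskResult objects."""
    return {
        "completion": mean(1.0 if r.completed else 0.0 for r in results),
        "relevant_actions": mean(r.relevant_actions for r in results),
        "knowledge": mean(1.0 if r.knowledge else 0.0 for r in results),
    }
```

Under these assumptions, len(task_grid()) is 120. Keeping the three metrics as separate fields rather than one combined score makes it possible to tell an agent that finishes a task from one that can also explain what it found.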
Key Insights from this Paper 💡:
• Provides the first benchmark for general, rather than task-specific, AI discovery competency
• Encourages development of broad scientific reasoning skills
• Allows low-cost, rapid iteration on discovery agent development
• Enables systematic evaluation of discovery capabilities
Results 📊:
• Strong baseline agents that perform well in prior published environments struggle on most DiscoveryWorld tasks
• Human scientists find tasks challenging but solvable
• Significant performance gap between current AI models and human scientists
• Demonstrates DiscoveryWorld captures novel challenges in scientific discovery