This is incredible to see: AI outperforms human experts on research-ideation novelty 🤯
📚 https://arxiv.org/abs/2409.04109
Key Insights from this Paper 💡:
• LLM-generated ideas are judged as more novel than human expert ideas
• AI ideas may be slightly less feasible than human ideas
• LLMs cannot reliably evaluate research ideas yet
-----
Solution in this Paper 🛠️:
• Recruited 100+ NLP researchers for idea generation and blind review
• Developed an LLM ideation agent with:
- RAG-based paper retrieval
- Overgeneration of ideas (4,000 seed ideas per topic)
- LLM-based idea ranking
• Implemented strict controls:
- Standardized idea format and writing style
- Matched topic distribution between human and AI ideas
• Evaluated ideas on novelty, excitement, feasibility, effectiveness, and overall quality
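The ideation agent above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the retrieval, generation, and judging functions are placeholders standing in for the real Semantic Scholar retrieval and LLM calls, and the Swiss-style pairwise ranking is a simplified version of the paper's LLM-based ranker.

```python
def retrieve_papers(topic, k=10):
    # Placeholder for RAG retrieval (the paper queries a paper database);
    # dummy titles stand in for real search results.
    return [f"{topic} paper {i}" for i in range(k)]

def generate_ideas(topic, papers, n=4000):
    # Placeholder for LLM overgeneration: the agent samples many seed ideas
    # conditioned on the retrieved papers. Dummy strings stand in for LLM output.
    return [f"idea {i} on {topic}" for i in range(n)]

def pairwise_judge(a, b):
    # Placeholder for an LLM judge that returns the preferred idea of a pair;
    # here, shorter ideas arbitrarily win so the sketch is deterministic.
    return a if len(a) <= len(b) else b

def swiss_rank(ideas, rounds=5):
    # Swiss-tournament-style ranking: ideas accumulate points from pairwise
    # comparisons against similarly scored opponents, then sort by score.
    scores = {idea: 0 for idea in ideas}
    for _ in range(rounds):
        ordered = sorted(ideas, key=lambda i: -scores[i])
        for a, b in zip(ordered[::2], ordered[1::2]):
            scores[pairwise_judge(a, b)] += 1
    return sorted(ideas, key=lambda i: -scores[i])
```

Swapping the placeholders for real retrieval and LLM calls recovers the overall retrieve → overgenerate → rank shape of the agent.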
-----
Results 📊:
• AI ideas rated significantly more novel than human ideas (p < 0.05)
- Novelty score: 5.64 (AI) vs 4.84 (Human)
• AI ideas slightly less feasible than human ideas (not statistically significant)
- Feasibility score: 6.34 (AI) vs 6.61 (Human)
• Only ~5% of the AI-generated seed ideas were non-duplicates, revealing limited diversity at scale
• LLM evaluators showed lower agreement with human reviewers than inter-human agreement
- Best LLM evaluator accuracy: 53.3% vs Human inter-reviewer consistency: 56.1%
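That ~5% non-duplicate rate comes from deduplicating the overgenerated pool. A minimal sketch of greedy similarity-based dedup, with the caveat that the paper thresholds similarity over sentence embeddings, while this stand-in uses dependency-free Jaccard word overlap and an assumed threshold of 0.8:

```python
def jaccard(a, b):
    # Word-overlap similarity between two idea strings (stand-in for
    # embedding cosine similarity).
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb)

def deduplicate(ideas, threshold=0.8):
    # Greedy dedup: keep an idea only if it is not too similar to any
    # already-kept idea.
    kept = []
    for idea in ideas:
        if all(jaccard(idea, k) < threshold for k in kept):
            kept.append(idea)
    return kept
```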