Prior knowledge beats in-context examples: LLMs know more than we thought about generating hypotheses
This study reveals that LLMs primarily rely on their pre-trained knowledge rather than in-context examples when generating hypotheses for real-world tasks.
-----
https://arxiv.org/abs/2412.13645
🤔 Original Problem:
→ Current research assumes LLMs need in-context demonstrations to generate quality hypotheses, but the respective contributions of a model's prior knowledge and of those demonstrations remained unclear
-----
🔍 Solution in this Paper:
→ The researchers evaluated three hypothesis generation strategies across five real-world tasks using three LLMs
→ They tested direct input-output prompting, iterative refinement with ranking, and HypoGeniC
→ They compared hypothesis quality with and without demonstrations using classification performance, LLM-based assessments, and human evaluation (the core manipulation is sketched below)
→ They analyzed model behavior across different label formats and configurations, including flipped and randomly assigned demonstration labels
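A minimal Python sketch of the core manipulation: the same hypothesis-generation request issued with and without in-context demonstrations. These are not the paper's actual prompts or datasets; the task name, example texts, and `build_prompt` helper are illustrative assumptions.

```python
# Sketch of the with- vs. without-demonstrations conditions.
# Prompt wording, task, and examples are hypothetical, not the paper's.

from typing import List, Optional, Tuple

def build_prompt(task: str, demos: Optional[List[Tuple[str, str]]] = None) -> str:
    """Assemble a hypothesis-generation prompt; demos=None is the
    zero-demonstration condition."""
    lines = [
        f"Task: {task}",
        "Propose a general hypothesis that explains how the input relates to the label.",
    ]
    if demos:
        lines.append("Examples:")
        lines += [f"Input: {x}\nLabel: {y}" for x, y in demos]
    lines.append("Hypothesis:")
    return "\n".join(lines)

demos = [
    ("'Limited stock, act now!'", "deceptive"),
    ("'Ships in 3-5 business days.'", "genuine"),
]

with_demos = build_prompt("deceptive review detection", demos)
without_demos = build_prompt("deceptive review detection")
# Send each prompt to any LLM and compare the generated hypotheses;
# the paper finds the two conditions yield hypotheses of similar quality.
```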
-----
💡 Key Insights:
→ Model prior knowledge dominates hypothesis generation in real-world tasks
→ Removing demonstrations has minimal impact on hypothesis quality
→ Prior knowledge is extremely robust and difficult to override even with contradictory demonstrations
→ This finding was consistent across text, image, and image-text modalities
-----
📊 Results:
→ Classification accuracy differed by no more than 3% between the with- and without-demonstration settings
→ Human evaluators preferred hypotheses generated without demonstrations
→ LLM-based evaluation showed higher helpfulness scores (4.01 vs 3.95) for no-demonstration cases
→ Performance stayed consistent even when demonstration labels were flipped or randomized (a minimal sketch of this ablation follows below)
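A hedged sketch of that label-perturbation ablation: demonstrations are rebuilt with flipped or randomly reassigned labels to test whether contradictory evidence can override the model's prior. The label set, examples, and `perturb`/`flip` helpers are illustrative assumptions, not the paper's code.

```python
import random

# Rebuild demonstrations with corrupted labels. The binary label set
# and example texts are hypothetical stand-ins for the paper's tasks.
LABELS = ["deceptive", "genuine"]

def flip(label: str) -> str:
    """Swap a binary label for its opposite."""
    return LABELS[1 - LABELS.index(label)]

def perturb(demos, mode: str, seed: int = 0):
    """Return demos with 'flipped' or 'random' labels."""
    rng = random.Random(seed)
    if mode == "flipped":
        return [(x, flip(y)) for x, y in demos]
    if mode == "random":
        return [(x, rng.choice(LABELS)) for x, y in demos]
    raise ValueError(f"unknown mode: {mode}")

demos = [
    ("'Limited stock, act now!'", "deceptive"),
    ("'Ships in 3-5 business days.'", "genuine"),
]
print(perturb(demos, "flipped"))  # contradicts the true input-label mapping
print(perturb(demos, "random"))   # severs the input-label link entirely
# The paper reports hypothesis quality barely moves under either
# perturbation, evidence that prior knowledge dominates the demonstrations.
```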