"Exploring the Compositional Deficiency of LLMs in Mathematical Reasoning Through Trap Problems"

This podcast was generated with Google's Illuminate.

Unlike humans, LLMs struggle to combine pieces of learned knowledge to solve novel math problems.

📚 https://arxiv.org/abs/2405.06680

Original Problem 🔍:

LLMs struggle with systematic compositionality in mathematical reasoning, despite impressive performance on complex tasks. This paper investigates their ability to combine learned knowledge components to solve novel problems.

-----

Solution in this Paper 💡:

• Constructs MATHTRAP dataset by adding logical traps to MATH/GSM8K problems

• Traps require combining math knowledge with trap-related knowledge

• Evaluates LLMs on original, trap, and conceptual problems

• Explores interventions: prompts, few-shot demos, fine-tuning
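The evaluation protocol above can be sketched in a few lines of Python. The dataset items, stub model, and function names below are illustrative assumptions, not the paper's released code: each item pairs an original problem with a trap variant (whose premise hides a contradiction) and an isolated conceptual question about the trap.

```python
# Hypothetical sketch of the three-way evaluation: original vs. trap vs.
# conceptual problems. All names and toy data are illustrative only.

def accuracy(model, problems):
    """Fraction of problems the model answers correctly."""
    correct = sum(model(p["question"]) == p["answer"] for p in problems)
    return correct / len(problems)

# Toy items: the trap variant has no valid answer ("undefined")
# because side lengths 1, 2, 3 cannot form a triangle.
dataset = {
    "original":   [{"question": "perimeter of 3-4-5 triangle", "answer": "12"}],
    "trap":       [{"question": "perimeter of 1-2-3 'triangle'", "answer": "undefined"}],
    "conceptual": [{"question": "can side lengths 1, 2, 3 form a triangle?", "answer": "no"}],
}

# A stub "model" that knows each fact in isolation but fails to combine
# them, mimicking the compositional failure the paper reports.
def stub_model(question):
    if "3-4-5" in question:
        return "12"
    if "can side lengths" in question:
        return "no"          # knows the concept when asked directly...
    return "6"               # ...but still sums 1+2+3 in the trap problem

for split, problems in dataset.items():
    print(split, accuracy(stub_model, problems))
```

The interesting cell is the trap split: the stub scores 1.0 on both the original and conceptual splits yet 0.0 on traps, which is exactly the gap the paper measures.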

-----

Key Insights from this Paper 💡:

• LLMs fail to spontaneously combine knowledge to solve trap problems

• Stark performance gap between humans and LLMs on compositional tasks

• External interventions can improve LLM performance on trap problems

• Compositional generalization remains a key challenge for LLMs

-----

Results 📊:

• Closed-source LLMs: >70% accuracy on conceptual problems, but <50% accuracy ratio (trap accuracy relative to original-problem accuracy)

• Open-source LLMs: ~40% accuracy on conceptual/original problems, <20% accuracy ratio on traps

• Humans: 83.8% on trap problems without being warned about traps, 95.1% when warned

• Interventions improved performance, e.g. 5-shot demos boosted GPT-3.5 from 7.6% to 23.9%
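The "accuracy ratio" quoted above normalizes trap-problem accuracy by accuracy on the corresponding original problems. A minimal sketch, with placeholder numbers rather than the paper's raw scores:

```python
# Accuracy ratio = trap accuracy / original-problem accuracy.
# The inputs below are illustrative, not figures from the paper.

def accuracy_ratio(trap_acc, original_acc):
    return trap_acc / original_acc

# A model solving 80% of originals but only 20% of their trap variants
# has a ratio of 0.25, well under the <50% closed-source figure above.
print(accuracy_ratio(0.20, 0.80))  # → 0.25
```

Normalizing this way separates compositional failure from plain difficulty: a model that was simply weak at the underlying math would score low on both splits and keep a high ratio.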
