GeoCoT: A two-stage pipeline to improve geometric reasoning in LLMs.
This paper introduces GeomRel, a dataset for evaluating LLMs' understanding of geometric structures, not just final answers, and proposes GeoCoT to improve their performance.
-----
Paper - https://arxiv.org/abs/2501.13773
Original Problem 😞:
→ Existing geometric reasoning datasets primarily focus on the accuracy of final answers.
→ This fails to capture the model's true understanding of geometric structures, as correct answers can be obtained coincidentally.
→ Existing evaluations do not explicitly decouple geometric relationship identification, reasoning, and calculation, which makes it difficult to assess individual reasoning steps.
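To make the gap between final-answer accuracy and structural understanding concrete, here is a minimal sketch that scores the two separately. The `GradedItem` record, the `decoupled_scores` helper, and the toy data are all hypothetical illustrations and are not taken from the paper or from GeomRel.

```python
from dataclasses import dataclass


@dataclass
class GradedItem:
    """One graded model response (hypothetical record layout)."""
    relation_correct: bool  # did the model identify the geometric relation?
    answer_correct: bool    # did the final answer match the reference?


def decoupled_scores(items):
    """Report answer accuracy, relation accuracy, and the share of
    answers that were right even though the relation was wrong."""
    if not items:
        raise ValueError("need at least one graded item")
    n = len(items)
    answer_acc = sum(i.answer_correct for i in items) / n
    relation_acc = sum(i.relation_correct for i in items) / n
    coincidental = sum(
        i.answer_correct and not i.relation_correct for i in items
    ) / n
    return {
        "answer_acc": answer_acc,
        "relation_acc": relation_acc,
        "coincidental": coincidental,
    }


# Toy data, invented for illustration only.
items = [
    GradedItem(relation_correct=True, answer_correct=True),
    GradedItem(relation_correct=False, answer_correct=True),   # lucky guess
    GradedItem(relation_correct=False, answer_correct=False),
    GradedItem(relation_correct=True, answer_correct=True),
]
scores = decoupled_scores(items)
print(scores)
```

On this toy data, answer accuracy comes out higher than relation accuracy, which is the kind of mismatch a final-answer-only benchmark cannot see.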
-----
Solution in this Paper 💡:
→ This paper introduces the GeomRel dataset, which evaluates an LLM's understanding of geometric structures by isolating the core step of Geometric Relationship Identification (GRI).
→ GRI is extracted from mainstream geometric problems and serves as a minimal module for testing whether a model actually understands the structure.
→ To improve LLMs' GRI abilities, the paper proposes Geometry Chain-of-Thought (GeoCoT), a two-stage pipeline that breaks down geometric structures into points and lines and then extracts relevant information for reverse reasoning.
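The two-stage idea can be sketched as two chained prompts. This is only an illustration under stated assumptions: the prompt wording, the function names (`stage1_prompt`, `stage2_prompt`, `geocot`), and the `llm` callable are hypothetical and are not the paper's actual prompts or code. In particular, "reverse reasoning" is interpreted here as assuming each candidate relation and checking it against the extracted facts.

```python
from typing import Callable, List


def stage1_prompt(description: str) -> str:
    """Stage 1: ask the model to break the figure into points and lines."""
    return (
        "List every point and every line in the geometric description below.\n"
        f"Description: {description}\n"
        "Points and lines:"
    )


def stage2_prompt(description: str, decomposition: str,
                  question: str, options: List[str]) -> str:
    """Stage 2: extract the relevant facts, then reason backwards from
    each candidate relation (illustrative wording, an assumption)."""
    option_lines = "\n".join(f"- {opt}" for opt in options)
    return (
        f"Description: {description}\n"
        f"Points and lines: {decomposition}\n"
        f"Question: {question}\n"
        "Keep only the points and lines relevant to the question. "
        "Then, for each candidate relation, assume it holds and check "
        "whether that is consistent with the description.\n"
        f"Candidates:\n{option_lines}\n"
        "Answer with the one consistent relation:"
    )


def geocot(llm: Callable[[str], str], description: str,
           question: str, options: List[str]) -> str:
    """Run both stages; `llm` is any function mapping a prompt to text."""
    decomposition = llm(stage1_prompt(description))
    return llm(stage2_prompt(description, decomposition, question, options))


# Stub model so the sketch runs without any API; it records its prompts.
calls: List[str] = []

def fake_llm(prompt: str) -> str:
    calls.append(prompt)
    return "Points: A, B, C, D. Lines: AB, CD." if len(calls) == 1 else "parallel"

result = geocot(
    fake_llm,
    "AB and CD are both perpendicular to line EF.",
    "What is the relation between AB and CD?",
    ["parallel", "perpendicular", "intersecting but not perpendicular"],
)
print(result)
```

Passing the model in as a plain callable keeps the sketch independent of any particular API; in practice `fake_llm` would be replaced by a real client call.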
-----
Key Insights from this Paper 🤔:
→ LLMs struggle with complex geometric structures, especially angle-based relations.
→ Current evaluation methods based on final answer accuracy can misrepresent LLMs' true geometric abilities.
→ Strategic enhancements in geometric description complexity and reverse reasoning improve LLMs' GRI capabilities.
-----
Results ✅:
→ GPT-4o achieves 77.86% accuracy on basic GeomRel but only 47.93% on the advanced version.
→ GeoCoT improves GRI accuracy, with an average increase of 9.15% on basic GeomRel and 14.79% on advanced GeomRel in a few-shot setting using GPT-3.5-Turbo.