Are LLMs really reasoning, or are they just good guessers? This paper tries to tell the difference.
This paper proposes a new way to measure how well LLMs reason, going beyond simple accuracy. It uses positional bias in multiple-choice questions to see if LLMs actually understand the logic or just rely on memorization and guessing.
-----
Paper - https://arxiv.org/abs/2501.13833
Original Problem 🤔:
→ LLMs perform well on reasoning benchmarks, but their actual reasoning abilities are unclear.
→ Standard accuracy metrics don't reveal how LLMs make decisions.
→ It is important to distinguish between true logical deduction and other cognitive processes like memorization.
-----
Solution in this Paper 💡:
→ This paper introduces two models to analyze LLM behavior: a Probabilistic Mixture Model (PMM) and Information-Theoretic Consistency (ITC) analysis.
→ The PMM separates LLM responses into reasoning, memorization, and guessing components.
→ The ITC analysis looks at how confident the LLM is in its answers and how it chooses its strategies.
→ Positional bias, where the position of the answer in a multiple-choice question affects the LLM's answer, is used as a test case.
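A minimal sketch of how a positional-bias probe could be tallied, assuming each trial is logged as a (correct position, chosen position) pair. The function names, the toy records, and the use of Shannon entropy as a consistency signal are illustrative assumptions, not the paper's actual PMM or ITC definitions.

```python
import math
from collections import Counter

POSITIONS = "ABCD"  # assumed four-option multiple-choice format

def accuracy_by_position(records):
    """Accuracy grouped by where the correct answer was placed.

    records: iterable of (correct_position, chosen_position) pairs.
    """
    totals, hits = Counter(), Counter()
    for correct, chosen in records:
        totals[correct] += 1
        if chosen == correct:
            hits[correct] += 1
    return {p: hits[p] / totals[p] for p in POSITIONS if totals[p]}

def choice_entropy(records):
    """Shannon entropy (in bits) of the positions the model picked.

    With balanced answer positions, a value well below 2 bits suggests
    the model favors some positions over others.
    """
    counts = Counter(chosen for _, chosen in records)
    n = sum(counts.values())
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Hypothetical toy log: the correct answer appears twice in each position.
records = [
    ("A", "A"), ("A", "A"),
    ("B", "B"), ("B", "A"),
    ("C", "C"), ("C", "C"),
    ("D", "D"), ("D", "A"),
]

print(accuracy_by_position(records))  # {'A': 1.0, 'B': 0.5, 'C': 1.0, 'D': 0.5}
print(choice_entropy(records))        # 1.75
```

In this toy log the model over-selects position A, which shows up both as lower accuracy when the answer sits in B or D and as entropy below the 2-bit maximum.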
-----
Key Insights from this Paper 🧐:
→ True reasoning is still difficult for current LLMs.
→ LLMs often rely on memorization and pattern matching instead of logic.
→ Accuracy alone doesn't fully show an LLM's reasoning abilities.
→ LLMs balance different approaches depending on the question.
-----
Results 💪:
→ Position D in multiple-choice questions is treated differently by the model, showing systematic positional bias.
→ Model accuracy is near perfect for questions where the observed accuracy is greater than 0.4.
→ The model predominantly relies on memorization (47%), with reasoning and guessing contributing 26% and 27%, respectively.
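To make the mixture idea concrete, here is a rough sketch of how three component weights combine into one expected accuracy. Only the weights (47%, 26%, 27%) come from the results above. The function name, the four-option assumption, and the per-component accuracies are hypothetical placeholders, and the paper's actual PMM may be parameterized differently.

```python
# Mixture weights as reported above; everything else is illustrative.
WEIGHTS = {"memorization": 0.47, "reasoning": 0.26, "guessing": 0.27}
NUM_OPTIONS = 4  # assumed number of answer choices per question

def mixture_accuracy(weights, component_accuracy):
    """Expected accuracy of a mixture: sum of weight * per-component accuracy."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("mixture weights must sum to 1")
    return sum(w * component_accuracy[name] for name, w in weights.items())

# Hypothetical best case: memorization and reasoning are always right,
# while guessing succeeds at chance level (1 in NUM_OPTIONS).
upper_bound = mixture_accuracy(
    WEIGHTS,
    {"memorization": 1.0, "reasoning": 1.0, "guessing": 1.0 / NUM_OPTIONS},
)
print(round(upper_bound, 4))  # 0.7975
```

Under these placeholder assumptions, even flawless memorization and reasoning would cap accuracy near 80%, because about a quarter of responses come from guessing.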