This study reveals when LLMs actually figure out their answers: before their explanations start, or while they write them.
This research investigates how LLMs internally determine their answers during Chain-of-Thought reasoning, revealing whether they follow a "think-to-talk" (predetermined conclusion) or "talk-to-think" (step-by-step reasoning) approach.
-----
https://arxiv.org/abs/2412.01113
Original Problem 🤔:
→ We don't know if Chain-of-Thought explanations from LLMs are genuine step-by-step reasoning or post-hoc justifications of predetermined answers.
-----
Solution in this Paper 🔬:
→ The researchers used causal probing to analyze model internals at each layer and timestep, tracking when answers emerge during reasoning.
→ They created controlled arithmetic tasks of varying complexity to compare how models handle simple versus multi-step calculations.
→ Linear probes were trained to predict variable values from hidden states, revealing when each computation actually occurs (see the probe sketch after this list).
→ They validated these findings with activation patching, confirming causal links between intermediate calculations and final answers (see the patching sketch below).
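To make the probing step concrete, here is a minimal sketch of training one linear probe per layer to read a variable's value out of cached hidden states. The array names, shapes, random placeholder data, and use of scikit-learn are illustrative assumptions, not the authors' actual pipeline.

```python
# Sketch: per-layer linear probes that predict a variable's value from hidden states.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

n_examples, n_layers, d_model = 1000, 32, 768

# hidden[i, l] = residual-stream activation at layer l for example i,
# taken at a fixed token position (e.g., end of the question prompt).
# Random data stands in for real cached activations here.
hidden = np.random.randn(n_examples, n_layers, d_model).astype(np.float32)

# labels[i] = ground-truth value of the probed variable
# (e.g., B in a task like "A = 3, B = A + 2, ... What is B?").
labels = np.random.randint(0, 10, size=n_examples)

probe_accuracy = []
for layer in range(n_layers):
    X_train, X_test, y_train, y_test = train_test_split(
        hidden[:, layer], labels, test_size=0.2, random_state=0
    )
    probe = LogisticRegression(max_iter=1000)  # one linear probe per layer
    probe.fit(X_train, y_train)
    probe_accuracy.append(probe.score(X_test, y_test))

# The earliest layer (and, across token positions, the earliest timestep)
# where accuracy rises marks when the model has internally computed the value.
print([f"{acc:.2f}" for acc in probe_accuracy])
```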
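And here is a minimal sketch of the kind of activation-patching check used to test causality: cache one layer's activations from a "clean" run, inject them into a run on a "corrupted" prompt, and see whether the final answer moves toward the clean answer. The model choice (GPT-2), prompts, layer, and patched position are assumptions for illustration only.

```python
# Sketch: patch a clean activation into a corrupted run and inspect the answer.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

clean = "A = 3 . B = A + 2 . B ="      # clean run: A is 3, so B should be 5
corrupt = "A = 7 . B = A + 2 . B ="    # corrupted run: same template, A is 7
LAYER_IDX = 6                          # layer to patch (assumed)
PATCH_POS = 2                          # token position assumed to hold A's value
cached = {}

def cache_hook(module, inputs, output):
    cached["h"] = output[0].detach()   # save the clean run's hidden states

def patch_hook(module, inputs, output):
    hidden = output[0].clone()
    hidden[:, PATCH_POS] = cached["h"][:, PATCH_POS]  # overwrite with clean activation
    return (hidden,) + output[1:]

layer = model.transformer.h[LAYER_IDX]

with torch.no_grad():
    handle = layer.register_forward_hook(cache_hook)
    model(**tok(clean, return_tensors="pt"))
    handle.remove()

    handle = layer.register_forward_hook(patch_hook)
    logits = model(**tok(corrupt, return_tensors="pt")).logits[0, -1]
    handle.remove()

# If patching this position shifts the prediction toward the clean answer ("5"),
# that activation causally carries the intermediate result.
print(tok.decode(logits.argmax().item()))
```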
-----
Key Insights 💡:
→ Simple single-step calculations are solved before Chain-of-Thought begins
→ Complex multi-hop problems are computed during the explanation process
→ This think-to-talk vs. talk-to-think split holds consistently across model sizes and architectures
→ Sub-answers computed before the CoT begins feed into the final answer only indirectly, rather than determining it outright
-----
Results 📊:
→ All 10 evaluated LLM architectures achieved nearly 100% task accuracy
→ Larger models computed intermediate answers slightly earlier than smaller ones
→ Models consistently solved single-step problems (steps ≤ 1) before starting explanations