This paper investigates whether transformers can learn robust search capabilities by training them on graph connectivity problems.
It finds that transformers can learn search algorithms when given the right training distribution, but that this ability degrades as the input graph size increases.
-----
https://arxiv.org/abs/2412.04703
🔍 Original Problem:
→ LLMs struggle with search tasks, but it's unclear if this is due to lack of data, insufficient parameters, or architectural limitations.
→ Previous work shows LLMs have difficulty with proof search and planning, even with chain-of-thought prompting.
-----
🛠️ Solution in this Paper:
→ The researchers used directed acyclic graphs (DAGs) as a testbed to generate unlimited training data.
→ They developed a novel mechanistic interpretability technique to extract computation graphs from trained models.
→ The model learns to compute, for each vertex in the input graph, the set of vertices reachable from it.
→ Each transformer layer merges and expands these sets, so the number of vertices the model can search over grows exponentially with the number of layers (see the sketch below).
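The mechanism is easiest to see as repeated squaring of the reachability relation. Below is a minimal Python sketch (my illustration, not the paper's code or the extracted circuit; `random_dag` and `layerwise_reachability` are assumed helper names) of how randomly sampled DAGs provide unlimited connectivity examples, and how merging each vertex's reachable set with its successors' sets roughly doubles the covered path length per layer.

```python
# Illustrative sketch only: layer-wise merging of per-vertex reachable sets
# covers path lengths that roughly double with every merge step.
import random

def random_dag(n_vertices=12, edge_prob=0.3, seed=0):
    """Sample a random DAG by only allowing edges from lower to higher vertex ids."""
    rng = random.Random(seed)
    return [(u, v) for u in range(n_vertices) for v in range(u + 1, n_vertices)
            if rng.random() < edge_prob]

def layerwise_reachability(n_vertices, edges, n_layers):
    """Start from direct successors, then merge the sets once per 'layer'."""
    reach = {v: {w for (u, w) in edges if u == v} for v in range(n_vertices)}
    for _ in range(n_layers):
        new_reach = {}
        for v in range(n_vertices):
            merged = set(reach[v])
            for w in reach[v]:
                merged |= reach[w]  # absorb everything w already reaches
            new_reach[v] = merged
        reach = new_reach  # after k merges, paths of length up to ~2**k are covered
    return reach

edges = random_dag()
reach = layerwise_reachability(12, edges, n_layers=4)
print("vertex 11 reachable from vertex 0?", 11 in reach[0])
```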
-----
💡 Key Insights:
→ Transformers can learn search when given carefully constructed training distributions
→ The model struggles more as input graph size increases
→ Increasing model parameters does not resolve the difficulty
→ In-context exploration (chain-of-thought) doesn't help with larger graphs
-----
📊 Results:
→ With the right data distribution, the model reaches 100% accuracy on graphs from the training distribution
→ Performance degrades exponentially with increasing input graph size
→ Adding model parameters doesn't improve scaling behavior
→ Search capability doesn't emerge naturally with increased scale