
"Transformers Struggle to Learn to Search"

A podcast on this paper was generated with Google's Illuminate.

Transformers learn search algorithms but struggle as graph size increases, revealing fundamental limitations.

This paper investigates transformers' ability to learn search algorithms, using graph connectivity as a testbed and developing a novel mechanistic interpretability technique.

-----

https://arxiv.org/abs/2412.04703

🔍 Original Problem:

LLMs struggle with robust search capabilities, but it's unclear if this is due to lack of data, insufficient parameters, or architectural limitations.

-----

💡 Solution in this Paper:

→ The researchers use graph connectivity as a testbed to generate unlimited high-coverage data for training small transformers.

→ They develop a novel mechanistic interpretability technique to extract computation graphs from trained models.

→ The study explores three distributions of DAG search problems: naïve, star, and balanced.

→ They analyze the algorithm learned by transformers through this new interpretability method.

→ The paper also investigates whether in-context exploration (chain-of-thought) improves search capabilities.
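To make the setup concrete, here is a minimal sketch of the kind of graph-connectivity generator the paper's testbed implies: sample a random DAG, pick a (source, target) query, and label it by reachability. The function name, sizes, and the uniform edge sampling are assumptions for illustration; the paper's actual naïve/star/balanced distributions are constructed differently.

```python
import random

def sample_dag_task(n_vertices=8, n_edges=12, seed=0):
    """Hypothetical sketch of a connectivity training example:
    a random DAG plus a (source, target) query labeled by reachability."""
    rng = random.Random(seed)
    # Edges only go from lower to higher index, guaranteeing acyclicity.
    all_pairs = [(u, v) for u in range(n_vertices) for v in range(u + 1, n_vertices)]
    edges = rng.sample(all_pairs, n_edges)
    source, target = rng.sample(range(n_vertices), 2)
    # Label the query via BFS/DFS reachability from the source.
    adj = {u: [] for u in range(n_vertices)}
    for u, v in edges:
        adj[u].append(v)
    frontier, seen = [source], {source}
    while frontier:
        u = frontier.pop()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                frontier.append(v)
    return edges, (source, target), target in seen
```

Because instances like this can be sampled endlessly, the training distribution can cover the task far more densely than natural text ever would, which is what makes it a clean testbed.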

-----

🧠 Key Insights from this Paper:

→ Transformers can learn to search effectively when given the right training distribution.

→ Models learn an exponential path-merging algorithm for search.

→ Increasing model scale does not help overcome difficulties with larger input graphs.

→ In-context exploration (chain-of-thought) does not resolve scaling limitations.
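The path-merging insight above can be sketched in plain Python: each vertex holds a set of reachable vertices, and each "layer" unions a vertex's set with the sets of the vertices it already reaches, doubling the covered path length per layer (so L layers cover paths up to length 2^L). This is an illustrative reimplementation of the idea, not the trained model's actual weights or circuits.

```python
def merge_reachability(edges, n_vertices, n_layers):
    """Illustrate exponential path merging over a DAG's edge list."""
    # reach[u] starts as u's direct successors (paths of length 1).
    reach = [set() for _ in range(n_vertices)]
    for u, v in edges:
        reach[u].add(v)
    for _ in range(n_layers):
        # Union each vertex's set with the sets of vertices it reaches,
        # doubling the covered path length: 1 -> 2 -> 4 -> ...
        new = [set(r) for r in reach]
        for u in range(n_vertices):
            for v in reach[u]:
                new[u] |= reach[v]
        reach = new
    return reach
```

For example, on the chain 0→1→2→3→4, one merge layer lets vertex 0 reach only up to distance 2, while two layers already cover the full length-4 path; this doubling is why a fixed-depth model can only handle graphs up to a size set by its layer count.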

-----

📊 Results:

→ Models achieve near-perfect accuracy on test sets drawn from the balanced distribution.

→ Performance degrades as input graph size increases, even with more parameters.

→ A constant number of layers suffices for depth-first search, yet models still struggle on larger graphs.

→ Selection-inference approach does not overcome scaling limitations.
