"Transformers Can Navigate Mazes With Multi-Step Prediction"

The podcast on this paper is generated with Google's Illuminate.

This paper shows how transformers can navigate mazes better by predicting multiple steps ahead with the MLM-U objective instead of standard next-token prediction.

MLM-U achieves perfect navigation on complex mazes while being 4x more data-efficient and 2x faster to train.

-----

https://arxiv.org/abs/2412.05117

🤖 Original Problem:

Transformers trained with next-token prediction struggle with long-horizon planning tasks like maze navigation because they are trained to predict only one step at a time, which leads to poor performance on longer, more complex paths.
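
For reference, a minimal PyTorch sketch of this standard next-token setup on a tokenized maze path (not the paper's code; the token ids, shapes, and the `model` callable are illustrative assumptions):

```python
import torch
import torch.nn.functional as F

# Hypothetical tokenized path through a maze: start cell, moves, goal cell.
path = torch.tensor([[3, 17, 42, 18, 7, 25]])    # (batch=1, seq_len=6), ids are made up

def next_token_loss(model, path):
    """Standard autoregressive objective: every position is trained to
    predict only the single next step of the route."""
    inputs, targets = path[:, :-1], path[:, 1:]
    logits = model(inputs)                        # (batch, seq_len-1, vocab)
    return F.cross_entropy(logits.flatten(0, 1), targets.flatten())
```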

-----

🔧 Solution in this Paper:

→ The paper introduces the MLM-U training objective, which masks random portions of the navigation path, forcing the model to predict multiple steps simultaneously, both forward and backward.

→ MLM-U draws masking rates uniformly from [0, 1] to expose the model to prediction spans of different lengths during training (a minimal sketch follows this list).

→ The solution employs an encoder-decoder architecture with RoPE positional embeddings computed in 32-bit precision for accurate position tracking.
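
A minimal PyTorch sketch of the MLM-U idea described above, assuming a generic sequence model; the `MASK_ID` token, shapes, and function name are illustrative, and the paper's actual encoder-decoder implementation and loss details live in the paper and its code:

```python
import torch
import torch.nn.functional as F

MASK_ID = 0  # hypothetical id of the mask token

def mlm_u_loss(model, path):
    """Sketch of the MLM-U idea: sample a masking rate uniformly from [0, 1],
    hide that fraction of the path tokens, and predict every masked step at
    once, so supervision flows both forward and backward along the route."""
    rate = torch.rand(())                              # uniform masking rate in [0, 1]
    mask = torch.rand(path.shape) < rate               # which steps to hide
    if not mask.any():                                 # guard against an empty loss
        mask[..., 0] = True
    corrupted = path.masked_fill(mask, MASK_ID)
    logits = model(corrupted)                          # (batch, seq_len, vocab)
    return F.cross_entropy(logits[mask], path[mask])   # loss on masked positions only
```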

-----

🎯 Key Insights:

→ Multi-step prediction is crucial for complex navigation tasks

→ Higher-precision positional encodings significantly improve performance on larger mazes (see the RoPE sketch after this list)

→ Model scaling benefits MLM-U more than next-token prediction
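
To make the precision point concrete, here is a standalone PyTorch sketch of rotary (RoPE) angles computed in float32; the function name, shapes, and rotate-by-pairs layout are illustrative assumptions, not the paper's implementation:

```python
import torch

def rope_rotate(x, base=10000.0):
    """Rotary position embedding sketch with angles kept in float32,
    illustrating why 32-bit positional precision helps distinguish
    nearby cells in large mazes. x: (batch, seq_len, dim), dim even."""
    _, seq_len, dim = x.shape
    pos = torch.arange(seq_len, dtype=torch.float32)
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2, dtype=torch.float32) / dim))
    angles = torch.outer(pos, inv_freq)               # (seq_len, dim/2), kept in float32
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., 0::2].float(), x[..., 1::2].float()   # adjacent channel pairs
    out = torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)
    return out.to(x.dtype)
```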

-----

📊 Results:

→ An 8M-parameter MLM-U model achieves 100% accuracy on mazes up to 20x20 grid size

→ 4x more data-efficient than next-token prediction

→ Outperforms 175M-parameter models trained with A* supervision on 30x30 mazes (85.5% vs. 70.2%)
