This paper shows how transformers can better navigate mazes by predicting multiple steps ahead with the MLM-U objective instead of standard next-token prediction.
MLM-U achieves perfect navigation on complex mazes while being 4x more data-efficient and 2x faster to train.
-----
https://arxiv.org/abs/2412.05117
🤖 Original Problem:
Transformers trained with next-token prediction struggle with long-term planning tasks like maze navigation because they can only predict one step at a time, leading to poor performance on complex paths.
-----
🔧 Solution in this Paper:
→ The paper introduces the MLM-U training objective, which masks random portions of the navigation path, forcing the model to predict multiple steps simultaneously, both forward and backward (see the sketch after this list).
→ MLM-U uses uniform masking rates drawn from [0,1] to expose the model to different sequence lengths during training.
→ The solution employs an encoder-decoder architecture with RoPE positional embeddings and 32-bit precision for accurate position tracking.
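A minimal sketch of the uniform-rate masking idea, assuming token-ID tensors and a `mask_token_id`; the function name and the -100 ignore-index convention are illustrative assumptions, not the paper's actual code:

```python
import torch

def mlm_u_mask(tokens: torch.Tensor, mask_token_id: int):
    """Illustrative MLM-U-style masking: draw a masking rate uniformly from
    [0, 1] for each sequence, then mask that fraction of path tokens so the
    model must predict multiple steps at once, forward and backward."""
    batch, seq_len = tokens.shape
    # One masking rate per sequence, sampled uniformly from [0, 1]
    rates = torch.rand(batch, 1)
    # Mask each position independently with its sequence's rate
    mask = torch.rand(batch, seq_len) < rates
    inputs = tokens.clone()
    inputs[mask] = mask_token_id
    # Labels: predict only the masked positions; ignore the rest (-100)
    labels = tokens.clone()
    labels[~mask] = -100
    return inputs, labels
```

Because the rate varies from near 0 to near 1 across batches, the model sees everything from almost-complete paths to almost-empty ones, which is what forces multi-step prediction rather than single next-step continuation.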
-----
🎯 Key Insights:
→ Multi-step prediction is crucial for complex navigation tasks
→ Higher precision positional encodings significantly improve performance on larger mazes
→ Model scaling benefits MLM-U more than next-token prediction
-----
📊 Results:
→ 8M parameter MLM-U model achieves 100% accuracy on mazes up to 20x20 grid size
→ 4x more data efficient than next-token prediction
→ Outperforms 175M parameter models using A* supervision on 30x30 mazes (85.5% vs 70.2%)