This paper shows how transformers can better navigate mazes by predicting multiple steps ahead with the MLM-U objective instead of standard next-token prediction.
MLM-U achieves perfect navigation on complex mazes while being 4x more data-efficient and 2x faster to train.
-----
https://arxiv.org/abs/2412.05117
🤖 Original Problem:
Transformers trained with next-token prediction struggle with long-term planning tasks like maze navigation because they can only predict one step at a time, leading to poor performance on complex paths.
-----
🔧 Solution in this Paper:
→ The paper introduces the MLM-U training objective, which masks random portions of the navigation path, forcing the model to predict multiple steps simultaneously, both forward and backward (see the sketch after this list).
→ MLM-U uses uniform masking rates drawn from [0,1] to expose the model to different sequence lengths during training.
→ The solution employs an encoder-decoder architecture with RoPE positional embeddings and 32-bit precision for accurate position tracking.
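A minimal sketch of the uniform-rate masking idea, assuming token-ID tensors and a `mask_token_id`; the function name and the -100 ignore-index convention are illustrative assumptions, not the paper's actual code:

```python
import torch

def mlm_u_mask(tokens: torch.Tensor, mask_token_id: int):
    """Illustrative MLM-U-style masking: draw a masking rate uniformly from
    [0, 1] for each sequence, then mask that fraction of path tokens so the
    model must predict multiple steps at once, forward and backward."""
    batch, seq_len = tokens.shape
    # One masking rate per sequence, sampled uniformly from [0, 1]
    rates = torch.rand(batch, 1)
    # Mask each position independently with its sequence's rate
    mask = torch.rand(batch, seq_len) < rates
    inputs = tokens.clone()
    inputs[mask] = mask_token_id
    # Labels: predict only the masked positions; ignore the rest (-100)
    labels = tokens.clone()
    labels[~mask] = -100
    return inputs, labels
```

Because the rate varies from near 0 to near 1 across batches, the model sees everything from almost-complete paths to almost-empty ones, which is what forces multi-step prediction rather than single next-step continuation.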
-----
🎯 Key Insights:
→ Multi-step prediction is crucial for complex navigation tasks
→ Higher precision positional encodings significantly improve performance on larger mazes
→ Model scaling benefits MLM-U more than next-token prediction
-----
📊 Results:
→ 8M parameter MLM-U model achieves 100% accuracy on mazes up to 20x20 grid size
→ 4x more data efficient than next-token prediction
→ Outperforms 175M parameter models using A* supervision on 30x30 mazes (85.5% vs 70.2%)