Beyond Autoregression: Discrete Diffusion for Complex Reasoning and Planning
Diffusion-based approach beats autoregressive models at solving puzzles and planning
🤖 Original Problem:
Autoregressive LLMs struggle with complex reasoning and long-term planning despite their impressive capabilities. Because they commit to tokens strictly left to right, they have inherent difficulty maintaining global coherence and handling tasks that require deliberate planning.
🔧 Solution in this Paper:
• Introduces Multi-granularity Diffusion Modeling (MDM) that prioritizes subgoals based on difficulty during learning
• Uses a multi-view learning framework where challenging subgoals are decomposed into manageable interrelated views
• Implements sequence-level and token-level reweighting mechanisms that focus training on harder subgoals, improving training efficiency
• Employs an easy-first TopK decoding strategy that commits the most confident token predictions first at each denoising step (a minimal sketch follows this list)
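To make the decoding strategy concrete, here is a minimal sketch of what easy-first TopK decoding for a masked discrete diffusion model could look like. The function name `easy_first_topk_decode`, the `mask_id` convention, and the `model(x) -> logits` interface are illustrative assumptions, not the paper's actual code.

```python
import torch

def easy_first_topk_decode(model, seq_len, k, mask_id, device="cpu"):
    """Sketch: iteratively unmask the K positions the denoiser is most sure of.

    Assumes `model(x)` returns logits of shape (1, seq_len, vocab_size).
    """
    # Start from a fully masked sequence.
    x = torch.full((1, seq_len), mask_id, dtype=torch.long, device=device)
    masked = torch.ones(1, seq_len, dtype=torch.bool, device=device)

    while masked.any():
        logits = model(x)
        conf, pred = logits.softmax(dim=-1).max(dim=-1)  # per-position confidence
        conf = conf.masked_fill(~masked, -1.0)           # never re-pick committed positions
        num = min(k, int(masked.sum()))
        idx = conf[0].topk(num).indices                  # the `num` "easiest" remaining positions
        x[0, idx] = pred[0, idx]                         # commit their predicted tokens
        masked[0, idx] = False
    return x
```

Setting `k` to the full sequence length collapses this loop into a single denoising pass, which is the regime behind the single-step speedup reported in the results below.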
💡 Key Insights:
• Not all tokens are equally difficult to learn in autoregressive models
• Diffusion models can effectively learn difficult subgoals that elude autoregressive approaches
• The performance gap between MDM and autoregressive models widens as task difficulty increases
• Global coherence is better maintained through the iterative multi-step denoising process
📊 Results:
• On the Countdown task: MDM achieves 91.5% accuracy vs 45.8% for autoregressive models
• On Sudoku: MDM reaches 100% accuracy vs 20.7% for autoregressive models
• With just 6M parameters, MDM outperforms the 303M-parameter GPT-2 and the 13B-parameter LLaMA
• Achieves 10x faster inference with a single diffusion step while maintaining superior accuracy
🔑 What makes this approach unique:
The way the model handles the subgoal imbalance problem:
• A multi-granularity loss function incorporating sequence-level and token-level reweighting (a minimal sketch follows this list)
• Decomposing difficult subgoals into multiple manageable, interrelated views
• Prioritizing different subgoals based on their difficulty during training
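As an illustration of how sequence-level and token-level reweighting could be combined into one training objective, here is a minimal sketch. The function `multigranularity_loss`, the softmax-based weighting scheme, and the `mask` convention (1.0 at noised positions) are assumptions for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def multigranularity_loss(logits, targets, mask):
    """Sketch of a diffusion training loss with two levels of reweighting.

    logits:  (batch, seq_len, vocab) denoiser outputs
    targets: (batch, seq_len) ground-truth token ids
    mask:    (batch, seq_len) float, 1.0 at noised positions (>= 1 per row assumed)
    """
    # Per-token cross-entropy, restricted to the noised positions.
    tok_loss = F.cross_entropy(logits.transpose(1, 2), targets,
                               reduction="none") * mask

    # Token-level weights: upweight tokens the model currently finds hard.
    tok_w = tok_loss.detach().masked_fill(mask == 0, float("-inf")).softmax(-1)
    tok_w = tok_w * mask.sum(-1, keepdim=True)  # mean weight of 1 over noised tokens

    per_seq = (tok_w * tok_loss).sum(-1) / mask.sum(-1)

    # Sequence-level weights: upweight the harder sequences in the batch.
    seq_w = per_seq.detach().softmax(0) * per_seq.numel()

    return (seq_w * per_seq).mean()
```

Note that the weights are computed from detached losses, so the reweighting steers gradient magnitude toward difficult subgoals without itself being differentiated through, a common pattern in difficulty-based curricula.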