Beyond Autoregression: Discrete Diffusion for Complex Reasoning and Planning
Diffusion-based approach beats autoregressive models at solving puzzles and planning
🤖 Original Problem:
Autoregressive LLMs struggle with complex reasoning and long-term planning despite their impressive capabilities. Because they commit to tokens strictly left to right, they have inherent difficulty maintaining global coherence and handling tasks that require deliberate planning.
🔧 Solution in this Paper:
• Introduces Multi-granularity Diffusion Modeling (MDM) that prioritizes subgoals based on difficulty during learning
• Uses a multi-view learning framework where challenging subgoals are decomposed into manageable interrelated views
• Implements sequence-level and token-level reweighting mechanisms that focus training on harder subgoals, improving training efficiency
• Employs an easy-first TopK decoding strategy that commits the most confident token predictions first at each denoising step (a minimal sketch follows this list)
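To make the decoding strategy concrete, here is a minimal sketch of what easy-first TopK decoding for a masked discrete diffusion model could look like. The function name `easy_first_topk_decode`, the `mask_id` convention, and the `model(x) -> logits` interface are illustrative assumptions, not the paper's actual code.

```python
import torch

def easy_first_topk_decode(model, seq_len, k, mask_id, device="cpu"):
    """Sketch: iteratively unmask the K positions the denoiser is most sure of.

    Assumes `model(x)` returns logits of shape (1, seq_len, vocab_size).
    """
    # Start from a fully masked sequence.
    x = torch.full((1, seq_len), mask_id, dtype=torch.long, device=device)
    masked = torch.ones(1, seq_len, dtype=torch.bool, device=device)

    while masked.any():
        logits = model(x)
        conf, pred = logits.softmax(dim=-1).max(dim=-1)  # per-position confidence
        conf = conf.masked_fill(~masked, -1.0)           # never re-pick committed positions
        num = min(k, int(masked.sum()))
        idx = conf[0].topk(num).indices                  # the `num` "easiest" remaining positions
        x[0, idx] = pred[0, idx]                         # commit their predicted tokens
        masked[0, idx] = False
    return x
```

Setting `k` to the full sequence length collapses this loop into a single denoising pass, which is the regime behind the single-step speedup reported in the results below.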
💡 Key Insights:
• Not all tokens are equally difficult to learn in autoregressive models
• Diffusion models can effectively learn difficult subgoals that elude autoregressive approaches
• The performance gap between MDM and autoregressive models widens as task difficulty increases
• Global coherence is better maintained through the iterative multi-step denoising process
📊 Results:
• On the Countdown task: MDM achieves 91.5% accuracy vs 45.8% for autoregressive models
• On Sudoku: MDM reaches 100% accuracy vs 20.7% for autoregressive models
• With just 6M parameters, MDM outperforms the 303M-parameter GPT-2 and the 13B-parameter LLaMA
• Achieves 10x faster inference with a single diffusion step while maintaining superior accuracy
🔑 What makes this approach unique:
The way the model handles the subgoal imbalance problem:
• A multi-granularity loss function incorporating sequence-level and token-level reweighting (a minimal sketch follows this list)
• Decomposing difficult subgoals into multiple manageable, interrelated views
• Prioritizing different subgoals based on their difficulty during training
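As an illustration of how sequence-level and token-level reweighting could be combined into one training objective, here is a minimal sketch. The function `multigranularity_loss`, the softmax-based weighting scheme, and the `mask` convention (1.0 at noised positions) are assumptions for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def multigranularity_loss(logits, targets, mask):
    """Sketch of a diffusion training loss with two levels of reweighting.

    logits:  (batch, seq_len, vocab) denoiser outputs
    targets: (batch, seq_len) ground-truth token ids
    mask:    (batch, seq_len) float, 1.0 at noised positions (>= 1 per row assumed)
    """
    # Per-token cross-entropy, restricted to the noised positions.
    tok_loss = F.cross_entropy(logits.transpose(1, 2), targets,
                               reduction="none") * mask

    # Token-level weights: upweight tokens the model currently finds hard.
    tok_w = tok_loss.detach().masked_fill(mask == 0, float("-inf")).softmax(-1)
    tok_w = tok_w * mask.sum(-1, keepdim=True)  # mean weight of 1 over noised tokens

    per_seq = (tok_w * tok_loss).sum(-1) / mask.sum(-1)

    # Sequence-level weights: upweight the harder sequences in the batch.
    seq_w = per_seq.detach().softmax(0) * per_seq.numel()

    return (seq_w * per_seq).mean()
```

Note that the weights are computed from detached losses, so the reweighting steers gradient magnitude toward difficult subgoals without itself being differentiated through, a common pattern in difficulty-based curricula.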