Tree of Thought gets smarter with BPP-Search's triple-powered decision making
BPP-Search introduces a novel approach to transforming natural language into mathematical models by combining Tree of Thought reasoning with reinforcement learning. The paper addresses the lack of detailed annotations in existing operations research datasets by introducing the StructuredOR dataset, and it improves reasoning accuracy through a combination of Beam Search, a Process Reward Model, and a Pairwise Preference algorithm.
-----
https://arxiv.org/abs/2411.17404
🤔 Original Problem:
→ Current operations research datasets lack detailed annotations of the modeling process, recording only final objective values
→ Existing reasoning methods like Chain-of-Thought and Tree-of-Thought struggle with accuracy and efficiency in mathematical modeling
-----
🔧 Solution in this Paper:
→ Introduces the StructuredOR dataset, with comprehensive annotations capturing the complete mathematical modeling process
→ Develops the BPP-Search algorithm, combining three key components: Beam Search for efficient tree exploration, a Process Reward Model for scoring intermediate steps, and a Pairwise Preference algorithm for ranking candidates
→ Adds a Random Greedy algorithm to handle scoring imprecision in the Process Reward Model
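The combination above can be sketched as a beam search loop in which a Process Reward Model scores each partial solution before pruning. This is a minimal illustrative sketch, not the paper's implementation: `expand` and `prm_score` are assumed stand-ins for the model's step generator and reward model.

```python
def bpp_search(initial_state, expand, prm_score, beam_width=3, max_depth=15):
    """Hypothetical sketch of BPP-Search-style tree exploration.

    expand(state)    -> list of candidate next partial solutions (assumed)
    prm_score(state) -> Process Reward Model score for a partial solution (assumed)
    """
    beam = [initial_state]
    for _ in range(max_depth):
        # Expand every state kept in the beam into its candidate next steps.
        candidates = [child for state in beam for child in expand(state)]
        if not candidates:
            break
        # Score each intermediate step with the reward model, then keep the
        # top-k candidates -- standard beam-search pruning. The paper's
        # Random Greedy variant would instead sample among the top scorers
        # to absorb imprecision in the scores.
        scored = sorted(candidates, key=prm_score, reverse=True)
        beam = scored[:beam_width]
    return beam[0]
```

For example, with toy states that are tuples of digits and a score equal to their sum, the search greedily grows the highest-sum tuple while keeping `beam_width` alternatives alive.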
-----
💡 Key Insights:
→ Manual labeling of process data proves more effective than Monte Carlo Tree Search methods
→ Pairwise comparison outperforms individual scoring for similar candidates
→ Combining multiple evaluation methods reduces bias in final solution selection
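The pairwise-preference insight can be illustrated with a round-robin comparison: instead of ranking candidates by noisy absolute scores, each pair is compared directly and candidates are ordered by wins. A minimal sketch, assuming a `prefer(a, b)` judgment that in the paper's setting would come from a model rather than the stand-in used here:

```python
from itertools import combinations

def pairwise_rank(candidates, prefer):
    """Hypothetical sketch of pairwise-preference ranking.

    prefer(a, b) -> True if candidate a is judged better than b
    (an assumed stand-in for a model-based pairwise judgment).
    """
    wins = {c: 0 for c in candidates}
    # Compare every pair of candidates head to head.
    for a, b in combinations(candidates, 2):
        if prefer(a, b):
            wins[a] += 1
        else:
            wins[b] += 1
    # Most pairwise wins first: similar candidates are separated by direct
    # comparison rather than by small differences in absolute scores.
    return sorted(candidates, key=lambda c: wins[c], reverse=True)
```

This mirrors why pairwise comparison helps for near-tied candidates: a direct judgment between two options is often more reliable than comparing two independently produced scores.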
-----
📊 Results:
→ BPP-Search achieves 93.3% accuracy on StructuredOR dataset, compared to 63.3% for Chain-of-Thought
→ Reduces reasoning steps from 39 to 15 while maintaining higher accuracy
→ Outperforms baseline methods across all three datasets: StructuredOR, NL4OPT, and MAMO-ComplexLP