This paper demonstrates the power of systematically combining inference-time techniques for LLMs.
📚 https://arxiv.org/pdf/2409.15254
Original Problem 💡:
Existing inference-time architectures struggle to generalize beyond specific tasks. Challenges include effectively allocating inference compute, understanding interactions between techniques, and efficiently searching the design space.
-----
Solution in this Paper 🔧:
• Introduces Archon framework for combining multiple inference-time techniques
• Defines extensible design space of methods like ensembling, fusion, ranking, critiquing
• Transforms architecture selection into hyperparameter optimization problem
• Proposes Inference-Time Architecture Search (ITAS) algorithms to find optimal configurations
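The layering idea can be sketched as a tiny pipeline: ensemble several generators, fuse their outputs, then rank the candidates. This is a hedged illustration, not the paper's implementation; the toy "models", the naive string fusion, and the length-based scorer are all stand-ins for LLM calls.

```python
# Hypothetical sketch of layering inference-time techniques
# (ensembling -> fusion -> ranking) in the spirit of an
# Archon-style pipeline. All components are toy stand-ins.

from typing import Callable, List

def ensemble(models: List[Callable[[str], str]], prompt: str) -> List[str]:
    """Ensembling: collect one candidate answer from each model."""
    return [m(prompt) for m in models]

def fuse(candidates: List[str]) -> str:
    """Fusion: merge candidates into a single draft.
    Naive concatenation stands in for an LLM fuser here."""
    return " / ".join(candidates)

def rank(candidates: List[str], score: Callable[[str], float]) -> List[str]:
    """Ranking: order candidates by a critic/judge score (stand-in)."""
    return sorted(candidates, key=score, reverse=True)

# Toy "models" standing in for LLM calls.
models = [lambda p: p.upper(), lambda p: p + "!", lambda p: p * 2]

candidates = ensemble(models, "hi")
fused = fuse(candidates)
best = rank(candidates + [fused], score=len)[0]
```

Each stage is a swappable component, which is what makes the whole architecture searchable as a configuration rather than hand-built per task.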
-----
Key Insights from this Paper 💡:
• Layering multiple inference techniques improves performance across tasks
• Fusion and ranking most effective for instruction-following tasks
• Verification and unit testing boost reasoning/coding task performance
• Bayesian optimization efficient for searching architecture configurations
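Treating architecture selection as hyperparameter optimization can be illustrated with a minimal search loop. ITAS uses Bayesian optimization; a plain grid search stands in below, and the configuration fields and scoring function are invented for illustration.

```python
# Simplified stand-in for Inference-Time Architecture Search:
# enumerate a small configuration space and keep the best-scoring
# architecture. A real ITAS run would use Bayesian optimization
# and benchmark scores instead of this toy evaluate().

import itertools

def evaluate(config: dict) -> float:
    """Toy proxy for benchmarking an architecture configuration:
    diminishing returns past 3 layers, small bonus for fusion."""
    score = float(min(config["num_layers"], 3))
    if config["use_fusion"]:
        score += 0.5
    return score

search_space = {
    "num_layers": [1, 2, 3, 4],   # hypothetical knobs
    "use_fusion": [False, True],
}

best_cfg, best_score = None, float("-inf")
for values in itertools.product(*search_space.values()):
    cfg = dict(zip(search_space.keys(), values))
    s = evaluate(cfg)
    if s > best_score:
        best_cfg, best_score = cfg, s
```

Bayesian optimization replaces the exhaustive loop with a surrogate model that proposes promising configurations, which matters once the space of models, layers, and techniques grows too large to enumerate.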
-----
Results 📊:
• Outperforms GPT-4 and Claude 3.5 Sonnet across benchmarks
• Open-source Archon: 11.2 percentage point average increase over baselines
• Closed-source Archon: 15.8 percentage point average increase
• All-source Archon: 15.1 percentage point average increase