
"Archon: An Architecture Search Framework for Inference-Time Techniques"

Generated this podcast with Google's Illuminate.

The paper shows the power of systematically combining inference-time techniques for LLMs.

📚 https://arxiv.org/pdf/2409.15254

Original Problem 💡:

Existing inference-time architectures struggle to generalize beyond specific tasks. Challenges include effectively allocating inference compute, understanding interactions between techniques, and efficiently searching the design space.

-----

Solution in this Paper 🔧:

• Introduces Archon framework for combining multiple inference-time techniques

• Defines extensible design space of methods like ensembling, fusion, ranking, critiquing

• Transforms architecture selection into hyperparameter optimization problem

• Proposes Inference-Time Architecture Search (ITAS) algorithms to find optimal configurations
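The layering idea above can be sketched as a small pipeline. This is a minimal illustration, not the paper's implementation: the layer names (ensembling, ranking, fusion) come from Archon's design space, but the functions and stub "models" below are hypothetical stand-ins for real LLM calls.

```python
# Hedged sketch of an Archon-style layered inference-time pipeline.
# The "models" are stub functions standing in for real LLM calls.

def generate_ensemble(models, prompt, samples_per_model=2):
    """Ensembling layer: collect candidate answers from several models."""
    return [m(prompt) for m in models for _ in range(samples_per_model)]

def rank(candidates, score_fn, top_k=3):
    """Ranking layer: keep the top-k candidates by a scoring function."""
    return sorted(candidates, key=score_fn, reverse=True)[:top_k]

def fuse(candidates):
    """Fusion layer: merge surviving candidates into one answer.
    A real fuser would be another LLM call; here we just pick the longest."""
    return max(candidates, key=len)

# Stub models: each returns a canned answer for the prompt.
models = [
    lambda p: p + " -> answer A",
    lambda p: p + " -> longer answer B",
]

candidates = generate_ensemble(models, "2+2?")
best = fuse(rank(candidates, score_fn=len))
```

Swapping layers in and out of such a pipeline is exactly the configuration choice that ITAS searches over.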

-----

Key Insights from this Paper 💡:

• Layering multiple inference techniques improves performance across tasks

• Fusion and ranking most effective for instruction-following tasks

• Verification and unit testing boost reasoning/coding task performance

• Bayesian optimization efficient for searching architecture configurations
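Framing architecture selection as hyperparameter optimization can be sketched as follows. The configuration space, scoring function, and search loop here are all hypothetical toys: real ITAS evaluates each candidate architecture on a dev set, and the paper uses Bayesian optimization where this sketch substitutes plain random search.

```python
import random

# Hypothetical configuration space for an inference-time architecture:
# which layers to include and how many ensemble samples to draw.
SPACE = {
    "num_samples": [1, 2, 4, 8],
    "use_ranking": [True, False],
    "use_fusion": [True, False],
}

def evaluate(config):
    """Stand-in for running the architecture on a dev set and returning
    a benchmark score; this toy score just rewards more layers/samples."""
    score = config["num_samples"] ** 0.5
    score += 1.0 if config["use_ranking"] else 0.0
    score += 1.5 if config["use_fusion"] else 0.0
    return score

def search(budget=10, seed=0):
    """Random search over SPACE (a stand-in for the paper's
    Bayesian-optimization-based ITAS); returns the best config found."""
    rng = random.Random(seed)
    best_cfg, best_score = None, float("-inf")
    for _ in range(budget):
        cfg = {k: rng.choice(v) for k, v in SPACE.items()}
        s = evaluate(cfg)
        if s > best_score:
            best_cfg, best_score = cfg, s
    return best_cfg, best_score

best_cfg, best_score = search()
```

A Bayesian optimizer improves on this loop by modeling `evaluate` from past trials and proposing promising configurations instead of sampling uniformly, which matters when each evaluation is an expensive benchmark run.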

-----

Results 📊:

• Outperforms GPT-4 and Claude 3.5 Sonnet across benchmarks

• Open-source Archon: 11.2 percentage point average increase over baselines

• Closed-source Archon: 15.8 percentage point average increase

• All-source Archon: 15.1 percentage point average increase
