Smart LLM routing that learns what works best for your specific needs
PickLLM is a reinforcement learning framework that intelligently routes queries to the most suitable LLM based on cost, latency, and accuracy requirements.
-----
https://arxiv.org/abs/2412.12170
🎯 Original Problem:
Selecting the right LLM for specific tasks is challenging due to varying costs, latencies, and accuracies across models. Existing solutions either focus solely on cost reduction or require expensive pre-training and ensemble approaches.
-----
🔧 Solution in this Paper:
→ PickLLM uses reinforcement learning to dynamically route queries to available LLMs based on a weighted reward function
→ The reward function considers three key metrics: query cost, inference latency, and response accuracy
→ Two learning approaches are implemented: gradient ascent for a learning automaton, and stateless Q-learning with epsilon-greedy exploration
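The routing loop above can be sketched as follows. This is a minimal illustration of the stateless Q-learning variant with epsilon-greedy exploration, not the paper's implementation: the model names, metric values, linear reward form, and weight values are all assumptions chosen for the example.

```python
import random

# Hypothetical per-model metrics (normalized to [0, 1]); illustrative
# numbers only, not taken from the paper.
MODELS = {
    "model_a": {"cost": 0.9, "latency": 0.7, "accuracy": 0.95},
    "model_b": {"cost": 0.6, "latency": 0.5, "accuracy": 0.90},
    "model_c": {"cost": 0.1, "latency": 0.2, "accuracy": 0.80},
}

# User-chosen weights trade off the three metrics (assumption: a simple
# linear combination; the paper's exact reward form may differ).
W_COST, W_LATENCY, W_ACC = 0.4, 0.3, 0.3

def reward(model: str) -> float:
    """Weighted reward: accuracy is a benefit, cost and latency are penalties."""
    m = MODELS[model]
    return W_ACC * m["accuracy"] - W_COST * m["cost"] - W_LATENCY * m["latency"]

class StatelessQRouter:
    """Stateless Q-learning: one Q-value per model (no state space),
    updated after each routed query; epsilon-greedy action selection."""
    def __init__(self, models, epsilon=0.1, alpha=0.5):
        self.q = {m: 0.0 for m in models}
        self.epsilon = epsilon  # exploration rate
        self.alpha = alpha      # learning rate

    def pick(self) -> str:
        if random.random() < self.epsilon:
            return random.choice(list(self.q))   # explore a random model
        return max(self.q, key=self.q.get)       # exploit the best-so-far

    def update(self, model: str, r: float) -> None:
        # Stateless Q-update: move the estimate toward the observed reward.
        self.q[model] += self.alpha * (r - self.q[model])

router = StatelessQRouter(MODELS)
for _ in range(500):
    m = router.pick()
    router.update(m, reward(m))  # in practice, reward comes from observed metrics

best = max(router.q, key=router.q.get)
```

With these example weights the cheap, fast `model_c` earns the highest reward, so the router converges to it; raising `W_ACC` relative to the penalty weights would shift selection toward the more accurate models, which is the "flexible optimization based on user priorities" the post describes.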
-----
💡 Key Insights:
→ LLM selection can be optimized without expensive pre-training or computing multiple model outputs
→ Weighted reward function allows flexible optimization based on user priorities
→ Real-time adaptation to query contexts improves efficiency
-----
📊 Results:
→ Reduced session costs by 60% compared to using expensive models
→ Decreased mean latency by 52% versus random model selection
→ Maintained competitive accuracy scores across diverse topics while optimizing for cost and speed