Greedy token selection isn't just simple; this paper proves it's the smartest way to hit LLM performance targets on a budget.
Greedy algorithm optimally selects tokens to meet request-specific performance targets under a total budget constraint.
-----
Paper - https://arxiv.org/abs/2501.12162
Original Problem 🤔:
→ Efficiently allocating a limited token budget across multiple requests is challenging.
→ Each request has a Service Level Objective (SLO) defined by a target path probability.
→ The problem is to maximize the satisfaction of SLOs within the given token budget.
-----
Solution in this Paper 💡:
→ This paper proposes and proves the optimality of a greedy algorithm.
→ The algorithm iteratively selects tokens from token trees associated with each request.
→ It prioritizes tokens with the highest path probability values across all requests.
→ For each request, the algorithm aims to achieve its target path probability (SLO).
→ Token selection continues until either all SLOs are met or the total budget is exhausted.
→ The algorithm first checks for feasibility. If the total minimum tokens needed to meet all SLOs exceeds the budget, it declares "INVALID".
→ Otherwise, the algorithm proceeds with greedy token selection to find an optimal allocation within the budget.
-----
Key Insights from this Paper 🧐:
→ A greedy approach is optimal for token selection in this scenario.
→ Selecting tokens based on highest path probability maximizes the achieved total probability within a budget.
→ The minimal number of tokens required to reach a probability threshold can be found greedily.
→ Greedy selection under a fixed budget maximizes the sum of path probabilities.
-----
Results 📊:
→ Proves that if the algorithm returns "INVALID", no feasible solution exists to meet all SLOs within the budget.
→ Proves that if a feasible solution exists, the algorithm returns an optimal solution.
→ Optimality is shown through two lemmas: Minimality in Threshold Attainment and Maximality Under a Fixed Budget.
Share this post