Economical Prompting Index (EPI) helps companies choose between expensive-but-accurate and cheap-but-decent prompting methods.
This paper introduces the Economical Prompting Index (EPI), a novel metric that balances accuracy with token usage in LLM prompting. EPI helps organizations optimize their prompting strategies by weighing performance against cost, addressing a critical gap left by current evaluation methods, which focus solely on accuracy.
-----
https://arxiv.org/abs/2412.01690
Original Problem 🤔:
Current prompt engineering research overemphasizes accuracy gains while ignoring computational costs. This leads to techniques that may be too expensive for practical use, despite marginal accuracy improvements.
-----
Solution in this Paper 💡:
→ EPI combines accuracy scores with token consumption through a user-specified cost concern factor.
→ The metric uses an exponential decay formula: EPI = A × e^(−C × T), where A is accuracy, C is the user-specified cost concern factor, and T is token count (illustrated in the sketch after this list).
→ Five cost concern levels are defined, from "None" (research) to "Major" (highly cost-sensitive).
→ The study evaluates 6 prompting techniques across 10 LLMs and 4 datasets.
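Here is a minimal sketch of how the EPI formula could be applied to rank techniques under different cost concern levels. The technique profiles (accuracies, token counts) and the numeric values assigned to each cost concern level are illustrative assumptions, not the paper's exact figures.

```python
import math

def epi(accuracy: float, tokens: float, cost_concern: float) -> float:
    """Economical Prompting Index: EPI = A * e^(-C * T)."""
    return accuracy * math.exp(-cost_concern * tokens)

# Hypothetical accuracy/token profiles for two prompting techniques.
techniques = {
    "Chain-of-Thought": {"accuracy": 0.85, "tokens": 400},
    "Self-Consistency": {"accuracy": 0.90, "tokens": 1200},  # roughly 3x the tokens
}

# Illustrative cost concern levels, from "None" (research) to "Major" (highly cost-sensitive).
cost_levels = {"None": 0.0, "Low": 1e-5, "Moderate": 5e-5, "High": 1e-4, "Major": 5e-4}

for level, c in cost_levels.items():
    # Rank techniques by EPI at this cost concern level (highest first).
    ranked = sorted(
        techniques.items(),
        key=lambda kv: epi(kv[1]["accuracy"], kv[1]["tokens"], c),
        reverse=True,
    )
    scores = ", ".join(
        f"{name}: {epi(v['accuracy'], v['tokens'], c):.3f}" for name, v in ranked
    )
    print(f"{level:>8} (C={c}): {scores}")
```

With these toy numbers, the expensive technique ranks first when C is near zero, but the ordering flips as the cost concern grows, which is exactly the trade-off the EPI is designed to surface.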
-----
Key Insights 🔍:
→ Complex techniques like Self-Consistency often deliver minimal gains at steep cost: a 6.74% accuracy increase can require roughly 200% more tokens
→ Simple methods like Chain-of-Thought become more viable as cost concerns increase
→ High-performing models show diminishing returns with complex prompting
-----
Results 📊:
→ Self-Consistency achieves the highest accuracy (0.88-0.95) but uses roughly 3x the tokens of simpler techniques
→ Chain-of-Thought maintains strong performance (0.74-0.89) with moderate token usage
→ On Claude 3.5 Sonnet, simpler techniques overtake complex ones at a cost concern of about C = 0.00008
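The crossover point follows directly from the EPI formula: setting the two techniques' scores equal and solving for C gives C* = ln(A_complex / A_simple) / (T_complex − T_simple). The sketch below uses hypothetical accuracy and token figures chosen within the ranges quoted above, not the paper's exact values.

```python
import math

def crossover_cost_concern(a_simple: float, t_simple: float,
                           a_complex: float, t_complex: float) -> float:
    """Cost concern C* at which the simpler technique's EPI matches the complex one's.

    From a_simple * exp(-C * t_simple) = a_complex * exp(-C * t_complex):
        C* = ln(a_complex / a_simple) / (t_complex - t_simple)
    For any C above C*, the simpler (cheaper) technique scores higher.
    """
    return math.log(a_complex / a_simple) / (t_complex - t_simple)

# Hypothetical figures within the ranges reported above.
c_star = crossover_cost_concern(a_simple=0.89, t_simple=400,
                                a_complex=0.95, t_complex=1200)
print(f"Simpler technique wins for any cost concern above C* ≈ {c_star:.6f}")
```

With these assumed numbers the crossover lands around 8e-5, consistent in magnitude with the C = 0.00008 figure reported for Claude 3.5 Sonnet.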