Not all examples are created equal - Mixtures of In-Context Learners (MoICL) knows which ones matter
MoICL turns demonstration subsets into weighted experts, making ICL smarter and more efficient.
Split your demonstrations, weight them right, and watch your LLM perform better
https://arxiv.org/abs/2411.02830
🎯 Original Problem:
→ Traditional in-context learning (ICL) has major limitations: it cannot tell helpful demonstrations from harmful ones, wastes memory on one long concatenated prompt, and is highly sensitive to which demonstrations are chosen.
-----
🔧 Solution in this Paper:
→ Mixtures of In-Context Learners (MoICL) partitions the demonstrations into smaller subsets, treats the LLM conditioned on each subset as an expert, and learns a scalar weight for each one (see the sketch after this list)
→ Uses gradient-based optimization of the weights to identify both helpful experts and harmful anti-experts
→ Can dynamically generate weights with a hyper-network for unseen demonstration sets
→ Allows sparsification by keeping only the top-k experts to reduce computation
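To make the mechanism concrete, here is a minimal PyTorch sketch of the core idea under stated assumptions: split a demonstration pool into subsets, treat each subset as one expert, mix the experts' label distributions with learnable scalar weights, and tune only those weights by gradient descent while the LLM stays frozen. The `expert_log_probs` function is a hypothetical stand-in for prompting a frozen LLM with a subset plus the query; the subset count, learning rate, and toy labels are illustrative, not taken from the paper.

```python
import torch
import torch.nn.functional as F

# Hypothetical stand-in for the frozen LLM: in MoICL each "expert" is the same
# LLM prompted with one demonstration subset plus the query, read out as
# log-probabilities over the verbalized label set. A seeded random
# distribution plays that role here so the sketch runs without model weights.
def expert_log_probs(subset_idx: int, query_idx: int, num_labels: int = 2):
    g = torch.Generator().manual_seed(1000 * subset_idx + query_idx)
    return F.log_softmax(torch.randn(num_labels, generator=g), dim=-1)

num_experts = 4                                     # number of demo subsets
num_labels = 2
train_queries = [(0, 0), (1, 1), (2, 0), (3, 1)]    # (query_idx, gold label)

# One learnable scalar per expert; left unconstrained so a weight can turn
# negative and let that subset act as an "anti-expert".
weights = torch.nn.Parameter(torch.full((num_experts,), 1.0 / num_experts))
opt = torch.optim.Adam([weights], lr=0.1)

def mixture_log_probs(query_idx: int) -> torch.Tensor:
    expert_probs = torch.stack(
        [expert_log_probs(k, query_idx, num_labels).exp()
         for k in range(num_experts)]
    )                                   # shape: (num_experts, num_labels)
    mixed = weights @ expert_probs      # weighted sum of the distributions
    return mixed.clamp_min(1e-9).log()  # guard against non-positive mixtures

# Gradient-based tuning of the mixture weights only; the LLM is never updated.
for step in range(200):
    loss = torch.stack(
        [F.nll_loss(mixture_log_probs(q).unsqueeze(0), torch.tensor([y]))
         for q, y in train_queries]
    ).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

print("learned expert weights:", weights.detach())
```

Because the weights are unconstrained scalars, an expert whose subset hurts predictions can end up with a negative weight, which is the anti-expert behavior highlighted in the insights below.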
-----
💡 Key Insights:
→ Negative weights help identify and leverage anti-experts for better predictions
→ Smaller demonstration subsets with learned weights outperform single large contexts
→ Dynamic weight generation enables handling unseen demonstrations effectively
→ Sparsification maintains performance while reducing computational costs (a top-k sketch follows)
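A rough sketch of what top-k sparsification could look like once the weights are learned: rank experts by weight magnitude, keep the top k, and drop the rest so their prompts never need to be run at inference time. The renormalization step is one plausible choice for this sketch, not necessarily the paper's exact recipe.

```python
import torch

def sparsify_top_k(weights: torch.Tensor, k: int):
    """Keep only the k experts with the largest-magnitude weights.

    Returns the surviving expert indices and a sparse weight vector; the
    dropped experts' demonstration prompts can be skipped entirely at
    inference, which is where the compute and memory savings come from.
    """
    top = torch.topk(weights.abs(), k).indices
    kept = torch.zeros_like(weights)
    kept[top] = weights[top]
    # Renormalize the remaining mass (one reasonable choice, hedged above).
    kept = kept / kept.sum().clamp_min(1e-9)
    return top, kept

# Example: learned weights for 6 experts, one of them a mild anti-expert.
w = torch.tensor([0.35, 0.02, 0.41, -0.15, 0.05, 0.32])
idx, sparse_w = sparsify_top_k(w, k=3)
print("experts kept:", idx.tolist())
print("sparse weights:", sparse_w)
```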
-----
📊 Results:
→ Improved accuracy on 5/7 classification datasets (up to +13% vs ICL)
→ Handles out-of-domain data (+11%), imbalanced data (+49%), and noisy demonstrations (+38%) more robustly
→ Maintains performance while using fewer demonstrations and less memory