Not all examples are created equal - Mixtures of In-Context Learners (MoICL) knows which ones matter
MoICL turns demonstration subsets into weighted experts, making ICL smarter and more efficient.
Split your demonstrations, weight them right, and watch your LLM perform better
https://arxiv.org/abs/2411.02830
🎯 Original Problem:
→ Traditional in-context learning (ICL) has major limitations: it cannot tell helpful demonstrations from harmful ones, wastes memory on one long concatenated prompt, and is highly sensitive to which demonstrations are chosen.
-----
🔧 Solution in this Paper:
→ Mixtures of In-Context Learners (MoICL) partitions the demonstrations into smaller subsets, treats the LLM conditioned on each subset as an expert, and learns a scalar weight for each one (see the sketch after this list)
→ Uses gradient-based optimization of the weights to identify both helpful experts and harmful anti-experts
→ Can dynamically generate weights with a hyper-network for unseen demonstration sets
→ Allows sparsification by keeping only the top-k experts to reduce computation
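To make the mechanism concrete, here is a minimal PyTorch sketch of the core idea under stated assumptions: split a demonstration pool into subsets, treat each subset as one expert, mix the experts' label distributions with learnable scalar weights, and tune only those weights by gradient descent while the LLM stays frozen. The `expert_log_probs` function is a hypothetical stand-in for prompting a frozen LLM with a subset plus the query; the subset count, learning rate, and toy labels are illustrative, not taken from the paper.

```python
import torch
import torch.nn.functional as F

# Hypothetical stand-in for the frozen LLM: in MoICL each "expert" is the same
# LLM prompted with one demonstration subset plus the query, read out as
# log-probabilities over the verbalized label set. A seeded random
# distribution plays that role here so the sketch runs without model weights.
def expert_log_probs(subset_idx: int, query_idx: int, num_labels: int = 2):
    g = torch.Generator().manual_seed(1000 * subset_idx + query_idx)
    return F.log_softmax(torch.randn(num_labels, generator=g), dim=-1)

num_experts = 4                                     # number of demo subsets
num_labels = 2
train_queries = [(0, 0), (1, 1), (2, 0), (3, 1)]    # (query_idx, gold label)

# One learnable scalar per expert; left unconstrained so a weight can turn
# negative and let that subset act as an "anti-expert".
weights = torch.nn.Parameter(torch.full((num_experts,), 1.0 / num_experts))
opt = torch.optim.Adam([weights], lr=0.1)

def mixture_log_probs(query_idx: int) -> torch.Tensor:
    expert_probs = torch.stack(
        [expert_log_probs(k, query_idx, num_labels).exp()
         for k in range(num_experts)]
    )                                   # shape: (num_experts, num_labels)
    mixed = weights @ expert_probs      # weighted sum of the distributions
    return mixed.clamp_min(1e-9).log()  # guard against non-positive mixtures

# Gradient-based tuning of the mixture weights only; the LLM is never updated.
for step in range(200):
    loss = torch.stack(
        [F.nll_loss(mixture_log_probs(q).unsqueeze(0), torch.tensor([y]))
         for q, y in train_queries]
    ).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

print("learned expert weights:", weights.detach())
```

Because the weights are unconstrained scalars, an expert whose subset hurts predictions can end up with a negative weight, which is the anti-expert behavior highlighted in the insights below.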
-----
💡 Key Insights:
→ Negative weights help identify and leverage anti-experts for better predictions
→ Smaller demonstration subsets with learned weights outperform single large contexts
→ Dynamic weight generation enables handling unseen demonstrations effectively
→ Sparsification maintains performance while reducing computational costs (a top-k sketch follows)
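A rough sketch of what top-k sparsification could look like once the weights are learned: rank experts by weight magnitude, keep the top k, and drop the rest so their prompts never need to be run at inference time. The renormalization step is one plausible choice for this sketch, not necessarily the paper's exact recipe.

```python
import torch

def sparsify_top_k(weights: torch.Tensor, k: int):
    """Keep only the k experts with the largest-magnitude weights.

    Returns the surviving expert indices and a sparse weight vector; the
    dropped experts' demonstration prompts can be skipped entirely at
    inference, which is where the compute and memory savings come from.
    """
    top = torch.topk(weights.abs(), k).indices
    kept = torch.zeros_like(weights)
    kept[top] = weights[top]
    # Renormalize the remaining mass (one reasonable choice, hedged above).
    kept = kept / kept.sum().clamp_min(1e-9)
    return top, kept

# Example: learned weights for 6 experts, one of them a mild anti-expert.
w = torch.tensor([0.35, 0.02, 0.41, -0.15, 0.05, 0.32])
idx, sparse_w = sparsify_top_k(w, k=3)
print("experts kept:", idx.tolist())
print("sparse weights:", sparse_w)
```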
-----
📊 Results:
→ Improved accuracy on 5/7 classification datasets (up to +13% vs ICL)
→ Handles out-of-domain data (+11%), imbalanced data (+49%), and noisy demonstrations (+38%) more robustly
→ Maintains performance while using fewer demonstrations and less memory