"Stealing User Prompts from Mixture of Experts"

The podcast on this paper is generated with Google's Illuminate.

Nice paper from @GoogleDeepMind

When models share work, they accidentally share your secrets too.

MoE models can leak user prompts through expert routing vulnerabilities in batched processing.

Expert-Choice Routing and token dropping in MoE create a backdoor to steal user inputs

📚 https://arxiv.org/abs/2410.22884

🎯 Original Problem:

Mixture-of-Experts (MoE) models, while efficient for LLMs, have a critical vulnerability in their token routing mechanism that could expose user data when queries are batched together.
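
To see why batching matters, here is a minimal sketch of Expert-Choice Routing with a per-expert capacity. It is not the paper's code; the shapes, names, and capacity value are illustrative. The point it shows: each expert selects its top tokens across the *whole* batch, so whether your token is processed or dropped depends on the other queries sharing the batch.

```python
import torch

def expert_choice_route(hidden, router_w, capacity):
    """Toy Expert-Choice Routing: each expert keeps its top-`capacity` tokens
    across the whole flattened batch, so one user's tokens can displace
    another's (token dropping) -- the cross-batch dependency the attack uses."""
    # hidden: (num_tokens, d_model), all tokens in the batch flattened together
    # router_w: (d_model, num_experts)
    scores = hidden @ router_w                      # (num_tokens, num_experts)
    kept = torch.zeros_like(scores, dtype=torch.bool)
    for e in range(scores.shape[1]):
        # When two tokens tie on the routing score, which one keeps the slot
        # comes down to torch.topk's tie-handling in the CUDA kernel --
        # exactly the behavior the paper's attack leverages.
        top = torch.topk(scores[:, e], k=capacity).indices
        kept[top, e] = True
    return kept  # which (token, expert) pairs actually get processed

# Tokens beyond an expert's capacity are dropped; the drop pattern for your
# prompt changes with the other sequences in the batch.
torch.manual_seed(0)
batch_tokens = torch.randn(16, 32)   # 16 tokens from several users' prompts
router = torch.randn(32, 8)          # 8 experts
print(expert_choice_route(batch_tokens, router, capacity=4).sum(dim=0))
```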

-----

🛠️ Solution in this Paper:

→ Introduces the "MoE Tiebreak Leakage" attack, which exploits Expert-Choice Routing to leak a victim's prompt

→ Uses strategic batch crafting to manipulate expert routing and force specific token dropping

→ Exploits the tie-handling behavior of the torch.topk CUDA implementation

→ Requires white-box access to the model and the ability to control batch placement

→ Implements the attack in two variants: an Oracle Attack (2 queries) and a Leakage Attack (iterative extraction); a schematic sketch follows this list
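
Below is a hypothetical sketch of the iterative extraction loop, assuming helpers `run_batch` (submits a batch and returns the attacker's own outputs) and `make_batches` (crafts the probe and control batches that steer routing and token dropping). These names and the batch layout are assumptions for illustration, not the paper's implementation.

```python
def guess_token_matches(model, run_batch, recovered_prefix, guess_token,
                        probe_batch, control_batch):
    """Oracle idea: submit two crafted batches that differ only in whether the
    guessed token can collide with the victim's token inside an expert's
    capacity. If routing (and thus token dropping) changes, the attacker's own
    output changes, revealing whether the guess was correct."""
    out_probe = run_batch(model, probe_batch + [recovered_prefix + [guess_token]])
    out_control = run_batch(model, control_batch + [recovered_prefix + [guess_token]])
    return out_probe != out_control  # any detectable difference => guess matched

def leak_prompt(model, run_batch, vocab, prompt_len, make_batches):
    """Recover the victim's prompt one token at a time. In this naive form each
    position can cost up to |vocab| queries; the paper reports ~100 queries
    per token on average with its crafted batches."""
    recovered = []
    for pos in range(prompt_len):
        for tok in vocab:
            probe, control = make_batches(recovered, pos, tok)  # crafted filler queries
            if guess_token_matches(model, run_batch, recovered, tok, probe, control):
                recovered.append(tok)
                break
    return recovered
```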

-----

💡 Key Insights:

→ Cross-batch dependencies in MoE create exploitable side channels

→ Optimization for efficiency can introduce security vulnerabilities

→ Token dropping, meant for efficiency, becomes a security risk

→ Architectural optimizations need rigorous security testing

-----

📊 Results:

→ Successfully extracted 996 out of 1000 secret messages

→ Recovered 4,833 out of 4,838 total secret tokens

→ Requires ~100 queries per token on average

→ Works optimally with a padding sequence length of 40

→ Shows a 99.9% success rate when using all 8 experts
