A probabilistic approach to better measure what LLMs actually remember from training data.
Paper from @GoogleDeepMind
A more realistic way to quantify what secrets LLMs might spill.
📚 https://arxiv.org/abs/2410.19482
🔍 Original Problem:
Current methods for measuring memorization in LLMs rely on a single greedy-decoded sample per prompt, which underestimates true memorization rates and ignores real-world use, where users can make many attempts under different sampling strategies.
-----
🛠️ Solution in this Paper:
→ Introduces (n,p)-discoverable extraction - a probabilistic relaxation of discoverable extraction
→ A target sequence counts as (n,p)-extractable if, across n independent sampled generations, it appears at least once with probability ≥ p (see the sketch after this list)
→ Considers various sampling schemes (top-k, top-p, temperature) instead of just greedy sampling
→ No additional computational cost compared to traditional methods
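A minimal sketch of the definition in Python (hypothetical helper names, not the paper's code): if q is the probability that a single sampled generation reproduces the target suffix, the chance of seeing it at least once in n independent attempts is 1 - (1 - q)^n, and the sequence is (n,p)-extractable when that quantity reaches p.

```python
import numpy as np

def extraction_probability(token_probs: np.ndarray, n: int) -> float:
    """Chance of generating the target suffix at least once in n
    independent samples. token_probs[i] is the probability the sampler
    emits target token i, given the prompt and the preceding target tokens."""
    q = float(np.prod(token_probs))  # one sample matches the full suffix
    return 1.0 - (1.0 - q) ** n      # at least one match in n attempts

def is_np_extractable(token_probs: np.ndarray, n: int, p: float) -> bool:
    """(n, p)-discoverable extraction: success within n attempts with prob >= p."""
    return extraction_probability(token_probs, n) >= p
```

Since q comes straight from the model's per-token probabilities, no extra generations are needed, which is where the no-added-cost claim comes from.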
-----
💡 Key Insights:
→ Greedy sampling misses clear cases of memorization: target sequences with high generation likelihood that a single greedy decode never produces
→ Larger models and repeated training data show higher memorization rates
→ Gap between greedy and probabilistic extraction rates increases with model size
→ The choice of sampling strategy significantly changes extraction success rates (sketch below)
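Why the sampling scheme matters: top-k, top-p, and temperature all reshape the per-token distribution the sampler draws from, and with it the per-sample match probability. A rough illustration of standard decoding math (not code from the paper):

```python
import numpy as np

def sampling_distribution(logits: np.ndarray, temperature: float = 1.0,
                          top_k: int | None = None) -> np.ndarray:
    """Per-token distribution under temperature scaling and top-k truncation."""
    scaled = logits / temperature                   # sharpen or flatten
    probs = np.exp(scaled - scaled.max())           # stable softmax
    probs /= probs.sum()
    if top_k is not None:
        kth = np.sort(probs)[-top_k]                # k-th largest probability
        probs = np.where(probs >= kth, probs, 0.0)  # drop the tail
        probs /= probs.sum()                        # renormalize survivors
    return probs
```

A memorized token sitting just below the argmax is invisible to greedy decoding but keeps substantial mass here, which is how probabilistic extraction surfaces memorization that greedy misses.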
-----
📊 Results:
→ Even with a modest n=3 attempts and threshold p=10%, probabilistic extraction rates exceed the greedy baseline
→ Extraction rates on training data consistently higher than test data across all parameter settings
→ For the 12B-parameter model, only n=40 attempts at p=90% are needed to match the greedy extraction rate (worked example below)
→ Gap between greedy and probabilistic rates widens with model size (1B to 12B parameters)
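To see where a figure like n=40 can come from, invert 1 - (1 - q)^n ≥ p for the smallest n. With p=90%, a per-sample match probability q of roughly 5.6% (an illustrative value, not one reported in the paper) is already enough:

```python
import math

def min_attempts(q: float, p: float) -> int:
    """Smallest n with 1 - (1 - q)**n >= p."""
    return math.ceil(math.log1p(-p) / math.log1p(-q))

print(min_attempts(q=0.056, p=0.90))  # -> 40
```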