Teaching small AI to catch big AI's memory leaks.
MemHunter introduces an automated system to detect when LLMs memorize training data, making privacy risk assessment scalable across large datasets.
-----
https://arxiv.org/abs/2412.07261
Original Problem 🔍:
→ Current methods to detect LLM memorization are inefficient, requiring per-sample optimization and manual intervention
→ Existing approaches can't effectively assess privacy risks across large datasets
→ Traditional methods only look for exact matches, missing partial memorization that could still leak sensitive information
-----
Solution in this Paper 🛠️:
→ MemHunter uses a tiny LLM to generate memory-inducing prompts automatically
→ The system scores partial matches with a Longest Common Substring metric rather than requiring exact reproduction (see the scoring sketch after this list)
→ It uses hypothesis testing to verify memorization at dataset scale (see the test sketch below)
→ The framework iteratively refines prompts through rejection sampling and fine-tuning (see the loop sketch below)
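A minimal sketch of what the partial-match scoring could look like. Whitespace tokenization and the score scale (fraction of the training sample reproduced) are illustrative assumptions, not the paper's exact implementation:

```python
# Minimal sketch of Longest Common Substring scoring for partial memorization.
def longest_common_substring(a: list[str], b: list[str]) -> int:
    """Length of the longest contiguous run of tokens shared by a and b."""
    # dp[i][j] = length of the common run ending at a[i-1] and b[j-1]
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
                best = max(best, dp[i][j])
    return best


def memorization_score(model_output: str, training_sample: str) -> float:
    """Fraction of the training sample reproduced as one contiguous span."""
    out_toks, ref_toks = model_output.split(), training_sample.split()
    if not ref_toks:
        return 0.0
    return longest_common_substring(out_toks, ref_toks) / len(ref_toks)


# A partial leak with no verbatim copy of the full record still scores high.
ref = "the patient john doe was admitted on march 3 with acute symptoms"
out = "records show john doe was admitted on march 3 after a referral"
print(memorization_score(out, ref))  # ~0.58 -> flagged as partial memorization
```

An exact-match detector would score this example zero, which is exactly the gap a partial-match metric is meant to close.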
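For dataset-level verification, one simple instantiation is a one-sided test comparing scores on the suspected training set against scores on held-out text the model never saw. The Mann-Whitney U statistic below is an assumed choice, not necessarily the paper's exact procedure:

```python
# Dataset-level verification sketch: are memorization scores on the suspected
# training set significantly higher than on held-out text? The one-sided
# Mann-Whitney U test is an assumption about the exact statistic used.
from scipy.stats import mannwhitneyu


def dataset_memorized(train_scores: list[float],
                      holdout_scores: list[float],
                      alpha: float = 0.05) -> bool:
    """Reject the null (no memorization) if train scores dominate hold-out scores."""
    _, p_value = mannwhitneyu(train_scores, holdout_scores, alternative="greater")
    return p_value < alpha
```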
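And a structural sketch of the refinement loop, reusing memorization_score from the first sketch. The three callables stand in for model interfaces specific to the paper's setup (small-LLM prompt proposal, target-LLM completion, small-LLM fine-tuning), so everything below is an assumption about the loop's shape rather than the actual implementation:

```python
# Structural sketch of prompt refinement via rejection sampling. The callables
# `propose`, `complete`, and `finetune` are hypothetical stand-ins.
def refine_prompts(training_samples, propose, complete, finetune,
                   rounds: int = 3, threshold: float = 0.5):
    accepted = []  # (training sample, prompt that induced it) pairs
    for _ in range(rounds):
        for sample in training_samples:
            for prompt in propose(sample, n=8):          # small LLM proposes candidates
                completion = complete(prompt)            # query the target LLM
                if memorization_score(completion, sample) >= threshold:
                    accepted.append((sample, prompt))    # rejection sampling: keep inducers
        finetune(accepted)                               # refine the small LLM on what worked
    return accepted
```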
-----
Key Insights 💡:
→ Memorization detection should consider partial matches, not just exact copies
→ Dataset-level verification is crucial for real-world privacy assessment
→ Using a smaller LLM for prompt generation makes the process scalable
-----
Results 📊:
→ Extracts 40% more training data than existing methods under time constraints
→ Reduces search time by 80% when used as a plug-in
→ Achieves up to 92% accuracy in memorization detection on Vicuna-7B
→ Successfully differentiates between trained and untrained models with 95% confidence