Detect AI models' memorization by measuring probability landscape sharpness
This paper introduces a geometric framework to detect memorization in diffusion models by analyzing the Hessian eigenvalues of the log probability density. Sharp peaks in the probability landscape, indicated by large negative eigenvalues, reveal memorized content.
-----
https://arxiv.org/abs/2412.04140
🔍 Original Problem:
→ Diffusion models can memorize training data instead of generalizing, risking privacy violations and undermining model reliability.
→ Existing detection methods are computationally intensive and lack theoretical foundations.
-----
🛠️ Solution in this Paper:
→ The framework analyzes Hessian eigenvalues of log probability density to detect memorization.
→ Large negative eigenvalues indicate sharp peaks in probability landscapes, signaling memorized content.
→ The number of positive eigenvalues quantifies the degree of memorization and distinguishes template verbatim from matching verbatim cases.
→ For high-dimensional models like Stable Diffusion, Arnoldi iteration computes the relevant eigenvalues efficiently without forming the Hessian explicitly (see the sketch below).
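The minimal sketch below (my own illustration, not the paper's code) shows how the most negative Hessian eigenvalues of a log-density can be estimated matrix-free: Hessian-vector products obtained via double backpropagation are wrapped in a SciPy LinearOperator and passed to an iterative eigensolver. The toy `log_prob`, the dimensionality `d`, and the probe point `x0` are assumed stand-ins; in the paper's setting the diffusion model's score network would supply the gradients, and `eigsh` uses the Lanczos scheme, the symmetric counterpart of Arnoldi iteration.

```python
# Minimal, assumed sketch: matrix-free estimation of the most negative
# Hessian eigenvalues of a log-density via Hessian-vector products.
import numpy as np
import torch
from scipy.sparse.linalg import LinearOperator, eigsh

d = 16                                   # toy dimensionality (assumption)
scales = torch.linspace(0.01, 1.0, d)    # narrow and broad directions

def log_prob(x):
    # Stand-in for a model's learned log p(x); the narrow directions
    # (small scale) mimic a sharply peaked, memorized mode.
    return -0.5 * (x ** 2 / scales).sum()

x0 = torch.zeros(d, requires_grad=True)  # point at which curvature is probed

def hvp(v_np):
    """Hessian-vector product of log_prob at x0 via double backprop."""
    v = torch.as_tensor(v_np, dtype=torch.float32)
    g = torch.autograd.grad(log_prob(x0), x0, create_graph=True)[0]
    hv = torch.autograd.grad(g @ v, x0)[0]
    return hv.detach().numpy()

H = LinearOperator((d, d), matvec=hvp, dtype=np.float32)
# 'SA' = smallest algebraic eigenvalues; large negative values indicate a
# sharp, isolated peak in the probability landscape (likely memorization).
eigvals = eigsh(H, k=4, which='SA', return_eigenvectors=False)
print("most negative Hessian eigenvalues:", eigvals)
```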
-----
💡 Key Insights:
→ Memorization creates isolated points in learned probability distributions
→ Sharp probability landscapes correlate with memorized content (toy illustration after this list)
→ Early detection possible through eigenvalue analysis
→ Different verbatim types show distinct eigenvalue patterns
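As a toy illustration of the "sharp peak" insight above (an assumed example, not taken from the paper), compare the Hessian eigenvalues of log p at a narrow Gaussian mode, standing in for a memorized point, with those at a broad mode:

```python
# Assumed toy example: sharp (memorized-like) vs. flat (generalized-like) modes.
import torch
from torch.autograd.functional import hessian

def gaussian_log_prob(x, sigma):
    # Isotropic Gaussian log-density up to a constant.
    return -0.5 * (x ** 2).sum() / sigma ** 2

x = torch.zeros(2)  # probe the mode itself
for name, sigma in [("narrow / memorized-like", 0.05),
                    ("broad / generalized-like", 1.0)]:
    H = hessian(lambda z: gaussian_log_prob(z, sigma), x)
    eigvals = torch.linalg.eigvalsh(H)
    print(f"{name:25s} eigenvalues of Hessian(log p): {eigvals.tolist()}")
# The narrow mode yields large negative eigenvalues (an isolated, sharp peak),
# while the broad mode's eigenvalues stay close to zero.
```

In high dimensions the same comparison is made with the matrix-free estimator sketched earlier instead of forming the full Hessian.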
-----
📊 Results:
→ Successfully detected memorization in 2D Gaussian, MNIST, and Stable Diffusion models
→ Distinguished between matching verbatim (exact copies) and template verbatim (style copies)
→ Identified memorization at early sampling stages with high precision