
"Scalability of memorization-based machine unlearning"

The podcast for this paper was generated with Google's Illuminate.

Replace expensive memorization scores with efficient proxies to make machine unlearning practical

Make models forget data efficiently using memorization proxies, and scale up machine unlearning by approximating memorization scores

https://arxiv.org/abs/2410.16516

🎯 Original Problem:

Machine unlearning (MUL) algorithms that rely on memorization scores to remove specific data from trained models face severe scalability issues: estimating these scores requires training many models, which is extremely time-consuming and limits their use in real-world applications.
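
For context, the exact memorization score of a training example (in the Feldman-style formulation this line of work builds on) is the gap between the model's chance of classifying that example correctly when it is included in training versus when it is left out. A minimal sketch of a naive estimator is below; `train_model`, `dataset`, and the subsampling scheme are illustrative stand-ins, not the paper's implementation, and the point is only to show why the exact score is so expensive.

```python
# Sketch of a naive memorization-score estimator, assuming a hypothetical
# `train_model(examples) -> classifier` callable. Every single estimate
# requires many full retrainings, which is the scalability bottleneck.
import random

def memorization_score(dataset, index, train_model, n_trials=50):
    """Estimate mem(i): P[correct | i in training set] - P[correct | i held out]."""
    x_i, y_i = dataset[index]
    correct_with, correct_without = 0, 0
    for _ in range(n_trials):
        # Random subsample of the remaining data (a common estimation trick).
        subset = [ex for j, ex in enumerate(dataset)
                  if j != index and random.random() < 0.5]
        f_with = train_model(subset + [(x_i, y_i)])   # trained WITH example i
        f_without = train_model(subset)               # trained WITHOUT example i
        correct_with += int(f_with(x_i) == y_i)
        correct_without += int(f_without(x_i) == y_i)
    # 2 * n_trials full trainings for one example's score
    return (correct_with - correct_without) / n_trials
```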

-----

🔧 Solution in this Paper:

→ Introduces memorization-score proxies such as confidence scores, binary accuracy, and holdout retraining, which can be computed far more cheaply than exact memorization scores

→ Integrates these proxies into the RUM (Refinement-based Unlearning Meta-algorithm) framework, which works in two steps: first partitioning the forget set into subsets based on proxy scores, then sequentially applying unlearning algorithms to those subsets

→ Uses three key proxies: confidence (the softmax probability of the true class), binary accuracy (an indicator of whether the prediction is correct), and holdout retraining (whether the model's prediction changes after fine-tuning on held-out data); a sketch of these proxy computations and the subsequent partitioning follows below
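
A hedged sketch of how these proxies might be computed and then used to partition the forget set in a RUM-style pipeline. It assumes PyTorch; the function names (`compute_proxies`, `partition_forget_set`), the `holdout_model` (a copy of the model fine-tuned on held-out data), and the equal-sized three-way split are illustrative choices, not the paper's exact implementation.

```python
# Sketch only: per-example proxy scores and a simple low/medium/high partition
# of the forget set. Names and the 3-way split are assumptions for illustration.
import torch
import torch.nn.functional as F

@torch.no_grad()
def compute_proxies(model, holdout_model, forget_loader, device="cpu"):
    """Return confidence, binary-accuracy, and holdout-retraining proxies per example."""
    model.eval(); holdout_model.eval()
    conf, acc, holdout_change = [], [], []
    for x, y in forget_loader:
        x, y = x.to(device), y.to(device)
        probs = F.softmax(model(x), dim=1)
        pred = probs.argmax(dim=1)
        # Proxy 1: confidence = softmax probability assigned to the true label
        conf.append(probs.gather(1, y.unsqueeze(1)).squeeze(1))
        # Proxy 2: binary accuracy = 1 if the prediction is correct, else 0
        acc.append((pred == y).float())
        # Proxy 3: holdout retraining = does the prediction flip after the model
        # has been fine-tuned on held-out data? (holdout_model is that copy)
        holdout_pred = holdout_model(x).argmax(dim=1)
        holdout_change.append((pred != holdout_pred).float())
    return tuple(torch.cat(t).cpu() for t in (conf, acc, holdout_change))

def partition_forget_set(proxy_scores, forget_indices, n_bins=3):
    """Split forget-set indices into n_bins subsets by ascending proxy score."""
    order = torch.argsort(proxy_scores)
    return [[forget_indices[i] for i in chunk.tolist()]
            for chunk in torch.chunk(order, n_bins)]

# RUM-style usage: unlearn each subset in sequence with a chosen algorithm.
# for subset in partition_forget_set(conf_scores, forget_indices):
#     model = unlearning_algorithm(model, subset)   # hypothetical unlearning step
```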

-----

💡 Key Insights:

→ Memorization proxies can achieve similar unlearning performance while being up to 99.98% faster to compute than actual memorization scores

→ The holdout retraining proxy is particularly efficient because it requires no intervention during model training

→ Data augmentation reduces the effectiveness of the holdout retraining proxy

-----

📊 Results:

→ Achieved a 30% improvement in accuracy and a 46% improvement in privacy compared to baseline methods

→ Reduced computational time by 99.98% compared to computing exact memorization scores

→ Maintained performance across multiple datasets (CIFAR-10, CIFAR-100, Tiny-ImageNet) and architectures (ResNet-18, ResNet-50, VGG-16)
