Replace expensive memorization scores with efficient proxies to make machine unlearning practical
Make models forget data efficiently by approximating expensive memorization scores with cheap proxies, scaling up machine unlearning
https://arxiv.org/abs/2410.16516
🎯 Original Problem:
Machine unlearning (MUL) algorithms that rely on memorization scores to remove specific data from trained models face severe scalability issues: estimating exact memorization scores typically requires training many models, which is extremely time-consuming and limits their use in real-world applications.
-----
🔧 Solution in this Paper:
→ Introduces memorization-score proxies like confidence scores, binary accuracy, and holdout retraining that can be computed much more efficiently
→ Integrates these proxies into RUM (Refinement-based Unlearning Meta-algorithm), a two-step framework: first partition the forget set into subsets based on proxy scores, then sequentially apply unlearning algorithms to those subsets
→ Uses three key proxies: confidence (the softmax probability of the true label), binary accuracy (whether the prediction is correct), and holdout retraining (whether the model's prediction changes after fine-tuning on held-out data); see the sketch below
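To make the proxy step concrete, here is a minimal PyTorch sketch of how the three proxies could be computed over a forget set and then used for RUM-style partitioning. The names (`model`, `holdout_model`, `forget_loader`) and the quantile-based split are illustrative assumptions, not the paper's released code.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def proxy_scores(model, holdout_model, forget_loader, device="cuda"):
    """Compute per-example memorization proxies for the forget set (sketch)."""
    model.eval()
    holdout_model.eval()
    confidence, binary_acc, holdout_change = [], [], []

    for x, y in forget_loader:
        x, y = x.to(device), y.to(device)
        probs = F.softmax(model(x), dim=1)

        # Proxy 1: confidence — softmax probability assigned to the true label.
        conf = probs.gather(1, y.unsqueeze(1)).squeeze(1)
        confidence.append(conf.cpu())

        # Proxy 2: binary accuracy — 1 if the model predicts the true label, else 0.
        preds = probs.argmax(dim=1)
        binary_acc.append((preds == y).float().cpu())

        # Proxy 3: holdout retraining — flag examples whose prediction changes
        # after the model is fine-tuned on held-out data (holdout_model is the
        # fine-tuned copy; no intervention in the original training run needed).
        holdout_preds = holdout_model(x).argmax(dim=1)
        holdout_change.append((preds != holdout_preds).float().cpu())

    return torch.cat(confidence), torch.cat(binary_acc), torch.cat(holdout_change)

def partition_by_proxy(scores, num_subsets=3):
    """RUM-style step 1 (sketch): sort forget-set indices by proxy score and
    split them into roughly equal low/medium/high subsets."""
    order = torch.argsort(scores)
    return torch.chunk(order, num_subsets)
```

RUM's second step, sequentially running the chosen unlearning algorithm over these subsets, is left abstract here; any per-subset unlearning routine could be plugged in.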
-----
💡 Key Insights:
→ Memorization proxies can achieve similar unlearning performance while being up to 99.98% faster to compute than actual memorization scores
→ The holdout retraining proxy is particularly efficient because it requires no intervention during the original model's training
→ Data augmentation reduces the effectiveness of the holdout retraining proxy
-----
📊 Results:
→ Achieved 30% improvement in accuracy and 46% improvement in privacy compared to baseline methods
→ Reduced computational time by 99.98% compared to computing exact memorization scores
→ Maintained performance across multiple datasets (CIFAR-10, CIFAR-100, Tiny-ImageNet) and architectures (ResNet-18, ResNet-50, VGG-16)