"Dataset Distillation via Committee Voting"

A podcast on this paper was generated with Google's Illuminate.

Multiple models voting together create better synthetic datasets than any single expert model

CV-DD is a committee-based dataset distillation method that combines the knowledge of multiple expert models to create compact, high-quality training datasets.

-----

https://arxiv.org/abs/2501.07575

Original Problem 🤔:

→ Current dataset distillation methods rely on single models, leading to biased and less generalizable synthetic datasets

→ Existing methods struggle with capturing diverse features and often overfit to specific architectures

-----

Solution in this Paper 🔧:

→ CV-DD leverages a committee of diverse models (ResNet18, ResNet50, ShuffleNetV2, MobileNetV2, DenseNet121) to vote on synthetic data generation.

→ Each model's vote is weighted by its prior performance on the target task (see the first sketch after this list).

→ A Batch-Specific Soft Labeling technique aligns the synthetic data distribution with real data by computing batch-normalization statistics on the fly (second sketch below).

→ Dynamic voting mechanism adjusts model contributions based on their expertise in specific domains.
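
The paper's full pipeline is more involved, but the core voting step can be pictured in a few lines of PyTorch. This is a minimal, illustrative sketch: `committee_soft_labels` is a hypothetical helper, and the normalized-accuracy weighting and temperature value are assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def committee_soft_labels(models, val_accuracies, synthetic_batch, temperature=4.0):
    """Combine committee members' predictions into one soft label per image,
    weighting each model's vote by its prior performance (assumed scheme)."""
    weights = torch.tensor(val_accuracies, dtype=torch.float32)
    weights = weights / weights.sum()  # normalize prior accuracies into voting weights

    soft_labels = None
    for w, model in zip(weights, models):
        model.eval()
        with torch.no_grad():
            # temperature-softened class probabilities from this committee member
            probs = F.softmax(model(synthetic_batch) / temperature, dim=1)
        contribution = w * probs
        soft_labels = contribution if soft_labels is None else soft_labels + contribution
    return soft_labels
```

Here `models` would hold the five pretrained backbones listed above, and `val_accuracies` their accuracies on the target task.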

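Likewise, one way to realize on-the-fly batch-normalization statistics in PyTorch is to flip only the BN layers into training mode, so the forward pass normalizes with the current synthetic batch's statistics rather than stored running averages. This is a sketch of the idea, not necessarily the paper's exact implementation.

```python
import torch.nn as nn

def enable_batch_specific_bn(model):
    """Make BatchNorm layers normalize with the current batch's statistics
    while the rest of the model stays in eval mode."""
    model.eval()
    for module in model.modules():
        if isinstance(module, nn.modules.batchnorm._BatchNorm):
            module.train()                       # use per-batch mean/var in forward
            module.track_running_stats = False   # don't overwrite stored stats
    return model
```

Calling `enable_batch_specific_bn(model)` on each committee member before the soft-label pass lets the labels reflect the synthetic batch's actual feature statistics.
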
-----

Key Insights from this Paper 💡:

→ Multiple expert perspectives reduce model-specific biases

→ Batch-specific normalization significantly improves generalization

→ A committee of just 2 models achieves the best performance-efficiency tradeoff

-----

Results 📊:

→ Outperforms SOTA by +3% on ImageNet-1K at 50 IPC (images per class): 59.5% vs 56.5%

→ Runs 1.11 ms faster per iteration than previous ensemble-based methods

→ Achieves 67.1% accuracy on CIFAR-100 with 50 IPC
