Multiple models voting together create better synthetic datasets than a single expert's opinion
CV-DD is a committee-based dataset distillation method that combines the knowledge of multiple expert models to create compact, high-quality training datasets.
-----
https://arxiv.org/abs/2501.07575
Original Problem 🤔:
→ Current dataset distillation methods rely on a single model, producing biased, less generalizable synthetic datasets
→ Existing methods struggle to capture diverse features and often overfit to a specific architecture
-----
Solution in this Paper 🔧:
→ CV-DD leverages a committee of diverse models (ResNet18, ResNet50, ShuffleNetV2, MobileNetV2, DenseNet121) that vote on synthetic data generation (minimal sketches follow this list).
→ Each model's vote is weighted by its prior performance on the target task.
→ A Batch-Specific Soft Labeling technique aligns the synthetic data distribution with real data by computing batch-normalization statistics on the fly (second sketch below).
→ A dynamic voting mechanism adjusts each model's contribution based on its expertise in specific domains.
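A minimal sketch of the committee-voting idea in PyTorch (not the paper's code): each expert's softmax output is averaged with weights proportional to prior accuracy on the target task. The three-model subset, the accuracy values, and the function name are illustrative assumptions.

```python
import torch
import torch.nn.functional as F
from torchvision import models

# Committee of pretrained experts (a subset of the architectures named
# in the paper; ImageNet weights used here for illustration).
committee = [
    models.resnet18(weights="IMAGENET1K_V1"),
    models.shufflenet_v2_x1_0(weights="IMAGENET1K_V1"),
    models.mobilenet_v2(weights="IMAGENET1K_V1"),
]
for m in committee:
    m.eval()

# Weight each model's vote by its prior validation accuracy
# (assumed values), normalized to sum to 1.
val_acc = torch.tensor([0.70, 0.69, 0.72])
vote_weights = val_acc / val_acc.sum()

@torch.no_grad()
def committee_soft_labels(x: torch.Tensor) -> torch.Tensor:
    """Weighted average of each expert's softmax output for a batch."""
    probs = torch.stack([F.softmax(m(x), dim=1) for m in committee])  # (M, B, C)
    return (vote_weights.view(-1, 1, 1) * probs).sum(dim=0)          # (B, C)

# Label a batch of synthetic images with the committee's soft votes.
soft_labels = committee_soft_labels(torch.randn(8, 3, 224, 224))
```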
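And a minimal sketch of Batch-Specific Soft Labeling, assuming it amounts to normalizing with the current synthetic batch's own BatchNorm statistics during the labeling pass instead of the stored running statistics; the helper names are hypothetical.

```python
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet18(weights="IMAGENET1K_V1").eval()

def use_batch_bn_stats(net: nn.Module, enabled: bool) -> None:
    """Switch BatchNorm layers between stored running statistics (False)
    and statistics computed on the fly from the current batch (True)."""
    for mod in net.modules():
        if isinstance(mod, nn.BatchNorm2d):
            mod.train(enabled)
            # Disable running-stat tracking so labeling passes don't
            # overwrite the pretrained statistics.
            mod.track_running_stats = not enabled

@torch.no_grad()
def batch_specific_soft_labels(x: torch.Tensor) -> torch.Tensor:
    use_batch_bn_stats(model, True)    # BN stats come from this batch
    probs = torch.softmax(model(x), dim=1)
    use_batch_bn_stats(model, False)   # restore standard inference mode
    return probs

soft_labels = batch_specific_soft_labels(torch.randn(8, 3, 224, 224))
```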
-----
Key Insights from this Paper 💡:
→ Multiple expert perspectives reduce model-specific biases
→ Batch-specific normalization significantly improves generalization
→ A committee of just 2 models achieves the best performance-efficiency tradeoff
-----
Results 📊:
→ Outperforms prior SOTA by 3 points on ImageNet-1K with 50 images per class (IPC): 59.5% vs. 56.5%
→ Runs 1.11 ms faster per iteration than previous ensemble methods
→ Achieves 67.1% accuracy on CIFAR-100 with 50 IPC