"What Do Machine Learning Researchers Mean by "Reproducible"?"

The podcast accompanying this paper was generated with Google's Illuminate.

No more confusion: Eight concrete ways to measure ML research quality.

This paper addresses the confusion in machine learning research around "reproducibility" by proposing eight distinct aspects of scientific rigor. The authors analyze 101 papers published since 2017 to define and categorize these aspects, and show how they interact and influence one another.

-----

https://arxiv.org/abs/2412.03854

🔍 Original Problem:

The ML community faces a "reproducibility crisis", and the term itself is used inconsistently and without agreed definitions. This makes it difficult to evaluate research rigor and to understand what researchers mean when they claim their work is "reproducible".

-----

💡 Solution in this Paper:

→ The paper proposes eight distinct aspects of scientific rigor: repeatability, reproducibility, replicability, adaptability, model selection, label/data quality, meta & incentive, and maintainability (a code sketch contrasting repeatability and replicability follows this list)

→ Each aspect is defined and categorized by its primary concern and by the percentage of surveyed papers that focus on it

→ The authors establish relationships between these aspects, showing how they influence and depend on each other

→ The taxonomy also connects historical works that tackled these issues before the term "reproducibility" became prominent
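
To make the first distinctions concrete, below is a minimal Python sketch. It is my illustration under common readings of the terms, not code from the paper: fixing seeds gives repeatability (the same run yields the same bits), while replicability asks whether the conclusion survives independent reruns. The run_experiment function and its "metric" are hypothetical stand-ins for a training run.

```python
import random

import numpy as np

def set_seeds(seed: int) -> None:
    """Fix the sources of randomness a typical ML run depends on."""
    random.seed(seed)
    np.random.seed(seed)

def run_experiment(seed: int) -> float:
    """Hypothetical stand-in for a training run; its 'metric' depends on RNG."""
    set_seeds(seed)
    data = np.random.randn(1000)  # pretend this is training + evaluation
    return float(data.mean())

# Repeatability: same code, same seed -> bit-identical result.
assert run_experiment(0) == run_experiment(0)

# Replicability: independent reruns (here, new seeds) should support the
# same conclusion ("the mean is near zero"), not the same exact numbers.
results = [run_experiment(s) for s in range(5)]
assert all(abs(r) < 0.2 for r in results)
print("repeatable, and replicable under this toy criterion:", results)
```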

-----

🎯 Key Insights:

→ Many historical works addressed reproducibility issues without using that specific terminology

→ The current ACM terminology of repeatability, reproducibility, and replicability is insufficient to capture all eight aspects

→ Label/data quality and adaptability are understudied, with only 4% of the surveyed papers (≈4 of 101) focused on each

→ Maintainability combines instantaneous repeatability with replicability over time (a sketch of one supporting practice follows this list)
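
One common practice that supports maintainability (my illustration, not a method from the paper) is to snapshot the software environment next to every result, so a run can be repeated after dependencies drift. A minimal Python sketch, assuming numpy is installed:

```python
import json
import platform
import sys
from importlib import metadata

def snapshot_environment(packages):
    """Capture interpreter, OS, and package versions for a run."""
    return {
        "python": sys.version,
        "platform": platform.platform(),
        "packages": {name: metadata.version(name) for name in packages},
    }

# Hypothetical example: store the snapshot alongside the run's metrics.
record = {
    "metric": 0.93,  # made-up result for illustration
    "env": snapshot_environment(["numpy"]),
}
print(json.dumps(record, indent=2))
```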

-----

📊 Results:

→ Analysis covered 101 papers published since 2017

→ Model selection drew the most attention, at 19.8% of papers (≈20 of 101)

→ Reproducibility-focused papers accounted for 15.8% (≈16)

→ Repeatability-focused papers accounted for 12.9% (≈13)
