"Current Pathology Foundation Models are unrobust to Medical Center Differences"
https://arxiv.org/abs/2501.18055
The paper shows that current pathology Foundation Models (FMs) are not robust to variations arising from different medical centers. This lack of robustness stems from the models learning confounding medical center signatures rather than biological features alone.
This paper introduces the Robustness Index to quantify how well biological features dominate confounding medical center features in pathology FMs.
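Written as a formula, one plausible reading of the neighbor-ratio definition in this summary is the following. Here y_i is the biological class (cancer type) of patch i, c_i is its medical center, and N_k(i) is the set of its k nearest neighbors in the FM embedding space, excluding i itself. Pooling the counts over all n patches and the choice of k are assumptions, and the paper's exact formula may differ. Under this reading, R_k > 1 means a patch's neighbors share its biology more often than its medical center.

```latex
R_k = \frac{\sum_{i=1}^{n} \sum_{j \in \mathcal{N}_k(i)} \mathbf{1}[\, y_j = y_i \,]}
           {\sum_{i=1}^{n} \sum_{j \in \mathcal{N}_k(i)} \mathbf{1}[\, c_j = c_i \,]}
```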
-----
📌 The Robustness Index offers a practical metric to evaluate pathology Foundation Models before clinical deployment. It highlights a critical flaw: models are learning medical center biases, not just biology.
📌 By quantifying and visualizing medical center influence, this paper enables targeted improvements in pathology Foundation Model training. Future models can be designed for improved out-of-distribution generalization across diverse centers.
📌 The finding that even a simple logistic regression on FM embeddings predicts the medical center almost perfectly underscores how deeply technical variation is entangled with biological signal in current pathology Foundation Models. Robustness therefore needs to be a core design principle.
----------
Methods Explored in this Paper 🔧:
→ The paper introduces the Robustness Index.
→ The Robustness Index measures a pathology FM's robustness to medical center variations.
→ It is calculated as the ratio of the number of nearest neighbors in the embedding space that share a patch's biological class to the number that share its medical center.
→ A dataset named TCGA-2k was created.
→ TCGA-2k includes 2000 patches from five cancer types and five medical centers.
→ Ten publicly available pathology FMs were evaluated using this dataset.
→ k-nearest neighbor (k-NN) analysis and t-SNE visualizations were used to analyze the embedding spaces of these models.
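The Robustness Index described above can be sketched in a few lines of NumPy. This is a minimal sketch, not the authors' implementation. It assumes precomputed patch embeddings, cosine similarity as the neighbor metric, k=20, and counts pooled over all patches. The function names, the `exclude_overlap` option (an assumed variant that ignores neighbors sharing both class and center), and the synthetic data are hypothetical illustrations; the paper's exact formulation may differ.

```python
import numpy as np


def knn_indices(embeddings: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k nearest neighbors of each row by cosine similarity (self excluded)."""
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = normed @ normed.T
    np.fill_diagonal(sims, -np.inf)  # a patch is never its own neighbor
    return np.argsort(-sims, axis=1)[:, :k]


def robustness_index(embeddings, bio_labels, center_labels, k=20, exclude_overlap=False):
    """Hypothetical helper: ratio of same-class to same-center neighbor counts."""
    bio = np.asarray(bio_labels)
    center = np.asarray(center_labels)
    nbrs = knn_indices(np.asarray(embeddings, dtype=float), k)
    same_bio = bio[nbrs] == bio[:, None]           # shape (n, k)
    same_center = center[nbrs] == center[:, None]  # shape (n, k)
    if exclude_overlap:
        # Assumed variant: same class but other center vs. same center but other class
        num = (same_bio & ~same_center).sum()
        den = (same_center & ~same_bio).sum()
    else:
        num, den = same_bio.sum(), same_center.sum()
    return float("inf") if den == 0 else float(num / den)


# Synthetic demo (hypothetical data): 5 cancer types x 5 centers, one-hot signals plus noise
rng = np.random.default_rng(0)
n = 200
bio = rng.integers(0, 5, n)
center = rng.integers(0, 5, n)
noise = rng.normal(0.0, 0.1, (n, 10))


def make_embeddings(w_bio, w_center):
    """Concatenate a weighted cancer-type signal and a weighted center signal."""
    return np.hstack([w_bio * np.eye(5)[bio], w_center * np.eye(5)[center]]) + noise


ri_biology_dominant = robustness_index(make_embeddings(5.0, 1.0), bio, center)
ri_center_dominant = robustness_index(make_embeddings(1.0, 5.0), bio, center)
print(ri_biology_dominant, ri_center_dominant)  # first above 1, second below 1
```

When the biological signal dominates the embedding, the index rises above one; when the center signal dominates, it falls below one, which is the regime the paper reports for most FMs.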
-----
Key Insights 💡:
→ Current pathology FMs are significantly influenced by medical center variations.
→ Medical center information often dominates biological signals like cancer type in the embedding space.
→ This sensitivity to medical centers causes classification errors that trace back to confounding neighbors from the same medical center.
→ The Robustness Index varies significantly across different FMs.
→ Only one model, Virchow2, shows a Robustness Index slightly greater than one.
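The error analysis behind these insights can be approximated with a k-NN classifier over the embeddings. In this hypothetical sketch (not the paper's code), each patch is classified by a majority vote of its k=10 cosine nearest neighbors. For every misclassified patch, it then measures how many of the misleading neighbors (those with a different class) come from the patch's own medical center. Labels are assumed to be integer-encoded from 0, as `np.bincount` requires. The synthetic embeddings deliberately encode the center much more strongly than the cancer type.

```python
import numpy as np


def knn_indices(embeddings, k):
    """Indices of the k nearest neighbors of each row by cosine similarity (self excluded)."""
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = normed @ normed.T
    np.fill_diagonal(sims, -np.inf)
    return np.argsort(-sims, axis=1)[:, :k]


def knn_error_attribution(embeddings, bio_labels, center_labels, k=10):
    """Hypothetical helper returning (k-NN accuracy, same-center share of misleading neighbors)."""
    bio = np.asarray(bio_labels)
    center = np.asarray(center_labels)
    nbrs = knn_indices(np.asarray(embeddings, dtype=float), k)
    # Majority vote over neighbor classes (ties go to the smallest label)
    preds = np.array([np.bincount(bio[row]).argmax() for row in nbrs])
    wrong = preds != bio
    if not wrong.any():
        return 1.0, 0.0
    wrong_nbrs = nbrs[wrong]
    # Neighbors whose class differs from the misclassified query...
    misleading = bio[wrong_nbrs] != bio[wrong][:, None]
    # ...and which come from the same medical center as the query
    same_center = center[wrong_nbrs] == center[wrong][:, None]
    share = (misleading & same_center).sum() / misleading.sum()
    return float(1.0 - wrong.mean()), float(share)


# Hypothetical center-dominated embeddings: weak cancer-type signal, strong center signal
rng = np.random.default_rng(0)
n = 200
bio = rng.integers(0, 5, n)
center = rng.integers(0, 5, n)
emb = np.hstack([0.5 * np.eye(5)[bio], 5.0 * np.eye(5)[center]]) + rng.normal(0.0, 0.1, (n, 10))
accuracy, same_center_share = knn_error_attribution(emb, bio, center)
print(f"k-NN accuracy {accuracy:.2f}, same-center share of misleading neighbors {same_center_share:.2f}")
```

On such center-dominated embeddings, nearly all misleading neighbors share the query's center, mirroring the pattern the paper reports for real FMs.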
-----
Results 📊:
→ For the Phikon-v2 model, the neighbors behind incorrect classifications come from the same medical center in over 95% of cases.
→ A logistic regression on the embeddings of EXAONEPath, Phikon, and Phikon-v2 predicts the medical center with accuracies of 0.987, 0.987, and 0.993, respectively.
→ Virchow2 is the only model with a Robustness Index above one (1.2).
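The medical center prediction results above come from a logistic regression trained on FM embeddings. The sketch below illustrates the idea with a small NumPy softmax (multinomial logistic) regression fitted by gradient descent on a random 70/30 split. The split ratio, learning rate, epoch count, L2 penalty, helper names, and synthetic data are all assumptions. The paper's exact probe protocol is not described in this summary, and in practice scikit-learn's `LogisticRegression` would be the usual choice.

```python
import numpy as np


def fit_softmax_regression(X, y, n_classes, lr=0.5, epochs=300, l2=1e-3):
    """Multinomial logistic regression fitted by full-batch gradient descent."""
    n, d = X.shape
    W = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]  # one-hot targets
    for _ in range(epochs):
        logits = X @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)
        grad = (P - Y) / n  # gradient of the mean cross-entropy w.r.t. the logits
        W -= lr * (X.T @ grad + l2 * W)
        b -= lr * grad.sum(axis=0)
    return W, b


def center_probe_accuracy(embeddings, center_labels, train_frac=0.7, seed=0):
    """Hypothetical helper: held-out accuracy of predicting the medical center from embeddings."""
    X = np.asarray(embeddings, dtype=float)
    y = np.asarray(center_labels)
    idx = np.random.default_rng(seed).permutation(len(y))
    n_train = int(train_frac * len(y))
    tr, te = idx[:n_train], idx[n_train:]
    # Standardize with training-set statistics only
    mu, sd = X[tr].mean(axis=0), X[tr].std(axis=0) + 1e-8
    X = (X - mu) / sd
    W, b = fit_softmax_regression(X[tr], y[tr], n_classes=int(y.max()) + 1)
    preds = np.argmax(X[te] @ W + b, axis=1)
    return float((preds == y[te]).mean())


# Hypothetical embeddings: strong cancer-type signal, with and without a weak center signal
rng = np.random.default_rng(0)
n = 200
bio = rng.integers(0, 5, n)
center = rng.integers(0, 5, n)
noise = rng.normal(0.0, 0.1, (n, 10))
with_center = np.hstack([5.0 * np.eye(5)[bio], 1.0 * np.eye(5)[center]]) + noise
without_center = np.hstack([5.0 * np.eye(5)[bio], np.zeros((n, 5))]) + noise
print(center_probe_accuracy(with_center, center), center_probe_accuracy(without_center, center))
```

Even a center signal much weaker than the biological one is almost perfectly decodable by a linear model, whereas embeddings without it give roughly chance-level accuracy (about 0.2 for five centers).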