
"Retaining and Enhancing Pre-trained Knowledge in Vision-Language Models with Prompt Ensembling"

A podcast on this paper was generated with Google's Illuminate.

Smart prompt grouping helps vision models learn new domains without catastrophic forgetting.

Group-wise Prompt Ensemble enhances vision-language models by integrating domain knowledge while preserving zero-shot capabilities through strategic prompt grouping and ensemble learning.

-----

https://arxiv.org/abs/2412.07077

🤔 Original Problem:

Vision-language models like CLIP struggle to maintain zero-shot capabilities when fine-tuned on specialized datasets, leading to performance drops when adapting to specific domains.

-----

🔧 Solution in this Paper:

→ Introduces Group-wise Prompt Ensemble (GPE) with masked attention to optimize adaptability while protecting zero-shot capabilities.

→ Implements auxiliary prompts to integrate new domain insights without disrupting the original model's representations.

→ Uses an ensemble learning strategy that combines original and new knowledge by promoting diversity among prompts.

→ Employs covariance regularization to ensure each prompt contributes unique information.
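The two core mechanisms above can be sketched in PyTorch. This is a hedged illustration, not the paper's code: the function names, shapes, and group layout are assumptions. The masked attention keeps each prompt group (and the original text tokens) isolated from other groups, which is how the pre-trained zero-shot path is protected; the covariance penalty discourages redundant prompts.

```python
import torch


def group_attention_mask(n_tokens: int, n_groups: int,
                         tokens_per_group: int) -> torch.Tensor:
    """Boolean attention mask (True = blocked).

    Original text tokens attend only to each other, preserving the
    pre-trained zero-shot forward path; each prompt group attends to
    itself and the text tokens, but never to other groups.
    """
    total = n_tokens + n_groups * tokens_per_group
    mask = torch.zeros(total, total, dtype=torch.bool)
    for g in range(n_groups):
        start = n_tokens + g * tokens_per_group
        end = start + tokens_per_group
        # block this group's attention to all prompt tokens...
        mask[start:end, n_tokens:] = True
        # ...then re-allow attention within the group itself
        mask[start:end, start:end] = False
    # text tokens never attend to any prompt token
    mask[:n_tokens, n_tokens:] = True
    return mask


def covariance_regularizer(prompts: torch.Tensor) -> torch.Tensor:
    """Penalize off-diagonal covariance between prompt embeddings so each
    prompt contributes unique information. prompts: (num_prompts, dim)."""
    p = prompts - prompts.mean(dim=0, keepdim=True)
    cov = (p.T @ p) / (p.shape[0] - 1)
    off_diag = cov - torch.diag(torch.diag(cov))
    return (off_diag ** 2).sum() / prompts.shape[1]
```

A mask like this can be passed as `attn_mask` to a standard transformer attention layer, so the grouping needs no architectural change to the text encoder.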

-----

🎯 Key Insights:

→ Prompt grouping with masked attention effectively preserves pre-trained knowledge

→ Auxiliary prompts enhance model adaptation without compromising original capabilities

→ Group-wise ensemble outperforms pair-wise training because it yields more classifiers to combine

→ Special tokens play a crucial role in improving zero-shot generalization
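At inference, the group-wise ensemble amounts to averaging the CLIP-style similarity logits produced by each prompt group's text classifier. A minimal sketch, assuming per-group class text features are already encoded (the function name and temperature value are illustrative):

```python
import torch


def ensemble_logits(image_feat: torch.Tensor,
                    group_text_feats: list,
                    temperature: float = 0.01) -> torch.Tensor:
    """Average cosine-similarity logits across prompt groups.

    image_feat: (dim,) image embedding.
    group_text_feats: one (num_classes, dim) tensor per prompt group.
    Returns (num_classes,) ensembled logits.
    """
    img = image_feat / image_feat.norm()
    logits = []
    for txt in group_text_feats:
        t = txt / txt.norm(dim=-1, keepdim=True)
        logits.append(img @ t.T / temperature)
    return torch.stack(logits).mean(dim=0)
```

With more groups there are more such classifiers to average, which is one reading of why the group-wise ensemble beats pair-wise training.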

-----

📊 Results:

→ Outperforms zero-shot CLIP by 1.7% in novel-class accuracy

→ Achieves 79.24% harmonic mean across 11 datasets

→ Maintains near zero-shot performance even after fine-tuning on specific domains
