FasterVLM is a training-free method that speeds up vision-language models (VLMs) by pruning visual tokens according to their [CLS] attention scores, retaining 90% of the original model's performance while reducing computation by 95%.
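To make the idea concrete, here is a minimal sketch of attention-based token pruning. It is not FasterVLM's actual code. It assumes one attention score per visual token is already available (for example, the attention the [CLS] token pays to each image patch), and the names `prune_visual_tokens` and `keep_ratio` are hypothetical, chosen for illustration only.

```python
import math


def prune_visual_tokens(tokens, cls_attention, keep_ratio=0.05):
    """Keep the visual tokens that receive the most [CLS] attention.

    Illustrative sketch only: `tokens` is any sequence of token
    representations and `cls_attention` holds one score per token.
    """
    if len(tokens) != len(cls_attention):
        raise ValueError("one attention score per token is required")

    # Number of tokens to keep; always keep at least one.
    k = max(1, math.ceil(len(tokens) * keep_ratio))

    # Rank token indices by attention score, highest first.
    ranked = sorted(
        range(len(tokens)), key=lambda i: cls_attention[i], reverse=True
    )

    # Restore the original order so the kept tokens stay in sequence.
    kept = sorted(ranked[:k])
    return [tokens[i] for i in kept], kept


# Toy example: four "tokens", keep the top half.
pruned, kept_indices = prune_visual_tokens(
    ["a", "b", "c", "d"], [0.1, 0.5, 0.3, 0.1], keep_ratio=0.5
)
```

With `keep_ratio=0.05`, only the top 5% of tokens by score would be passed on to the language model, which is the setting the 95% reduction refers to. No retraining is involved, since the selection relies only on scores the model already computes.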