A benchmark that tests if Vision-Language Models (VLMs) can truly understand and combine visual-text relationships like humans do.
Share this post
MMCOMPOSITION: Revisiting the…
Share this post
A benchmark that tests if Vision-Language Models (VLMs) can truly understand and combine visual-text relationships like humans do.