- https://towardsdatascience.com/your-vision-language-model-might-be-a-bag-of-words-30b1beaef7f8
- https://x.com/james_y_zou/status/1638947761562476545
- BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
- Contrastive Representation Learning: A Framework and Review