- PaLI: A Jointly-Scaled Multilingual Language-Image Model
- PaLI-X: On Scaling up a Multilingual Vision and Language Model
- Vision Transformer: An Image is Worth 16×16 Words: Transformers for Image Recognition at Scale
- Sigmoid Loss for Language Image Pre-Training