Bibliography (4):

  1. CogVLM: Visual Expert for Pretrained Language Models (discussion thread on r/mlscaling). https://www.reddit.com/r/mlscaling/comments/17rgsg5/cogvlm_visual_expert_for_pretrained_language/

  2. Microsoft COCO: Common Objects in Context

  3. PaLI-X: On Scaling up a Multilingual Vision and Language Model

  4. CogVLM (THUDM GitHub repository). https://github.com/THUDM/CogVLM