Bibliography (6):

  1. GPV-1: Towards General Purpose Vision Systems

  2. VL-T5: Unifying Vision-and-Language Tasks via Text Generation

  3. Microsoft COCO: Common Objects in Context

  4. https://storage.googleapis.com/openimages/web/index.html

  5. https://visualgenome.org/