Bibliography (6):

  1. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

  2. LAVENDER: Unifying Video-Language Understanding as Masked Language Modeling (https://github.com/microsoft/LAVENDER)

  3. VL-T5: Unifying Vision-and-Language Tasks via Text Generation

  4. UNICORN: Crossing the Format Boundary of Text and Boxes: Towards Unified Vision-Language Modeling

  5. OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework

  6. T5: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer