BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
VL-T5: Unifying Vision-and-Language Tasks via Text Generation
UNICORN: Crossing the Format Boundary of Text and Boxes: Towards Unified Vision-Language Modeling
T5: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer