Bibliography (12):

  1. Towards End-to-End In-Image Neural Machine Translation

  2. Towards Fully Automated Manga Translation

  3. One Big Net For Everything

  4. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

  5. Masked Autoencoders Are Scalable Vision Learners

  6. Robust Open-Vocabulary Translation from Visual Text Representations

  7. Attention Is All You Need

  8. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

  9. PALM: Pre-training an Autoencoding & Autoregressive Language Model for Context-conditioned Generation

  10. Multimodal Masked Autoencoders Learn Transferable Representations

  11. StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding

  12. Building Machine Translation Systems for the Next Thousand Languages