BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Robust Open-Vocabulary Translation from Visual Text Representations
Vision Transformer: An Image is Worth 16×16 Words: Transformers for Image Recognition at Scale
M3AE: Multimodal Masked Autoencoders Learn Transferable Representations
StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding
Building Machine Translation Systems for the Next Thousand Languages