mT5: A Massively Multilingual Pre-trained Text-to-Text Transformer
Unsupervised Neural Machine Translation with Generative Language Models Only
What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization?
AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model