Bibliography:

  1. RoBERTa: A Robustly Optimized BERT Pretraining Approach

  2. Language Models are Unsupervised Multitask Learners

  3. AraBERT: Transformer-based Model for Arabic Language Understanding

  4. CharacterBERT: Reconciling ELMo and BERT for Word-Level Open-Vocabulary Representations From Characters

  5. BERTRAM: Improved Word Embeddings Have Big Impact on Contextualized Model Performance

  6. Unigram LM: Byte Pair Encoding is Suboptimal for Language Model Pretraining

  7. Unsupervised Cross-lingual Representation Learning at Scale