Bibliography (7):

  1. RoBERTa: A Robustly Optimized BERT Pretraining Approach

  2. Language Models are Unsupervised Multitask Learners

  3. AraBERT: Transformer-based Model for Arabic Language Understanding

  4. CharacterBERT: Reconciling ELMo and BERT for Word-Level Open-Vocabulary Representations From Characters

  5. BERTRAM: Improved Word Embeddings Have Big Impact on Contextualized Model Performance

  6. Unigram LM: Byte Pair Encoding is Suboptimal for Language Model Pretraining

  7. Unsupervised Cross-lingual Representation Learning at Scale