- RoBERTa: A Robustly Optimized BERT Pretraining Approach
- Language Models are Unsupervised Multitask Learners
- AraBERT: Transformer-based Model for Arabic Language Understanding
- CharacterBERT: Reconciling ELMo and BERT for Word-Level Open-Vocabulary Representations From Characters
- BERTRAM: Improved Word Embeddings Have Big Impact on Contextualized Model Performance
- Unigram LM: Byte Pair Encoding is Suboptimal for Language Model Pretraining
- Unsupervised Cross-lingual Representation Learning at Scale