- ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
- RoBERTa: A Robustly Optimized BERT Pretraining Approach
- DeBERTa: Decoding-enhanced BERT with Disentangled Attention