Bibliography (5):

  1. ALBERT: A Lite BERT for Self-supervised Learning of Language Representations

  2. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

  3. RoBERTa: A Robustly Optimized BERT Pretraining Approach

  4. DeBERTa: Decoding-enhanced BERT with Disentangled Attention

  5. Byte pair encoding (Wikipedia)