Bibliography (7):

  1. Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context

  2. XLNet: Generalized Autoregressive Pretraining for Language Understanding

  3. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

  4. T5: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

  5. A domain-specific supercomputer for training deep neural networks

  6. ProtTrans (GitHub repository): https://github.com/agemagician/ProtTrans

  7. Wikipedia: Summit (supercomputer)