Bibliography (7):

Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
XLNet: Generalized Autoregressive Pretraining for Language Understanding
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
T5: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
A domain-specific supercomputer for training deep neural networks
https://github.com/agemagician/ProtTrans
Wikipedia Bibliography:
1. Summit (supercomputer)