BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
A domain-specific supercomputer for training deep neural networks
Long Range Arena (LRA): A Benchmark for Efficient Transformers
Wikipedia Bibliography: