Bibliography (5):

  1. Masked Language Modeling for Proteins via Linearly Scalable Long-Context Transformers

  2. Cluster-Former: Clustering-based Sparse Transformer for Long-Range Dependency Encoding

  3. BigBird: Transformers for Longer Sequences

  4. Long Range Arena (LRA): A Benchmark for Efficient Transformers

  5. Vision Transformer: An Image is Worth 16×16 Words: Transformers for Image Recognition at Scale