Bibliography (8):

  1. https://github.com/google-research/meliad#block-recurrent-transformer

  2. Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context

  3. Compressive Transformers for Long-Range Sequence Modeling

  4. Attention Is All You Need