Bibliography (20):

  1. RWKV: Reinventing RNNs for the Transformer Era

  2. enwik8 dataset (Large Text Compression Benchmark): https://mattmahoney.net/dc/textdata.html

  3. OpenWebText2 dataset documentation: https://openwebtext2.readthedocs.io/en/latest/

  4. Surrogate Gradient Learning in Spiking Neural Networks

  5. Deep Residual Learning in Spiking Neural Networks

  6. SpikeGPT code repository: https://github.com/ridgerchu/SpikeGPT

  7. Reformer: The Efficient Transformer

  8. Synthesizer: Rethinking Self-Attention in Transformer Models

  9. Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention

  10. Masked Language Modeling for Proteins via Linearly Scalable Long-Context Transformers

  11. Generating Sequences With Recurrent Neural Networks

  12. Single Headed Attention RNN: Stop Thinking With Your Head

  13. Attention Is All You Need

  14. fast-transformers library: https://github.com/idiap/fast-transformers