- RWKV: Reinventing RNNs for the Transformer Era
- https://mattmahoney.net/dc/textdata.html
- https://openwebtext2.readthedocs.io/en/latest/
- Surrogate Gradient Learning in Spiking Neural Networks
- Deep Residual Learning in Spiking Neural Networks
- https://github.com/ridgerchu/SpikeGPT
- Reformer: The Efficient Transformer
- Synthesizer: Rethinking Self-Attention in Transformer Models
- Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention
- Masked Language Modeling for Proteins via Linearly Scalable Long-Context Transformers
- Generating Sequences With Recurrent Neural Networks
- Single Headed Attention RNN: Stop Thinking With Your Head
- Attention Is All You Need
- https://github.com/idiap/fast-transformers