Bibliography (7):

  1. BlinkDL/RWKV-LM (GitHub repository): RWKV is an RNN with Transformer-level LLM performance that can be trained directly like a GPT (parallelizable), combining the best of RNNs and Transformers: great performance, fast inference, low VRAM use, fast training, "infinite" ctx_len, and free sentence embedding. https://github.com/BlinkDL/RWKV-LM

  2. r/MachineLearning thread on RWKV-v2-RNN, a parallelizable RNN: https://www.reddit.com/r/MachineLearning/comments/umq908/r_rwkvv2rnn_a_parallelizable_rnn_with/

  3. r/MachineLearning thread on the RWKV-4 7B release, an attention-free RNN language model: https://www.reddit.com/r/MachineLearning/comments/yxt8sa/r_rwkv4_7b_release_an_attentionfree_rnn_language/

  4. Hacker News discussion: https://news.ycombinator.com/item?id=36039375

  5. Aran Komatsuzaki (@arankomatsuzaki) on X: https://x.com/arankomatsuzaki/status/1639000379978403853

  6. Vaswani et al., "Attention Is All You Need" (NeurIPS 2017). https://arxiv.org/abs/1706.03762

  7. Katharopoulos et al., "Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention" (ICML 2020). https://arxiv.org/abs/2006.16236