Bibliography (7):

  1. BlinkDL/RWKV-LM (GitHub repository): RWKV is an RNN with Transformer-level LLM performance that can be trained directly like a GPT (parallelizable), combining the best of RNNs and Transformers: great performance, fast inference, low VRAM use, fast training, "infinite" ctx_len, and free sentence embedding. https://github.com/BlinkDL/RWKV-LM

  2. r/MachineLearning thread on RWKV-v2-RNN, a parallelizable RNN: https://www.reddit.com/r/MachineLearning/comments/umq908/r_rwkvv2rnn_a_parallelizable_rnn_with/

  3. r/MachineLearning thread on the RWKV-4 7B release, an attention-free RNN language model: https://www.reddit.com/r/MachineLearning/comments/yxt8sa/r_rwkv4_7b_release_an_attentionfree_rnn_language/

  4. Hacker News discussion: https://news.ycombinator.com/item?id=36039375

  5. Aran Komatsuzaki (@arankomatsuzaki) on X: https://x.com/arankomatsuzaki/status/1639000379978403853

  6. Vaswani et al., "Attention Is All You Need" (NeurIPS 2017). https://arxiv.org/abs/1706.03762

  7. Katharopoulos et al., "Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention" (ICML 2020). https://arxiv.org/abs/2006.16236