Bibliography (5):

  1. Attention Is All You Need

  2. Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality

  3. Linear Transformers Are Secretly Fast Weight Programmers