Bibliography (5):

  1. Neural Ordinary Differential Equations

  2. Linear Transformers Are Secretly Fast Weight Programmers

  3. Attention Is All You Need