Bibliography (4):

  1. https://gonzoml.substack.com/p/you-only-cache-once-decoder-decoder

  2. Attention Is All You Need

  3. https://github.com/microsoft/unilm/tree/master/YOCO

  4. Wikipedia Bibliography:

    1. Attention (machine learning)