Bibliography (3):

  1. Llama 2: Open Foundation and Fine-Tuned Chat Models

  2. FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness

  3. TinyLlama (GitHub repository): https://github.com/jzhang38/TinyLlama
