Bibliography (4):
Touvron, H., et al. "Llama 2: Open Foundation and Fine-Tuned Chat Models." arXiv:2307.09288, 2023.
Dao, T., Fu, D. Y., Ermon, S., Rudra, A., Ré, C. "FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness." NeurIPS 2022, arXiv:2205.14135.
TinyLlama (GitHub repository): https://github.com/jzhang38/TinyLlama
Wikipedia Bibliography:
GitHub: https://en.wikipedia.org/wiki/GitHub