Bibliography (5):

  1. Mamba: Linear-Time Sequence Modeling with Selective State Spaces

  2. Attention Is All You Need

  3. MMLU: Measuring Massive Multitask Language Understanding

  4. MegatronLM: Training Billion+ Parameter Language Models Using GPU Model Parallelism

  5. Wikipedia Bibliography:

    1. Computational complexity theory