Bibliography (6):

  1. Mamba: Linear-Time Sequence Modeling with Selective State Spaces

  2. Attention Is All You Need

  3. Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality

  4. MMLU: Measuring Massive Multitask Language Understanding

  5. Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism

  6. Wikipedia: Computational complexity theory