Bibliography (5):
Mamba: Linear-Time Sequence Modeling with Selective State Spaces
Attention Is All You Need
MMLU: Measuring Massive Multitask Language Understanding
MegatronLM: Training Billion+ Parameter Language Models Using GPU Model Parallelism
Wikipedia Bibliography:
Computational complexity theory