Bibliography (3):

  1. GPT-3: Language Models are Few-Shot Learners

  2. MegatronLM: Training Billion+ Parameter Language Models Using GPU Model Parallelism

  3. MMLU: Measuring Massive Multitask Language Understanding