Bibliography (3):

  1. MMLU: Measuring Massive Multitask Language Understanding

  2. GPT-3: Language Models are Few-Shot Learners

  3. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models