Bibliography (5):

  1. https://openai.com/index/gpt-4-research/

  2. MMLU: Measuring Massive Multitask Language Understanding

  3. GPQA: A Graduate-Level Google-Proof Q&A Benchmark