Bibliography (3):

  1. MMLU: Measuring Massive Multitask Language Understanding

  2. EQ-Bench (GitHub repository): https://github.com/EQ-bench/EQ-Bench

  3. EQ-Bench 3 Leaderboard