Bibliography (3):
MMLU: Measuring Massive Multitask Language Understanding
https://github.com/EQ-bench/EQ-Bench
EQ-Bench 3 Leaderboard