Bibliography (4):

  1. https://arxiv.org/abs/2303.08774

  2. https://openai.com/index/gpt-4-research/

  3. MMLU: Measuring Massive Multitask Language Understanding

  4. Wikipedia Bibliography:

    1. Calibration (statistics)