Bibliography (5):

  1. https://www.reddit.com/r/mlscaling/comments/1jobsaq/proof_or_bluff_evaluating_llms_on_2025_usa_math/

  2. https://x.com/mbalunovic/status/1907436704790651166

  3. Gemini 2.5: Our Newest Gemini Model With Thinking