Bibliography (6):

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Measuring Mathematical Problem Solving With the MATH Dataset
https://openai.com/index/gpt-4-research/
https://openai.com/blog/chatgpt/
LLaMA: Open and Efficient Foundation Language Models
https://github.com/GanjinZero/math401-llm