Bibliography (5):
DeepSeek-V3 Technical Report
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
MMLU: Measuring Massive Multitask Language Understanding
GPT-4 research page (OpenAI): https://openai.com/index/gpt-4-research/
Wikipedia references:
Reinforcement learning