Bibliography (3):
Proximal Policy Optimization Algorithms
Training Verifiers to Solve Math Word Problems
Measuring Mathematical Problem Solving With the MATH Dataset