Bibliography (3):

  1. Proximal Policy Optimization Algorithms

  2. Training Verifiers to Solve Math Word Problems

  3. Measuring Mathematical Problem Solving With the MATH Dataset