Bibliography (5):

  1. https://curvy-check-498.notion.site/Process-Reinforcement-through-Implicit-Rewards-15f4fcb9c42180f1b498cc9b2eaf896f

  2. Measuring Mathematical Problem Solving With the MATH Dataset

  3. Math-Shepherd: Verify and Reinforce LLMs Step-by-step without Human Annotations

  4. Wikipedia Bibliography:

    1. Monte Carlo tree search

    2. Cross-entropy