Bibliography (5):

https://curvy-check-498.notion.site/Process-Reinforcement-through-Implicit-Rewards-15f4fcb9c42180f1b498cc9b2eaf896f
Measuring Mathematical Problem Solving With the MATH Dataset
Math-Shepherd: Verify and Reinforce LLMs Step-by-step without Human Annotations
Wikipedia Bibliography:
1. Monte Carlo tree search
2. Cross-entropy