Bibliography (10):

  1. https://x.com/rm_rafailov/status/1781145338759533016

  2. V-STaR: Training Verifiers for Self-Taught Reasoners

  3. Diffusion Model Alignment Using Direct Preference Optimization