https://ml-gsai.github.io/LLaDA-demo/
https://github.com/ML-GSAI/LLaDA
Structured Denoising Diffusion Models in Discrete State-Spaces
RADD: Your Absorbing Discrete Diffusion Secretly Models the Conditional Distributions of Clean Data
Attention Is All You Need
The Reversal Curse: LLMs trained on A-is-B fail to learn B-is-A
https://openai.com/index/hello-gpt-4o/