“Diffusion Models Beat GANs on Image Synthesis”, 2021-05-11:
We show that diffusion models can achieve image sample quality superior to the current state-of-the-art generative models.
We achieve this on unconditional image synthesis by finding a better architecture through a series of ablations. For conditional image synthesis, we further improve sample quality with classifier guidance: a simple, compute-efficient method for trading off diversity for sample quality using gradients from a classifier.
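Concretely, classifier guidance adds the scaled gradient of a noise-aware classifier’s class log-probability to the mean of each reverse-diffusion step, so larger guidance scales push samples toward the target class at the cost of diversity. Below is a minimal PyTorch sketch of that shift; the function names, the `classifier(x_t, t)` call signature, and the `scale` knob are illustrative assumptions, not the released implementation.

```python
import torch

def classifier_grad(classifier, x_t, t, y):
    # Gradient of log p(y | x_t) with respect to the noisy image x_t,
    # using a classifier trained on noised images (assumed signature).
    with torch.enable_grad():
        x_in = x_t.detach().requires_grad_(True)
        logits = classifier(x_in, t)
        log_probs = torch.log_softmax(logits, dim=-1)
        selected = log_probs[range(len(y)), y].sum()
        return torch.autograd.grad(selected, x_in)[0]

def guided_mean(mean, variance, grad, scale=1.0):
    # Shift the reverse-process Gaussian mean toward higher classifier
    # log-probability; a larger `scale` trades diversity for sample quality.
    return mean + scale * variance * grad
```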
We achieve an FID of 2.97 on ImageNet 128×128, 4.59 on ImageNet 256×256, and 7.72 on ImageNet 512×512, and we match BigGAN-deep even with as few as 25 forward passes per sample, all while maintaining better coverage of the distribution. [nearest-neighbors]
Finally, we find that classifier guidance combines well with upsampling diffusion models, further improving FID to 3.85 on ImageNet 512×512.
We release our code [and checkpoints].