Denoising diffusion probabilistic models (DDPMs) are a class of generative models that have recently been shown to produce excellent samples.
We show that with a few simple modifications, DDPMs can also achieve competitive log-likelihoods while maintaining high sample quality. Additionally, we find that learning variances of the reverse diffusion process allows sampling with an order of magnitude fewer forward passes with a negligible difference in sample quality, which is important for the practical deployment of these models. We additionally use precision and recall to compare how well DDPMs and GANs cover the target distribution.
Finally, we show that the sample quality and likelihood of these models scale smoothly with model capacity and training compute, making them easily scalable.
Figure 10: FID and validation NLL [negative log-likelihood] throughout training on ImageNet 64 × 64 for different model sizes. The constant for the FID trend line was approximated using the FID of in-distribution data. For the NLL trend line, the constant was approximated by rounding down the current state-of-the-art NLL (Roy et al., 2020) on this dataset.
…Figure 10 shows how FID and NLL improve relative to theoretical training compute. The FID curve looks approximately linear on a log-log plot, suggesting that FID scales according to a power law (plotted as the black dashed line). The NLL curve does not fit a power law as cleanly, suggesting that validation NLL scales in a less favorable manner than FID. This could be caused by a variety of factors, such as (1) an unexpectedly high irreducible loss (Henighan et al., 2020) for this type of diffusion model, or (2) the model overfitting to the training distribution. We also note that these models do not achieve optimal log-likelihoods in general, because they were trained with our L_hybrid objective rather than directly with L_vlb, in order to retain both good log-likelihoods and good sample quality.
[In light of Chinchilla and related scaling work, the 'bending' here more likely reflects suboptimal scaling choices, such as the width-vs-depth multiplier or the learning-rate schedule.]
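As a sketch of how such a trend line can be fit: a power law with an irreducible floor, FID ≈ c + a·C^(−b), becomes linear in log-log space once the constant c (approximated here, as in the caption, by an assumed in-distribution FID) is subtracted. The compute values, FID values, and exponent below are synthetic illustrations, not the paper's measurements.

```python
import numpy as np

# Hypothetical training-compute values (arbitrary units) and FID scores;
# these are illustrative stand-ins, not data from the paper.
compute = np.array([1e17, 1e18, 1e19, 1e20])
c = 2.0                                       # assumed irreducible FID floor
fid = c + 50.0 * (compute / 1e17) ** -0.3     # synthetic power-law curve

# FID = c + a * C^(-b)  =>  log(FID - c) = log(a) - b * log(C),
# so after subtracting c we can fit a straight line in log-log space.
slope, intercept = np.polyfit(np.log(compute), np.log(fid - c), 1)
b, a = -slope, np.exp(intercept)
```

The fitted (a, b) then give the dashed trend line; the NLL curve's poorer fit under this same procedure is what motivates explanations (1) and (2) above.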