“Palette: Image-To-Image Diffusion Models”, 2021-10-06 (; similar):
We introduce Palette, a simple and general framework for image-to-image translation using conditional diffusion models.
On 4 challenging image-to-image translation tasks (colorization, inpainting, uncropping, and JPEG decompression), Palette outperforms strong GAN and regression baselines, and establishes a new state-of-the-art result. This is accomplished without task-specific hyper-parameter tuning, architecture customization, or any auxiliary loss demonstrating a desirable degree of generality and flexibility.
We uncover the impact of using 𝓁2 vs. 𝓁1 loss in the denoising diffusion objective on sample diversity, and demonstrate the importance of self-attention through empirical architecture studies.
Importantly, we advocate an unified evaluation protocol based on ImageNet, and report several sample quality scores including FID, Inception Score, Classification Accuracy of a pre-trained ResNet-50, and Perceptual Distance against reference images for various baselines. We expect this standardized evaluation protocol to play a critical role in advancing image-to-image translation research.
Finally, we show that a single generalist Palette model trained on 3 tasks (colorization, inpainting, JPEG decompression) performs as well or better than task-specific specialist counterparts.
[Keywords: machine learning, artificial intelligence, computer vision]
View PDF: