Today I released the code used to train Anim·E #Anim_E, an anime-enhanced #dallemini. It's easy to train, and it dramatically improves dalle-mini's ability to generate anime images. With a few lines of code changed, I also applied it to stable-diffusion. Here is how it works: 🧵

Sep 12, 2022 · 4:09 PM UTC

Dalle-mini (and Dall·E 1) works like this: text tokens -> [transformer] -> image tokens -> [VQGAN decoder] -> image. But @Craiyon has been updating only the transformer model. The VQGAN was trained on a relatively small dataset and has not been updated.
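A minimal sketch of that two-stage pipeline (the function names, shapes, and codebook size below are illustrative stand-ins, not the actual dalle-mini API):

```python
import jax
import jax.numpy as jnp

# Hypothetical stand-ins for the two stages; everything here is illustrative.
def transformer_sample(text_tokens, key):
    # the seq2seq transformer autoregressively samples a 16x16 grid of
    # VQGAN codebook indices, conditioned on the text tokens
    return jax.random.randint(key, (256,), 0, 16384)

def vqgan_decode(image_tokens):
    # the VQGAN decoder looks up the codebook embeddings for those indices
    # and decodes them into a 256x256 RGB image
    return jnp.zeros((256, 256, 3))

key = jax.random.PRNGKey(0)
text_tokens = jnp.array([0, 42, 7, 1])               # placeholder token ids
image_tokens = transformer_sample(text_tokens, key)  # text tokens -> image tokens
image = vqgan_decode(image_tokens)                   # image tokens -> pixels
```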
So, by updating the VQGAN decoder, we can unleash the true capability of the transformer model. The code is written in JAX, using the model implementations from @borisdayma @psuraj28 @pcuenq. Thanks!
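One way to set up a decoder-only update with optax is sketched below. The top-level parameter names and shapes are assumptions about the VQGAN param tree, not the released code; the point is that the encoder and codebook stay frozen so the image-token vocabulary the transformer was trained against is unchanged.

```python
import jax.numpy as jnp
import optax

# Placeholder parameter tree standing in for the real VQGAN params;
# the key names and shapes are assumptions for illustration only.
params = {
    "encoder": {"w": jnp.zeros((3, 3))},
    "quantize": {"codebook": jnp.zeros((16384, 256))},
    "post_quant_conv": {"w": jnp.zeros((256, 256))},
    "decoder": {"w": jnp.zeros((3, 3))},
}

# Only the decoder side gets optimizer updates; everything else is zeroed out.
def label_fn(p):
    return {k: ("train" if k in ("decoder", "post_quant_conv") else "freeze")
            for k in p}

tx = optax.multi_transform(
    {"train": optax.adamw(1e-4), "freeze": optax.set_to_zero()},
    label_fn,
)
opt_state = tx.init(params)
```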
For #stablediffusion, the VAE decoder is a lot better at reconstructing arbitrary images, thanks to its much smaller compression ratio vs. the VQGAN. But it can still benefit from additional fine-tuning, especially when fine-tuning with textual-inversion or DreamBooth.
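For reference, a minimal decoder fine-tuning step could look like this sketch. `decode_fn` is a placeholder for the real Flax VAE decoder applied to latents from the frozen encoder; the L1 loss, learning rate, and shapes are illustrative assumptions, not the released training code.

```python
import jax
import jax.numpy as jnp
import optax

def decode_fn(decoder_params, latents):
    # placeholder decoder: a single linear map back to "pixel" space,
    # standing in for the real VAE decoder apply function
    return latents @ decoder_params["w"]

def loss_fn(decoder_params, latents, targets):
    recon = decode_fn(decoder_params, latents)
    return jnp.mean(jnp.abs(recon - targets))  # simple L1 reconstruction loss

decoder_params = {"w": jnp.zeros((4, 3))}  # placeholder decoder params
tx = optax.adamw(1e-4)
opt_state = tx.init(decoder_params)

@jax.jit
def train_step(decoder_params, opt_state, latents, targets):
    # latents come from the *frozen* encoder; only the decoder is updated
    loss, grads = jax.value_and_grad(loss_fn)(decoder_params, latents, targets)
    updates, opt_state = tx.update(grads, opt_state, decoder_params)
    decoder_params = optax.apply_updates(decoder_params, updates)
    return decoder_params, opt_state, loss
```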
Tip: when training the discriminator, try discretizing the fake images to int and back to float, so the discriminator doesn't learn to exploit the leaked information.
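A small sketch of that trick, assuming images are scaled to [-1, 1] and 8-bit quantization (both are assumptions, not the released code):

```python
import jax.numpy as jnp

def quantize_to_uint8_and_back(x):
    """Round images in [-1, 1] to 8-bit levels and map back to float.

    Real training images have already been through 8-bit quantization, while
    raw generator outputs have not; rounding the fakes removes that tell-tale
    difference so the discriminator can't exploit it. Apply this only in the
    discriminator update, where no gradient needs to flow back to the generator.
    """
    x_uint8 = jnp.clip(jnp.round((x + 1.0) * 127.5), 0, 255).astype(jnp.uint8)
    return x_uint8.astype(jnp.float32) / 127.5 - 1.0
```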