Today I released the code used to train Anim·E #Anim_E, an anime-enhanced #dallemini
It’s easy to train, and it dramatically improves dalle-mini’s ability to generate anime images.
With a few lines of code changed, I also applied it to stable-diffusion. Here is how it works: 🧵
Dalle-mini (and Dall·E 1) works like this: text tokens -> [transformer] -> image tokens -> [VQGAN decoder] -> image
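A rough JAX sketch of that data flow, with placeholder models (the 16×16 token grid, 16384-entry codebook, and 256×256 output are assumed dalle-mini-like shapes, not the real code):

```python
import jax
import jax.numpy as jnp

rng = jax.random.PRNGKey(0)
codebook = jax.random.normal(rng, (16384, 256))          # VQGAN codebook (frozen)

def transformer(text_tokens):
    # Placeholder for the seq2seq transformer: text tokens -> 16x16 grid of
    # discrete image-token ids (indices into the VQGAN codebook).
    return jax.random.randint(rng, (16 * 16,), 0, 16384)

def vqgan_decode(image_token_ids):
    # The decoder looks up the codebook embedding for each image token and
    # upsamples the grid to pixels (a real decoder uses conv blocks, not resize).
    z = codebook[image_token_ids].reshape(16, 16, 256)
    return jax.image.resize(z[..., :3], (256, 256, 3), method="nearest")

image = vqgan_decode(transformer(jnp.array([0, 1, 2])))   # -> (256, 256, 3) array
```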
But @Craiyon has only been updating the transformer model. The VQGAN was trained on a relatively small dataset and has not been updated.
So, by updating the VQGAN decoder, we can unleash the true capability of the transformer model.
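Here's the gist of decoder-only fine-tuning as a toy JAX/Flax sketch. The ToyDecoder module, shapes, and plain L2 loss are stand-ins, not the released Anim·E training code (the real VQGAN objective also has perceptual and adversarial terms):

```python
import jax
import jax.numpy as jnp
import optax
from flax import linen as nn

class ToyDecoder(nn.Module):
    """Stand-in for the VQGAN decoder: quantized latents -> pixels."""
    @nn.compact
    def __call__(self, z):                                  # z: (B, 16, 16, 256)
        x = nn.ConvTranspose(128, (4, 4), strides=(2, 2))(z)
        x = nn.relu(x)
        x = nn.ConvTranspose(3, (4, 4), strides=(2, 2))(x)
        return nn.sigmoid(x)                                # (B, 64, 64, 3)

decoder = ToyDecoder()
rng = jax.random.PRNGKey(0)
latents = jax.random.normal(rng, (2, 16, 16, 256))   # pretend output of the frozen encoder/codebook
targets = jax.random.uniform(rng, (2, 64, 64, 3))    # pretend anime training images
params = decoder.init(rng, latents)

tx = optax.adam(1e-4)
opt_state = tx.init(params)

@jax.jit
def train_step(params, opt_state, latents, targets):
    def loss_fn(p):
        recon = decoder.apply(p, latents)
        # Plain L2 here; the real VQGAN loss adds perceptual and GAN terms.
        return jnp.mean((recon - targets) ** 2)
    loss, grads = jax.value_and_grad(loss_fn)(params)
    updates, opt_state = tx.update(grads, opt_state, params)
    return optax.apply_updates(params, updates), opt_state, loss

params, opt_state, loss = train_step(params, opt_state, latents, targets)
```

The transformer and the image-token vocabulary never change, so generation stays exactly the same; only the tokens-to-pixels mapping gets better on the target domain.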
The code is written in JAX, using the model implementations from @borisdayma @psuraj28 @pcuenq. Thanks!
For #stablediffusion, the VAE decoder is a lot better at reconstructing arbitrary images than the VQGAN, thanks to its much smaller compression ratio. But it can still benefit from additional fine-tuning, especially when combined with textual inversion or DreamBooth.
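The same freeze-everything-but-the-decoder pattern, as a toy optax sketch; the params pytree and loss here are placeholders, not the actual diffusers/stable-diffusion code:

```python
import jax
import jax.numpy as jnp
import optax

vae_params = {
    "encoder": {"w": jnp.ones((4, 4))},   # frozen, so latents stay compatible with the untouched U-Net
    "decoder": {"w": jnp.ones((4, 4))},   # fine-tuned on reconstructions of the target domain
}

def loss_fn(params, images):
    # Placeholder for a reconstruction loss: encode `images`, decode the
    # latents, and compare against the originals.
    fake_recon = params["decoder"]["w"] @ params["encoder"]["w"]
    return jnp.mean((fake_recon - jnp.mean(images)) ** 2)

tx = optax.adam(1e-5)
opt_state = tx.init(vae_params)

@jax.jit
def train_step(params, opt_state, images):
    grads = jax.grad(loss_fn)(params, images)
    # Freeze the encoder by zeroing its gradients before the update.
    grads["encoder"] = jax.tree_util.tree_map(jnp.zeros_like, grads["encoder"])
    updates, opt_state = tx.update(grads, opt_state, params)
    return optax.apply_updates(params, updates), opt_state

vae_params, opt_state = train_step(vae_params, opt_state, jnp.ones((1, 64, 64, 3)))
```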