Here's what a few years of progress on text-to-image generation looks like, one prompt at a time.

"Frank Sinatra as a purple alien in surrealist style"
1) AttnGAN (2018)
2) CLIP+VQGAN (2020)
3) CLIP+Diffusion (2021)
4) DALL-E 2 (2022)

Apr 8, 2022 · 7:32 PM UTC

"Three aliens serving pizza at a pop art themed restaurant"
"Desert landscape at sunrise in Studio Ghibli style"
"A teddy bear wearing glasses giving a podcast"
"A gorilla with butterfly wings climbing the Eiffel Tower"
Replying to @genekogan
Generative art AI is accelerating in leaps and bounds. I played around with purple alien Sinatra myself, along with a rendition of what I'd imagine 'Fly me to the moon' looks like. Made with the art generator over at @officialboredAi. Come check it out, no waiting list needed!
Replying to @genekogan
wow love seeing the progression side by side
Replying to @genekogan
Left-to-right, Top-to-bottom: 1) Let me try again (1973) 2) I've got you under my skin (1956) 3) You make me feel so young (1956) 4) The best is yet to come (1964)
Replying to @genekogan
Guessing the DALL-E ones are cherry-picked and the other three were run with the same prompts, though? That would give DALL-E an unfair advantage.
Replying to @genekogan
Obviously one can argue that even DALL-E isn’t 100% coherent (most notably with text) but the rapid increase in coherence of the produced images is undeniable.