Gene Kogan · Apr 8, 2022 · 7:32 PM UTC

Here's what a few years of progress on text-to-image generation looks like, one prompt at a time. "Frank Sinatra as a purple alien in surrealist style" 1) AttnGAN (2018) 2) CLIP+VQGAN (2020) 3) CLIP+Diffusion (2021) 4) DALL-E 2 (2022)

Gene Kogan · Apr 8, 2022 · 7:32 PM UTC

@genekogan

"Three aliens serving pizza at a pop art themed restaurant"

Gene Kogan · Apr 8, 2022 · 7:32 PM UTC

"Desert landscape at sunrise in Studio Ghibli style"

Gene Kogan · Apr 8, 2022 · 7:32 PM UTC

"A teddy bear wearing glasses giving a podcast"

Gene Kogan · Apr 8, 2022 · 7:32 PM UTC

"A gorilla with butterfly wings climbing the Eiffel Tower"

David Azaraf · Apr 10, 2022 · 12:26 PM UTC

@DAzaraf

Replying to @genekogan

Generative art AI is accelerating in leaps and bounds. I played around with purple alien Sinatra myself, along with a rendition of what I'd imagine 'Fly me to the moon' looks like. Made with the art generator over at @officialboredAi. Come check it out, no waiting list needed!

Gaby Re · Apr 8, 2022 · 7:56 PM UTC

Gaby Re @gabyyyre

Replying to @genekogan

wow love seeing the progression side by side

James · Apr 9, 2022 · 8:46 AM UTC

James @scruffthink

Replying to @genekogan

Left to right, top to bottom: 1) Let Me Try Again (1973) 2) I've Got You Under My Skin (1956) 3) You Make Me Feel So Young (1956) 4) The Best Is Yet to Come (1964)

Peter Bevan · Apr 9, 2022 · 6:30 AM UTC

Peter Bevan @peter_bevan_

Replying to @genekogan

Guessing the DALL-E ones are cherry-picked, and the other three were then run with the same prompts, though? That would give DALL-E an unfair advantage.

gdsimms · Apr 8, 2022 · 8:00 PM UTC

gdsimms @gdsimms

Replying to @genekogan

Obviously one can argue that even DALL-E isn’t 100% coherent (most notably with text) but the rapid increase in coherence of the produced images is undeniable.

Namnezia⚡️🥑 · Apr 12, 2022 · 11:25 AM UTC

Namnezia⚡️🥑 @Namnezia

Replying to @genekogan @Pascallisch

Peaked in 2021