Here are some GLIDE demo prompts that I ran with the new 1.45B parameter latent diffusion model from CompVis (github.com/CompVis/latent-di…): "a hedgehog using a calculator" "a corgi wearing a red bowtie and a purple party hat" "robots meditating in a vipassana retreat" "a fall la..."

Apr 4, 2022 · 4:02 PM UTC

(The last one was "a fall landscape with a small cottage next to a lake", it wouldn't fit in the tweet)
"a surrealist dream-like oil painting by salvador dali of a cat playing checkers" "a professional photo of a sunset behind the grand canyon" "a high-quality oil painting of a psychedelic hamster dragon" "an illustration of albert einstein wearing a superhero costume"
"a boat in the canals of venice" "a painting of a fox in the style of starry night" "a red cube on top of a blue cube" "a stained glass window of a panda eating bamboo"
"a crayon drawing of a space elevator" "a futuristic city in synthwave style" "a pixel art corgi pizza" "a fog rolling into new york"
Replying to @RiversHaveWings
That paper is excellent. The idea of guiding the maximally compressed representation is clever. I feel like you really want something that works across different levels of compression. It could combine well with a pixel-level model.
Diffusion autoencoders work, so we could learn the latent space of a diffusion autoencoder with an inner diffusion model! :)
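The two-stage idea above can be sketched numerically: an outer autoencoder compresses data into a latent space, and the inner diffusion model operates on those latents via the standard DDPM-style forward (noising) process. This is a minimal illustrative sketch, not the actual CompVis implementation; the random linear "encoder", the dimensions, and the noise schedule are all hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy "encoder": a random linear projection standing in for a
# learned autoencoder that maps 64-dim data down to an 8-dim latent space.
W_enc = rng.normal(size=(64, 8)) / np.sqrt(64)
x = rng.normal(size=(16, 64))   # a batch of 16 toy "images"
z0 = x @ W_enc                  # the latents the inner diffusion model sees

# Standard DDPM forward process, applied in latent space:
# q(z_t | z_0) = N(sqrt(abar_t) * z_0, (1 - abar_t) * I)
T = 1000
betas = np.linspace(1e-4, 0.02, T)   # assumed linear beta schedule
abar = np.cumprod(1.0 - betas)       # cumulative product of (1 - beta_t)

def q_sample(z0, t, eps):
    """Sample z_t from q(z_t | z_0) at integer timestep t."""
    return np.sqrt(abar[t]) * z0 + np.sqrt(1.0 - abar[t]) * eps

eps = rng.normal(size=z0.shape)
zt = q_sample(z0, 500, eps)
print(zt.shape)  # the inner model would be trained to predict eps from (zt, t)
```

At t=0 the latent is nearly clean; near t=T it approaches pure noise. Training the inner diffusion model then amounts to regressing `eps` from `(zt, t)`, exactly as in pixel-space diffusion, just in the compressed space.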