“VQGAN-CLIP: Open Domain Image Generation and Editing With Natural Language Guidance”, Katherine Crowson, Stella Biderman, Daniel Kornis, Dashiell Stander, Eric Hallahan, Louis Castricato, Edward Raff, 2022-04-18:

Generating and editing images from open-domain text prompts is a challenging task that has heretofore required expensive and specially trained models. We demonstrate a novel methodology for both tasks which is capable of producing images of high visual quality from text prompts of high semantic complexity, without any training, by using a multimodal encoder to guide image generation.
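The core idea of encoder-guided generation is to iteratively adjust a candidate image (or its latent code) so that its embedding moves closer to the embedding of the text prompt. The toy sketch below illustrates that optimization loop only: a fixed random linear map stands in for the image encoder and a random vector stands in for the text embedding (both are assumptions for illustration; the actual method uses CLIP's encoders and a VQGAN latent, which are not reproduced here).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins (assumptions, not the paper's models): a random linear
# "image encoder" W and a fixed "text embedding" t in a shared space.
D_PIX, D_EMB = 64, 16
W = rng.normal(size=(D_EMB, D_PIX))   # linear sketch of an image encoder
t = rng.normal(size=D_EMB)            # sketch of a text-prompt embedding

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def step(z, lr=0.5):
    """One gradient-ascent step on the cosine similarity between the
    encoded 'image' W @ z and the text embedding t."""
    e = W @ z
    ne, nt = np.linalg.norm(e), np.linalg.norm(t)
    grad_e = t / (ne * nt) - (e @ t) * e / (ne**3 * nt)
    return z + lr * (W.T @ grad_e)    # chain rule through the linear encoder

z = rng.normal(size=D_PIX)            # the 'latent image' being optimized
s0 = cosine(W @ z, t)
for _ in range(300):
    z = step(z)
s1 = cosine(W @ z, t)
print(f"similarity before: {s0:.3f}, after: {s1:.3f}")
```

In the real system the analytic gradient is replaced by backpropagation through CLIP's image encoder and VQGAN's decoder, but the loop structure (encode, score against the text embedding, ascend the similarity) is the same.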

We demonstrate on a variety of tasks how using CLIP to guide VQGAN produces higher visual quality outputs than prior, less flexible approaches like DALL-E, GLIDE, and Open-Edit, despite not being trained for the tasks presented.

Our code is available in a public repository.