“VQGAN-CLIP: Open Domain Image Generation and Editing With Natural Language Guidance”, 2022-04-18:
Generating and editing images from open domain text prompts is a challenging task that heretofore has required expensive and specially trained models. We demonstrate a novel methodology for both tasks which is capable of producing images of high visual quality from text prompts of high semantic complexity without any training by using a multimodal encoder to guide image generations.
We demonstrate on a variety of tasks how using CLIP to guide VQGAN produces higher visual quality outputs than prior, less flexible approaches like DALL-E, GLIDE, and Open-Edit, despite not being trained for the tasks presented.
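The core idea — iteratively optimizing a generator's latent so that the decoded image's embedding matches a text prompt's embedding under a shared multimodal encoder — can be sketched schematically. The snippet below is a toy illustration only: a random linear map stands in for the VQGAN decoder composed with CLIP's image encoder, a random unit vector stands in for the CLIP text embedding, and a finite-difference gradient replaces backpropagation. All names, dimensions, and hyperparameters are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
D, E = 16, 8                       # toy latent and embedding sizes (hypothetical)
W = rng.normal(size=(E, D))        # toy stand-in for decoder + image encoder
text = rng.normal(size=E)          # toy stand-in for the CLIP text embedding
text /= np.linalg.norm(text)

def encode(z):
    """Map a latent to a unit-norm 'image embedding' (toy linear encoder)."""
    v = W @ z
    return v / np.linalg.norm(v)

def loss(z):
    """Negative cosine similarity between image and text embeddings."""
    return -encode(z) @ text

z = rng.normal(size=D)             # latent initialized at random
lr, eps = 0.2, 1e-5
for _ in range(2000):              # iterative guidance loop, as in VQGAN-CLIP
    # Finite-difference gradient; the real method backpropagates through
    # CLIP and the VQGAN decoder instead.
    g = np.array([(loss(z + eps * e) - loss(z - eps * e)) / (2 * eps)
                  for e in np.eye(D)])
    z -= lr * g                    # step the latent toward the prompt

print(f"final cosine similarity: {encode(z) @ text:.3f}")
```

In this toy setting the optimized latent's embedding ends up closely aligned with the target vector; the paper's contribution is that the same outer loop, with the real CLIP and VQGAN and gradient descent in latent space, yields high-quality open-domain images without any task-specific training.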
Our code is available in a public repository.