Experiment
We compared 126 keyword modifiers with the same prompt and initial image. These are the results.
Method
We started by running the prompt "Scary skeleton astronaut in space" for 400 iterations at thumbnail resolution (400x400px). That gave us this base image.
Then, we evolved that creation 126 times, each time adding a different keyword modifier, and running for an additional 400 iterations. "Evolving" a creation uses the previous creation's output as the start image for the next, so every experiment started from the base image (i.e. NOT from scratch).
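In code, the procedure looks roughly like this. This is a minimal sketch only: `generate()` is a hypothetical stand-in for a VQGAN+CLIP run (NightCafe's actual pipeline isn't public), so the function name and its parameters are illustrative.

```python
# Minimal sketch of the experiment loop. `generate` is a hypothetical
# stand-in for a VQGAN+CLIP run; NightCafe's real pipeline differs.
def generate(prompt, iterations, size, init_image=None):
    """Run VQGAN+CLIP for `iterations` steps and return the output image.

    If `init_image` is given, optimisation starts from that image
    instead of from scratch (this is what "evolving" means).
    """
    raise NotImplementedError  # placeholder for a real VQGAN+CLIP run

BASE_PROMPT = "Scary skeleton astronaut in space"
ITERATIONS = 400
SIZE = (400, 400)  # thumbnail resolution

# Step 1: create the base image from scratch.
base_image = generate(BASE_PROMPT, ITERATIONS, SIZE)

# Step 2: evolve once per modifier, always starting from the SAME
# base image rather than from scratch.
modifiers = ["Unreal Engine", "CryEngine", "VRay"]  # ... 126 in total
results = {
    m: generate(f"{BASE_PROMPT} {m}", ITERATIONS, SIZE, init_image=base_image)
    for m in modifiers
}
```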
This experiment was inspired by this fantastic album by Reddit user u/kingdomakrillic.
Want to try your own modifiers on this base image? Click "Evolve It" below then add your modifier to the prompt.
The following explanation of modifiers is taken from our VQGAN+CLIP tutorial on Medium.
Modifiers are keywords that have been found to strongly influence how the AI interprets your prompt. In most cases, adding one or more modifiers to your prompt will dramatically improve the resulting image. Here’s an example using the text prompt “A dog on the beach”. The top-left image (without any modifiers) is noticeably worse than the others.
So why do modifiers have such a dramatic effect? It comes down to the data the CLIP network was trained on: millions of image-caption pairs from the internet. Images whose captions include the words “Thomas Kinkade” tend to be nicely textured paintings like the one shown in the centre-left image. Likewise, images paired with captions containing “Unreal Engine” tend to look like scenes from a video game, because Unreal Engine is a video game rendering engine.
Thus, when you include modifiers like “Thomas Kinkade” or “Unreal Engine”, CLIP knows the image should look a certain way. Note that in the examples above, it’s not so much the shapes that improve with modifiers as the finer textures.
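You can observe this effect directly with OpenAI's open-source CLIP model, the same network VQGAN+CLIP uses for guidance. A minimal sketch, assuming the `clip` package from github.com/openai/CLIP is installed; the image path is a placeholder. It scores how strongly CLIP associates one image with each prompt variant, which is exactly the signal VQGAN+CLIP optimises during generation.

```python
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Placeholder path: any image you want to score against the prompts.
image = preprocess(Image.open("painting.jpg")).unsqueeze(0).to(device)
prompts = [
    "A dog on the beach",
    "A dog on the beach, Thomas Kinkade",
    "A dog on the beach, Unreal Engine",
]
text = clip.tokenize(prompts).to(device)

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    # Normalise, then take cosine similarity between the image
    # and each prompt variant.
    image_features = image_features / image_features.norm(dim=-1, keepdim=True)
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)
    scores = (image_features @ text_features.T).squeeze(0)

for prompt, score in zip(prompts, scores.tolist()):
    print(f"{score:.3f}  {prompt}")
```

For a Kinkade-style painting, the second prompt should score highest, which is why steering generation towards that prompt pushes the output towards that look.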
Observations
The modifiers that are 3D rendering engines (pictured: Unreal Engine, CryEngine, VRay, SketchUp) really shine here. Interestingly, the two rendering engines targeted at games (Unreal Engine and CryEngine) both ended up with a spaceship interior in the background.
Some modifiers like "futuristic", "mystical", "dream" and a few others didn't end up deviating far from the base image. Perhaps this means that CLIP doesn't have a strong concept of what these keywords should look like? Or maybe these modifiers are a bit too broad and therefore hard for CLIP to steer towards any particular look? It would be interesting to do more experiments with these modifiers to get a better idea of what's going on.
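One way to follow up would be to quantify how far each result drifted from the base image. A minimal sketch, assuming the outputs are saved as PNG files named after their modifiers (file names are placeholders); per-pixel RMSE is crude, and a perceptual metric such as LPIPS would likely track visual deviation better.

```python
import numpy as np
from PIL import Image

def rmse(path_a, path_b):
    """Per-pixel root-mean-square difference between two same-size images."""
    a = np.asarray(Image.open(path_a).convert("RGB"), dtype=np.float32)
    b = np.asarray(Image.open(path_b).convert("RGB"), dtype=np.float32)
    return float(np.sqrt(np.mean((a - b) ** 2)))

# Placeholder file names: the base image and one output per modifier.
for modifier in ["futuristic", "mystical", "dream", "Unreal Engine"]:
    print(f"{modifier}: {rmse('base.png', modifier + '.png'):.1f}")
```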
Scroll through the results and vote for your favourites by liking them.
Discuss prompt engineering, selling your art, favourite modifiers, creation tips and more in the NightCafe Lounge.