Super conditioning is very powerful 😎 See the effects of the conditioning scale 👉 wandb.ai/dalle-mini/dalle-mi… For each prompt, we generate images using a different conditioning scale (1 row per scale).

May 9, 2022 · 9:29 PM UTC

We could notice a bigger difference early in training. Now it's more obvious on difficult prompts. A higher conditioning scale is supposed to match the prompt better, but with lower diversity.
In some cases we can get lucky and have a great sample with no conditioning scale, but the image will more often be missing details from the prompt.
During training, you drop the prompt 10-20% of the time, and at generation you compute: logits = logits_uncond + cond_scale * (logits_cond - logits_uncond). So for cond_scale > 1, you shift the logits distribution closer to the conditional distribution and away from the unconditional one.
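A minimal sketch of that generation step in JAX (the function name and the example cond_scale value are illustrative, not the actual dalle-mini code):

import jax.numpy as jnp

def super_condition(logits_cond, logits_uncond, cond_scale):
    # cond_scale == 1 recovers the plain conditional logits;
    # cond_scale > 1 pushes the distribution towards the prompt
    # and away from the unconditional distribution.
    return logits_uncond + cond_scale * (logits_cond - logits_uncond)

# At each decoding step the model is run twice, once with the prompt
# and once with the prompt dropped, then the two sets of logits are
# combined before sampling the next image token, e.g.:
# logits = super_condition(logits_cond, logits_uncond, cond_scale=10.0)
# next_token = jax.random.categorical(key, logits)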
You can find more details about super conditioning in this great thread from @RiversHaveWings
You can apply a trick similar to classifier-free guidance to autoregressive transformers to sample from a synthetic "super-conditioned" distribution. I trained a CIFAR-10 class-conditional ImageGPT to try this, and I got the following grids with cond_scale 1 (default) and then 3:
Replying to @borisdayma
Conditioning like they do doesn't really make sense to me. Why not do only class-conditional? Why do a class-unconditional pass and then add only X amount of class-conditional? It kind of implies that the class-conditional model is not working well enough as is.
The reason is that in image/caption datasets, we have a lot of bad data with captions that don't match well. When you use a conditioning scale > 1, you try to match the prompt even more than your average data does, by pushing the logits distribution even further towards that prompt.
Replying to @borisdayma
What are the scale values in the rows here?
Oh I put them in the first column
Replying to @borisdayma
Is this using DALL-E or DALL-E 2?
Replying to @borisdayma
The table in the link is similar to what I'm picturing @SpeaksNanda