Super conditioning is very powerful 😎 See the effects of the conditioning scale 👉 wandb.ai/dalle-mini/dalle-mi… For each prompt, we generate images using a different conditioning scale (1 row per scale).

May 9, 2022 · 9:29 PM UTC

We could notice a bigger difference early in training. Now it's more obvious on difficult prompts. A higher conditioning scale is supposed to match the prompt better, but with lower diversity.
In some cases we can get lucky and have a great sample with no conditioning scale, but the image will more often be missing details from the prompt.
During training, you drop the prompt 10-20% of the time, and at generation you compute: logits = logits_uncond + cond_scale * (logits_cond - logits_uncond). So for cond_scale > 1, you shift the logits distribution closer to the conditional distribution and away from the unconditional one.
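A minimal sketch of that generation step in JAX (the function name and the example cond_scale value are illustrative, not the actual dalle-mini code):

import jax.numpy as jnp

def super_condition(logits_cond, logits_uncond, cond_scale):
    # cond_scale == 1 recovers the plain conditional logits;
    # cond_scale > 1 pushes the distribution towards the prompt
    # and away from the unconditional distribution.
    return logits_uncond + cond_scale * (logits_cond - logits_uncond)

# At each decoding step the model is run twice, once with the prompt
# and once with the prompt dropped, then the two sets of logits are
# combined before sampling the next image token, e.g.:
# logits = super_condition(logits_cond, logits_uncond, cond_scale=10.0)
# next_token = jax.random.categorical(key, logits)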
You can find more details about super conditioning in this great thread from @RiversHaveWings
You can apply a trick similar to classifier-free guidance to autoregressive transformers to sample from a synthetic "super-conditioned" distribution. I trained a CIFAR-10 class-conditional ImageGPT to try this, and I got the following grids with cond_scale 1 (default) and then 3:
Replying to @borisdayma
Conditioning like they do doesn't really make sense to me. Why not do only class-conditional? Why do a class-unconditional pass and then add only X amount of class-conditional? It kind of implies that the class-conditional model is not working well enough as is.
The reason is that in image/caption datasets, we have a lot of bad data with captions that don't match well. When you use a conditioning scale > 1, you try to match the prompt even more than your average data does, by pushing the logits distribution even further towards that prompt.
Replying to @borisdayma
What are the scale values in the rows here?
Oh I put them in the first column
Replying to @borisdayma
Is this using DALL-E or DALL-E 2?
Replying to @borisdayma
The table in the link is similar to what I'm picturing @SpeaksNanda