Fun weekend experiment:
turns out you can use any image as its own mask when doing conditional inpainting with diffusion models: you simply threshold the pixel values!
This allows for some really interesting conditioning tricks that seamlessly blend real and generated images:
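For concreteness, the thresholding step might look something like this (a minimal sketch: the threshold value and file names are placeholder assumptions, and whether 1 means "keep" or "repaint" depends on your inpainting pipeline's mask convention):

```python
import numpy as np
from PIL import Image

def image_to_mask(path, threshold=0.5):
    # Load the conditioning image as grayscale, scaled to [0, 1].
    gray = np.asarray(Image.open(path).convert("L"), dtype=np.float32) / 255.0
    # Threshold the pixel values: the image becomes its own binary mask.
    # Flip the comparison (or change the threshold) to invert which
    # regions get kept vs. repainted.
    return (gray > threshold).astype(np.float32)

mask = image_to_mask("conditioning.png")
Image.fromarray((mask * 255).astype(np.uint8)).save("mask.png")
```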
E.g. in these samples, the top image is the conditioning image, which is simply thresholded to create a binary, black-and-white mask.
The samples below are inpainting results using the same conditioning image + mask but different text prompts:
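As an illustration of the "one mask, many prompts" setup: this thread uses a custom pipeline (see below), but a rough stand-in sketch with the Hugging Face diffusers inpainting pipeline (where white mask pixels mark the region to be repainted; file names and prompts here are just placeholders) could look like:

```python
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting"
).to("cuda")

# Same conditioning image and mask for every sample.
image = Image.open("conditioning.png").convert("RGB").resize((512, 512))
mask = Image.open("mask.png").convert("L").resize((512, 512))

prompts = [
    "an oil painting of a dense forest",
    "a futuristic city skyline at night",
    "a coral reef, macro photograph",
    "a colorful nebula, astrophotography",
]

# Only the text prompt changes between samples.
for i, prompt in enumerate(prompts):
    out = pipe(prompt=prompt, image=image, mask_image=mask).images[0]
    out.save(f"sample_{i}.png")
```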
Ergo, this trick lets you combine the best elements of real images (composition, geometric shapes, faces...) with the strengths of generative models (e.g. text-to-pixels) in a seamless way.
I'm loving the extra layer of explicit control this gives me over the model outputs!
For reference, I'm using a standard inpainting method implemented from the paper linked below, combined with a custom diffusion pipeline (similar to DD) and models trained by @RiversHaveWings. All samples rendered on my local RTX 3090 machine!
Having some late night fun with diffusion inpainting (implemented the algorithm from this paper: arxiv.org/abs/2201.09865)
Here are four different prompts using the same mask + conditioning image:
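For anyone curious what that algorithm does under the hood, here's a rough sketch of the per-step blending at the core of RePaint (arxiv.org/abs/2201.09865), not my exact implementation: `reverse_step` and `forward_noise` are placeholders for whatever sampler and noise schedule you use, and the paper's resampling ("jumping back") loop is omitted for brevity.

```python
import torch

def repaint_step(x_t, t, known_image, mask, reverse_step, forward_noise):
    """One reverse-diffusion step with inpainting.

    mask == 1 marks pixels taken from the real (conditioning) image,
    mask == 0 marks pixels filled in by the model.
    """
    # Known region: noise the real image forward to the current timestep.
    x_known = forward_noise(known_image, t)
    # Unknown region: let the model denoise the current latent one step.
    x_unknown = reverse_step(x_t, t)
    # Stitch the two together so the known pixels stay anchored to the
    # conditioning image while the masked-out pixels are generated.
    return mask * x_known + (1.0 - mask) * x_unknown
```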