Fun weekend experiment: it turns out you can use any image as its own mask when doing conditional inpainting with diffusion models. You simply threshold the pixel values! This allows for some really interesting conditioning tricks that seamlessly blend real and generated images:

Jun 25, 2022 · 9:36 PM UTC

E.g. in these samples, the top image is the conditioning image, which is simply thresholded to create a binary, black-and-white mask. The samples below are inpainting results using the same conditioning image + mask but different text prompts:
Ergo, this trick allows combining the best elements from real images (composition, geometric shapes, faces...) with the advantages of generative models (e.g. text to pixels) in a seamless way. I'm loving the extra layer of explicit control this gives me over the model outputs!
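A minimal sketch of the thresholding step described above, assuming the image is a float array scaled to [0, 1]. The function name `threshold_mask`, the 0.5 cutoff, and the choice of mapping bright pixels to 1 are illustrative assumptions; the thread doesn't say which side of the threshold gets kept and which gets regenerated.

```python
import numpy as np

def threshold_mask(image, threshold=0.5):
    """Build a binary mask from an image with values in [0, 1].

    image: float array of shape (H, W) or (H, W, C).
    Pixels brighter than `threshold` become 1.0, all others 0.0.
    (Hypothetical helper; the cutoff and polarity are assumptions.)
    """
    image = np.asarray(image, dtype=np.float32)
    # Collapse colour channels into one brightness value per pixel.
    gray = image.mean(axis=-1) if image.ndim == 3 else image
    return (gray > threshold).astype(np.float32)

# Tiny 2x3 grayscale example.
img = np.array([[0.1, 0.6, 0.9],
                [0.4, 0.5, 0.7]])
mask = threshold_mask(img)
```

Flipping the polarity is just `1.0 - mask`, so either convention can feed the same inpainting loop.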
For reference, I'm using a standard inpainting method implemented from the paper linked here, combined with a custom diffusion pipeline (similar to DD) and models trained by @RiversHaveWings. All samples rendered on my local RTX 3090 machine!
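For readers who haven't seen masked diffusion inpainting, here is a rough sketch of the per-step blend as I understand this family of methods: the known pixels are noised to the current noise level and pasted over the model's sample wherever the mask is 1. The names `blend_step`, `x_unknown_prev` and `alpha_bar_prev` are hypothetical, the denoising model itself is left out, and any resampling schedule from the paper is omitted, so this is not the actual pipeline used here.

```python
import numpy as np

def blend_step(x0_known, x_unknown_prev, mask, alpha_bar_prev, rng):
    """One masked blend at noise level t-1 (illustrative sketch only).

    x0_known:        clean conditioning image.
    x_unknown_prev:  the model's reverse-diffusion sample at t-1 (placeholder input).
    mask:            1 where pixels come from the conditioning image, 0 where generated.
    alpha_bar_prev:  cumulative noise-schedule product at t-1.
    """
    noise = rng.standard_normal(x0_known.shape)
    # Forward-diffuse the known image to the same noise level as the sample.
    x_known_prev = (np.sqrt(alpha_bar_prev) * x0_known
                    + np.sqrt(1.0 - alpha_bar_prev) * noise)
    # Keep known pixels where mask == 1, generated pixels elsewhere.
    return mask * x_known_prev + (1.0 - mask) * x_unknown_prev

rng = np.random.default_rng(0)
x0 = np.ones((2, 2))
x_gen = np.zeros((2, 2))
m = np.array([[1.0, 0.0], [0.0, 1.0]])
# alpha_bar_prev = 1.0 means "no noise", so the known pixels pass through unchanged.
out = blend_step(x0, x_gen, m, alpha_bar_prev=1.0, rng=rng)
```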
Having some late night fun with diffusion inpainting (implemented the algorithm from this paper: arxiv.org/abs/2201.09865). Here are four different prompts using the same mask + conditioning image:
Replying to @xsteenbrugge
Is this with priors? I had this working well with classifier guidance, but with priors it always gave less interesting results. This looks really good where it's blending artistically like the classifier would.
No classifier guidance here, just regular pretrained diffusion prior + CLIP guidance!
Replying to @xsteenbrugge
I've wondered about this, starting from a roughly segmented mean-shift clustered version of the image (e.g. luthuli.cs.uiuc.edu/~daf/cou…) rather than pixels directly, maybe even iterating over mean-shift "buckets". Very cool
I tried using non-binary masking (with fading regions) but couldn't get it to work properly yet. I think it should work, but in my implementation it messed up the diffusion image statistics and I couldn't figure out why.
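One possible explanation, offered as a guess rather than anything confirmed in this thread: if the known and generated branches carry independent noise of the same scale, a fractional mask value m blends them into noise with standard deviation sqrt(m² + (1-m)²) times the original, which is below 1 for any 0 < m < 1. The toy check below only demonstrates that arithmetic; the independence assumption and the names are mine.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
# Stand-ins for the noise carried by the known and generated branches
# (assumed independent, unit variance).
noise_known = rng.standard_normal(n)
noise_unknown = rng.standard_normal(n)

def blended_std(m):
    """Empirical std of the noise after blending with mask value m."""
    return float(np.std(m * noise_known + (1.0 - m) * noise_unknown))

std_binary = blended_std(1.0)  # hard mask: noise scale unchanged
std_soft = blended_std(0.5)    # 50% fade: noise scale shrinks towards sqrt(0.5)
```

If this is what's happening, the faded regions would end up less noisy than the model expects at that timestep, which could show up as off-looking statistics.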