Fun weekend experiment:
turns out you can use any image as its own mask when doing conditional inpainting with diffusion models: you simply threshold the pixel values!
This allows for some really interesting conditioning tricks that seamlessly blend real and generated images:
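For concreteness, the thresholding step might look something like this (a minimal sketch: the threshold value and file names are placeholder assumptions, and whether 1 means "keep" or "repaint" depends on your inpainting pipeline's mask convention):

```python
import numpy as np
from PIL import Image

def image_to_mask(path, threshold=0.5):
    # Load the conditioning image as grayscale, scaled to [0, 1].
    gray = np.asarray(Image.open(path).convert("L"), dtype=np.float32) / 255.0
    # Threshold the pixel values: the image becomes its own binary mask.
    # Flip the comparison (or change the threshold) to invert which
    # regions get kept vs. repainted.
    return (gray > threshold).astype(np.float32)

mask = image_to_mask("conditioning.png")
Image.fromarray((mask * 255).astype(np.uint8)).save("mask.png")
```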
E.g. in these samples, the top image is the conditioning image, which is simply thresholded to create a binary, black-and-white mask.
The samples below are inpainting results using the same conditioning image + mask but different text prompts:
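As an illustration of the "one mask, many prompts" setup: this thread uses a custom pipeline (see below), but a rough stand-in sketch with the Hugging Face diffusers inpainting pipeline (where white mask pixels mark the region to be repainted; file names and prompts here are just placeholders) could look like:

```python
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting"
).to("cuda")

# Same conditioning image and mask for every sample.
image = Image.open("conditioning.png").convert("RGB").resize((512, 512))
mask = Image.open("mask.png").convert("L").resize((512, 512))

prompts = [
    "an oil painting of a dense forest",
    "a futuristic city skyline at night",
    "a coral reef, macro photograph",
    "a colorful nebula, astrophotography",
]

# Only the text prompt changes between samples.
for i, prompt in enumerate(prompts):
    out = pipe(prompt=prompt, image=image, mask_image=mask).images[0]
    out.save(f"sample_{i}.png")
```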
Ergo, this trick lets you combine the best elements of real images (composition, geometric shapes, faces...) with the strengths of generative models (e.g. text-to-pixels) in a seamless way.
I'm loving the extra layer of explicit control this gives me over the model outputs!
For reference, I'm using a standard inpainting method implemented from the paper linked below, combined with a custom diffusion pipeline (similar to DD) and models trained by @RiversHaveWings. All samples rendered on my local RTX 3090 machine!
Having some late night fun with diffusion inpainting (implemented the algorithm from this paper: arxiv.org/abs/2201.09865)
Here are four different prompts using the same mask + conditioning image:
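For anyone curious what that algorithm does under the hood, here's a rough sketch of the per-step blending at the core of RePaint (arxiv.org/abs/2201.09865), not my exact implementation: `reverse_step` and `forward_noise` are placeholders for whatever sampler and noise schedule you use, and the paper's resampling ("jumping back") loop is omitted for brevity.

```python
import torch

def repaint_step(x_t, t, known_image, mask, reverse_step, forward_noise):
    """One reverse-diffusion step with inpainting.

    mask == 1 marks pixels taken from the real (conditioning) image,
    mask == 0 marks pixels filled in by the model.
    """
    # Known region: noise the real image forward to the current timestep.
    x_known = forward_noise(known_image, t)
    # Unknown region: let the model denoise the current latent one step.
    x_unknown = reverse_step(x_t, t)
    # Stitch the two together so the known pixels stay anchored to the
    # conditioning image while the masked-out pixels are generated.
    return mask * x_known + (1.0 - mask) * x_unknown
```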