Halftime shoutout: Props to @l4rz, @AydaoAI, @pbaylies, and @Norod78 for ideas and starting notebooks. Go check them out if you find this stuff interesting (signal boost activated).
Here's the crazy bit: these were optimized using *only* the text; the photo is just for reference.
Yup, thanks @gwern et al! Here's what I found: surprisingly, *don't* use the reference image as a CLIP image, only use the text (having both confuses the model). I generated about 30 images each and sorted them by best match (this time using CLIP similarity to the reference image to sort).
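For anyone curious what that sorting step looks like, here's a rough sketch, not my actual notebook code. It assumes you already have CLIP embeddings for the reference image and for each generated image; the tiny 3-number vectors and the names `rank_by_reference`, `sample_a`, etc. are made up for illustration (real CLIP embeddings are much longer and come from the model's image encoder).

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_reference(candidates, reference_embedding):
    """Return (name, score) pairs sorted from best to worst match.

    `candidates` maps a hypothetical image name to its embedding.
    """
    scored = [
        (name, cosine_similarity(embedding, reference_embedding))
        for name, embedding in candidates.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy stand-ins for CLIP embeddings (assumption: real ones come from a CLIP model).
reference = [1.0, 0.0, 0.0]
candidates = {
    "sample_a": [0.9, 0.1, 0.0],
    "sample_b": [0.0, 1.0, 0.0],
    "sample_c": [0.5, 0.5, 0.0],
}

ranking = rank_by_reference(candidates, reference)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Note the reference image is only used here to sort the finished samples, never as an optimization target.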
Happy to give you the code, but it has some current drawbacks: 1] it only works on *text* it knows (could automate on images too, exploring that now!), 2] it needs some cherry picking, only about 1/5 are really good