I just worked out how to solve multimodal analogies using CLIP, at least where you want the solution as an image: <image> : "bird" :: <output> : ["monkey"/"tree"/"Cthulhu"]. The first image was the input image:

May 6, 2021 · 8:09 PM UTC

Replying to @RiversHaveWings
Really cool! I wonder what the solution was: just vector differences, as in classic word2vec NLP, or more advanced logic?
I just used vector differences. :)
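A minimal sketch of the vector-difference idea, assuming the analogy image : "bird" :: ? : "monkey" is solved by taking the image embedding, subtracting the "bird" text embedding and adding the "monkey" text embedding. The thread does not say whether embeddings were normalized first. Here they are L2-normalized because CLIP compares embeddings by cosine similarity. The 4-D vectors are hypothetical toy stand-ins for real CLIP embeddings, and `analogy_target` and `cosine_scores` are illustrative helper names. Instead of generating an image, this sketch only ranks a few candidate embeddings. Producing an actual output image would need some generator steered toward the target embedding, and the thread does not say which one was used.

```python
import numpy as np

def normalize(v):
    # L2-normalize along the last axis (CLIP-style cosine geometry)
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def analogy_target(image_emb, text_from, text_to):
    # image : text_from :: ? : text_to  ->  image - text_from + text_to
    target = normalize(image_emb) - normalize(text_from) + normalize(text_to)
    return normalize(target)

def cosine_scores(target, candidates):
    # Cosine similarity of each candidate row with the unit-length target
    return normalize(candidates) @ target

# Hypothetical toy axes: [bird, monkey, photo-style, tree]
image_emb = np.array([1.0, 0.0, 1.0, 0.0])   # "a photo of a bird"
text_bird = np.array([1.0, 0.0, 0.0, 0.0])
text_monkey = np.array([0.0, 1.0, 0.0, 0.0])

target = analogy_target(image_emb, text_bird, text_monkey)

candidates = np.array([
    [1.0, 0.0, 1.0, 0.0],  # bird photo
    [0.0, 1.0, 1.0, 0.0],  # monkey photo
    [0.0, 1.0, 0.0, 0.0],  # monkey, no photo style
])
scores = cosine_scores(target, candidates)
best = int(np.argmax(scores))  # index 1: the monkey photo
```

The subtraction removes "bird" while keeping the photographic style of the input, so the best match is the monkey *photo* rather than the bare "monkey" direction.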
Replying to @RiversHaveWings
Holy guacamole!
Replying to @RiversHaveWings
Wonderful way to create analogies, as in the old king − man + woman = queen example. Is there any chance I could look at the code, so I can play around with it and release any improvements I make?
Replying to @RiversHaveWings
I'm intrigued by the text/letterforms in the last "cthulhu" image. Any idea why that comes up?
Replying to @RiversHaveWings
This is awesome! My mind is blown!
Replying to @RiversHaveWings
This is awesome