The new CLIP adversarial examples partially stem from the use-mention distinction. CLIP was trained to predict which caption from a list matches an image. It makes sense that a picture of an apple with a large "iPod" label would be captioned with "iPod", not "Granny Smith"!
This can be somewhat fixed with a list of labels that are more explicit about this, at least for the small set of pictures I've tried. After some experimentation, I found a prompt that seems to work with CLIP ViT-B-32.
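
Roughly, the setup looks like the sketch below, using the openai/CLIP package. The label phrasings and the image filename here are only illustrative, not the exact prompt I settled on:

```python
import torch
import clip  # openai/CLIP package
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Hypothetical example image: an apple with an "iPod" label stuck on it
image = preprocess(Image.open("apple_with_ipod_label.jpg")).unsqueeze(0).to(device)

# Plain class labels -- the typographic attack usually wins here
plain_labels = ["a Granny Smith apple", "an iPod"]

# Labels that separate *using* a word from *mentioning* it
# (illustrative phrasings, not the exact prompt)
explicit_labels = [
    "a photo of a Granny Smith apple",
    "a photo of an iPod",
    "a photo of an apple with a piece of paper saying 'iPod' on it",
]

for labels in (plain_labels, explicit_labels):
    text = clip.tokenize(labels).to(device)
    with torch.no_grad():
        logits_per_image, _ = model(image, text)
        probs = logits_per_image.softmax(dim=-1).cpu().numpy()[0]
    print(list(zip(labels, probs.round(3))))
```

The idea is the same whatever the exact phrasing: give the model candidate captions that explicitly describe text appearing in the image, instead of only object names.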


Credits to @ykilcher for inspiration and @gwern for mentioning the 'use-mention distinction' in the EleutherAI Discord.
🥳 New Video (very short) 🥳 Turns out there is a SUPER EASY fix for countering textual adversarial attacks against @OpenAI's CLIP 😄 piped.video/Rk3MBx20z24
Also, I wonder if this prompt is overfitting to "This is painting, text, symbol". Can you think of a use-mention example that isn't one of those?
Embarrassingly, this actually doesn't work for every adversarial example in the CLIP blog post. My guess is that the general technique will work with larger CLIP models and better prompts, though.