"Do Androids Laugh at Electric Sheep? Humor 'Understanding' Benchmarks from The New Yorker Caption Contest", Jack Hessel, Ana Marasović, Jena D. Hwang, Lillian Lee, Jeff Da, Rowan Zellers, Robert Mankoff, Yejin Choi (2022-09-13):

We challenge AI models to "demonstrate understanding" of the sophisticated multimodal humor of The New Yorker [cartoon] Caption Contest.

Concretely, we develop 3 carefully circumscribed tasks for which it suffices (but is not necessary) to grasp potentially complex and unexpected relationships between image and caption, and similarly complex and unexpected allusions to the wide varieties of human experience; these are the hallmarks of a New Yorker-caliber cartoon.

We investigate vision-and-language models that take the cartoon pixels and caption directly as input, as well as language-only models for which we circumvent image processing by providing textual descriptions of the image. Even with the rich multifaceted annotations we provide for the cartoon images, we identify performance gaps between high-quality machine learning models (e.g., a fine-tuned 175B-parameter language model [GPT-3]) and humans.

We publicly release our corpora, including annotations describing the image's locations/entities, what's unusual about the scene, and an explanation of the joke.

[Yejin Choi interview:

Q. What about a lighter example, like A.I. and humor? Comedy is so much about the unexpected, and if A.I. mostly learns by analyzing previous examples, does that mean humor is going to be especially hard for it to understand?

A. Some humor is very repetitive, and A.I. understands it. But, like, New Yorker cartoon captions? We have a new paper about that. Basically, even the fanciest A.I. today cannot really decipher what's going on in New Yorker captions.

Q. To be fair, neither can a lot of people.

A. [Laughs.] Yeah, that’s true. We found, by the way, that we researchers sometimes don’t understand these jokes in New Yorker captions. It’s hard. But we’ll keep researching.]