Context is intuitive for people but quite tricky for machines.
The leading model on the IMAGECODE task (retrieving the correct image from a set of highly similar images given a description) reaches only 29% accuracy, far below the 91% achieved by Amazon Mechanical Turk workers.
Can vision & language models retrieve the correct image from a set given its contextual description (e.g., "No bridesmaid visible at all")? We show that models struggle with this kind of contextual reasoning: arxiv.org/abs/2203.15867
mcgill-nlp.github.io/imageco…
#ACL2022
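For concreteness, here is a minimal sketch of the retrieval setup, assuming a CLIP-style baseline that scores each candidate image against the description; the model choice and scoring below are illustrative, not necessarily the paper's exact method:

import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def retrieve(description, image_paths):
    # Score one contextual description against every candidate image
    # and return the index of the best match.
    images = [Image.open(p) for p in image_paths]
    inputs = processor(text=[description], images=images,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        outputs = model(**inputs)
    # logits_per_text has shape (1, num_images): similarity of the
    # single caption to each candidate image.
    return outputs.logits_per_text.argmax(dim=-1).item()

# e.g. retrieve("No bridesmaid visible at all", candidate_paths)

Scoring candidates independently like this is exactly where the task bites: in a set of near-duplicates, the description often turns on a subtle contextual cue rather than on which objects are present.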
Another great example of context and compositionality that is easy for humans but not yet solved by state-of-the-art machine learning models:
Following up on compositionality, here are examples (each chosen from a set of 6) of #dalle2 generations for the following prompts; a sketch of the generation setup follows the list:
(a) some plants surrounding a lightbulb
(b) a lightbulb surrounding some plants
(c) plants with a lightbulb inside
(d) a lightbulb with plants inside
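DALL-E 2 had no public API when this thread was posted, so as a purely hypothetical illustration, the 6-samples-per-prompt setup could look like this with today's OpenAI Images API (model name, size, and parameters are assumptions):

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

prompts = [
    "some plants surrounding a lightbulb",
    "a lightbulb surrounding some plants",
    "plants with a lightbulb inside",
    "a lightbulb with plants inside",
]

for prompt in prompts:
    # Generate 6 candidates per prompt, mirroring the thread's setup.
    result = client.images.generate(model="dall-e-2", prompt=prompt,
                                    n=6, size="512x512")
    for i, image in enumerate(result.data):
        print(f"{prompt!r} sample {i}: {image.url}")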