“How Low Can You Go? Detecting Style in Extremely Low Resolution Images”, Rachel A. Searston, Matthew B. Thompson, John R. Vokey, Luke A. French, Jason M. Tangen, 2019-04-04:

Accurate recognition and discrimination of complex visual stimuli is critical to human decision making in medicine, forensic science, aviation, security, and defense. This study highlights the sufficiency of redundant low-spatial and low-dimensional information for visual recognition and visual discrimination of 3 large-scale natural image sets.


Humans can see through the complexity of scenes, faces, and objects by quickly extracting their redundant low-spatial and low-dimensional global properties, or their style. It remains unclear, however, whether semantic coding is necessary, or whether visual stylistic information is sufficient, for people to recognize and discriminate complex images and categories.

In 2 experiments, we systematically reduce the resolution of hundreds of unique paintings, birds, and faces, and test people’s ability to discriminate and recognize them.
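The resolution reduction described above can be illustrated with a minimal sketch. The abstract does not say how the images were downsampled or which 8 resolutions were used, so the block-averaging method, the `downsample` helper, and the powers-of-two `RESOLUTIONS` ladder are all assumptions made for illustration only.

```python
from statistics import fmean


def downsample(image, size):
    """Reduce a square grayscale image (a list of rows) to size x size pixels.

    Assumption: each output pixel is the mean of an equal-sized block of
    input pixels. The paper's actual resampling procedure is not stated here.
    """
    side = len(image)
    if any(len(row) != side for row in image) or side % size != 0:
        raise ValueError("expects a square image whose side is a multiple of size")
    block = side // size
    return [
        [
            # Average every input pixel that falls inside output cell (i, j).
            fmean(
                image[r][c]
                for r in range(i * block, (i + 1) * block)
                for c in range(j * block, (j + 1) * block)
            )
            for j in range(size)
        ]
        for i in range(size)
    ]


# Hypothetical ladder of 8 resolutions, ending at a single pixel.
RESOLUTIONS = [128, 64, 32, 16, 8, 4, 2, 1]

# Synthetic 128x128 gradient standing in for a real stimulus.
image = [[(r + c) % 256 for c in range(128)] for r in range(128)]
ladder = {size: downsample(image, size) for size in RESOLUTIONS}
```

At `size=1` the whole image collapses to its mean intensity, which is the kind of global, low-dimensional information the study asks participants to rely on.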

We show that the stylistic information retained at extremely low image resolutions is sufficient for visual recognition of images and visual discrimination of categories. Averaging over the 3 domains, people were able to reliably recognize images reduced down to a single pixel, with large differences from chance discriminability across 8 different image resolutions. People were also able to discriminate categories substantially above chance with an image resolution as low as 2×2 pixels.
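To make "discriminability above chance" concrete, the sketch below computes two standard signal detection measures from old/new recognition counts: sensitivity (d′) and criterion (c), under the equal-variance Gaussian model. This is a swapped-in textbook measure, not necessarily the statistic the authors report; the `sdt_measures` function name and the log-linear correction are assumptions.

```python
from statistics import NormalDist


def sdt_measures(hits, misses, false_alarms, correct_rejections):
    """Return (d_prime, criterion) for one participant at one resolution.

    Assumption: equal-variance Gaussian signal detection model, with a
    log-linear correction so rates of exactly 0 or 1 stay finite.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    d_prime = z(hit_rate) - z(fa_rate)           # 0 means chance performance
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))  # 0 means no response bias
    return d_prime, criterion


# Hypothetical counts: 18 of 20 old images recognized, 2 of 20 new images
# falsely called "old".
d_prime, criterion = sdt_measures(18, 2, 2, 18)
```

A participant who responds "old" equally often to old and new images gets d′ = 0, which is the chance baseline against which each resolution is compared.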

We situate our findings in the context of contemporary computational accounts of visual recognition and contend that explicit encoding of the local features in the image, or knowledge of the semantic category, is not necessary for recognizing and distinguishing complex visual stimuli.

[Keywords: visual recognition, visual discrimination, ensemble, gist, perceptual expertise]

Figure 2: Panels A, B, and C depict participants’ mean discriminability (A), response bias (b), and rate correct scores (in seconds) on the recognition memory task as a function of image resolution (x-axes), along with their polynomial trend over pixels at the top of the 3 panels. All plots represent the 50 participants’ responses, collapsing over the 3 domains: paintings, birds, and faces. Panel D shows the receiver operating characteristic curves for the 8 image resolutions, overlaid with the “best-fitting” curve assuming binomial distributions (the dotted line indicates chance performance). Finally, the raincloud plots in Panel E depict a half violin plot of participants’ mean proportion correct scores across the 8 image resolutions overlaid with jittered data points from each individual participant, the mean proportion correct per resolution (the black dot), and standard error of the mean per resolution.