However, two studies have recently reported widespread irreproducibility in the characterization of
metal-organic frameworks [5] and hydrogen storage materials [6].
Amid a broader “reproducibility crisis” [7] and an exponentially growing scientific literature [8], there has been increasing interest in developing approaches for detecting markers of irreproducibility, errors, and impropriety at scale. For instance, Labbé and colleagues developed Seek & Blastn to rapidly screen publications for misidentified nucleotide reagents [9–11]. Bik and colleagues
manually screened over 20,000 published articles in the life sciences for image duplication [12],
finding that 3.8% of published articles contained inappropriately duplicated figure elements. Semi-automated approaches that accelerate the discovery of image integrity issues have also been developed [13–15]. Several publishers now employ automated methods for detecting image duplication in unpublished manuscripts [16].
In an effort to extend the development of automated or semi-automated tools to the physico-chemical sciences and engineering, we study here the reporting of scanning electron microscope (SEM) instrumentation in scientific papers in materials science and engineering, broadly defined. SEMs are a critical tool for the characterization of samples [17, 18]. Tens of thousands of articles using SEM are published annually (Fig. 1a and Fig. S1). Original research articles typically identify the manufacturer, model, and operating parameters of the SEM used, e.g., “samples were observed with a Philips XL30 field emission scanning electron microscope at an accelerating voltage of 10 kV.” Frequently, the published images obtained with the SEM will include
the instrument’s auto-generated “banner” that discloses experimental metadata (Fig. 1b). Depending on the SEM manufacturer, this banner can include the instrument’s manufacturer and model (Fig. 1c), various operating parameters, and the facility that operates the instrument.
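To make this reporting convention concrete, the following Python sketch matches such sentences against a small, hypothetical list of SEM makes and models. It is an illustration only, not the pipeline described below: the KNOWN_SEMS lookup and the find_sem_mentions helper are placeholders, and any practical screen would require a far more complete instrument catalogue and more tolerant matching.

import re

# Hypothetical, deliberately tiny lookup of SEM manufacturers and models;
# a real screen would need a far more complete instrument catalogue.
KNOWN_SEMS = {
    "Philips": ["XL30"],
    "Hitachi": ["S-4800", "SU8010"],
    "JEOL": ["JSM-7600F"],
    "Zeiss": ["Sigma 300"],
}

def find_sem_mentions(text):
    """Return (manufacturer, model) pairs mentioned in the given text."""
    mentions = []
    for maker, models in KNOWN_SEMS.items():
        for model in models:
            # Allow flexible whitespace between manufacturer and model.
            pattern = rf"{re.escape(maker)}\s+{re.escape(model)}"
            if re.search(pattern, text, flags=re.IGNORECASE):
                mentions.append((maker, model))
    return mentions

sentence = ("Samples were observed with a Philips XL30 field emission "
            "scanning electron microscope at an accelerating voltage of 10 kV.")
print(find_sem_mentions(sentence))  # [('Philips', 'XL30')]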
Recently, several users on the post-publication peer review site PubPeer [19] have documented
dozens of articles in which the SEM instrument identified by the authors in the manuscript’s
text does not match the instrument’s metadata visible in the published images [20]. Such a gross lack
of attention to detail suggests that something is likely amiss with the study. It is difficult to imagine
such a mistake being made in good-faith preparation of an article. However, if articles are hastily
mass-produced, or the article text was plagiarized from one source and the article images from another, these would be among the first details to be missed. Indeed, PubPeer commenters often note
other inconsistencies in these articles that directly call into question the reliability of the findings or
the provenance of the article.
To systematically investigate this matter, we developed a semi-automated pipeline for detection
of misreporting of SEM instrumentation in published research articles. We deployed our method on
more than a million articles from 50 journals published by four different publishers. We found that for 21.2% of the articles for which data was extractable using our approach, the image metadata did not match the SEM make or model listed in the text of the manuscript. For another 24.7%, at least some of the SEM instruments used in the study were not disclosed in the text.
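As an illustration of the comparison step, the sketch below classifies an article from the normalized instrument strings recovered from its text and from its image banners. This is a simplified, hypothetical rendering of the logic rather than the pipeline itself; the function name, category labels, and normalization are assumptions made for the example.

def classify_article(text_instruments, banner_instruments):
    """Compare instruments named in the manuscript text with instruments
    recovered from image banner metadata (illustrative sketch only)."""
    if not banner_instruments:
        return "no banner metadata extractable"
    if banner_instruments & text_instruments:
        # At least one banner instrument is also named in the text; any
        # remaining banner instruments are undisclosed in the text.
        undisclosed = banner_instruments - text_instruments
        return "partial disclosure" if undisclosed else "match"
    if not text_instruments:
        return "no SEM instrument disclosed in the text"
    return "mismatch between text and image metadata"

# Example: the text names a Philips XL30, but a banner reveals a Hitachi S-4800.
print(classify_article({"philips xl30"}, {"hitachi s-4800"}))
# -> mismatch between text and image metadata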
Data and Methods
In order to produce a precise estimate of the rate at which misidentification of SEM instrumentation
occurs in materials science and engineering literature overall, one would need to obtain a representative sample of journals appropriately stratified by publisher, impact, reputation, subfield, author
characteristics, and so on. Unfortunately, several factors precluded achieving this goal. First, several
large publishers of high-profile materials science journals have established ‘data mining’ licensing