[Almost certainly wrong due to data leakage: an AUROC of ~1 (!) for classifying autism vs normal, alongside an AUROC below chance for classifying SRS-2 severity of autism symptoms within the autism cohort (buried in the supplement), can only mean data leakage. The description of the photographic procedures makes clear how the CNNs managed it: the autistic patients were photographed in a room set up, designed, & equipped specially for autism research, with different procedures, over much longer periods of time due to difficulty in getting them to calm down/cooperate, and with different cameras.
So all the CNN learns here is some aspect of the photographs from that room: perhaps brightness or room color, the idiosyncratic optical aberrations of the camera used in it exclusively for autistic patients, etc. (The sample pair in Figure 2 suggests that the autism vs normal eye photographs look very different, with the autism ones in particular much more brightly lit, which is why the model is totally invariant to erasure up until the very last bit of the photograph is erased.)
The authors do not seem to understand this, and try to explain the perfect classification vs the failure of the symptom predictor as greater measurement error in symptom count than in diagnosis, although symptom count is not that error-prone, and measurement error shouldn’t have completely eliminated all detection ability when supposedly one could achieve near-perfect prediction of autism status! What even is the biological argument here? That there is some marker in the retina (why?) which all autistic children have no matter what subtype of autism or heterogeneity, by age 6, which hardly any normal children have, but also that this marker is binary and effectively does not reflect anything else about their autism…? Absurd, and clearly fully explained by data leakage of case status from the photograph procedures. This should have been trivially obvious to the peer reviewers & JAMA, as this is the most blatant way any medical neural net classification system fails…]
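The kind of leakage suspected above is cheap to test for: score each photograph with one trivial statistic, such as mean brightness, and see how well that alone separates cases from controls. A minimal sketch in Python, assuming purely synthetic stand-in images (the brightness values, image size, and all function names are hypothetical, and nothing here comes from the paper's data):

```python
import random

def auroc(scores_pos, scores_neg):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

def mean_brightness(image):
    """Mean pixel value of an image given as a list of rows of grayscale values."""
    total = sum(sum(row) for row in image)
    count = sum(len(row) for row in image)
    return total / count

def fake_image(rng, base, size=8):
    """Hypothetical stand-in for a photograph: Gaussian noise around a base brightness."""
    return [[base + rng.gauss(0, 10) for _ in range(size)] for _ in range(size)]

rng = random.Random(0)
# ASSUMPTION: synthetic data only. Cases are simulated as brighter on average,
# mimicking a systematic difference in acquisition conditions between groups.
case_images = [fake_image(rng, base=140) for _ in range(50)]
control_images = [fake_image(rng, base=100) for _ in range(50)]

leak_auroc = auroc(
    [mean_brightness(img) for img in case_images],
    [mean_brightness(img) for img in control_images],
)
print(f"AUROC from mean brightness alone: {leak_auroc:.2f}")
```

If a one-number feature like this approaches an AUROC of 1 on the real photographs, a near-perfect CNN score says nothing about the retina itself.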
Question: Can deep learning models screen individuals for autism spectrum disorder (ASD) and symptom severity using retinal photographs?
Findings: In this diagnostic study of 1,890 eyes of 958 participants, deep learning models had a mean area under the receiver operating characteristic curve of 1.00 for ASD screening and 0.74 for symptom severity. The optic disc area was also important in screening for ASD.
Meaning: These findings support the potential of artificial intelligence as an objective tool in screening for ASD and possibly for symptom severity using retinal photographs.
Importance: Screening for autism spectrum disorder (ASD) is constrained by limited resources, particularly trained professionals to conduct evaluations. Individuals with ASD have structural retinal changes that potentially reflect brain alterations, including visual pathway abnormalities through embryonic and anatomic connections. Whether deep learning algorithms can aid in objective screening for ASD and symptom severity using retinal photographs is unknown.
Objective: To develop deep ensemble models to differentiate between retinal photographs of individuals with ASD vs typical development (TD) and between individuals with severe ASD vs mild to moderate ASD.
Design, Setting, & Participants: This diagnostic study was conducted at a single tertiary-care hospital (Severance Hospital, Yonsei University College of Medicine) in Seoul, Republic of Korea. Retinal photographs of individuals with ASD were prospectively collected between April & October 2022, and those of age & sex-matched individuals with TD were retrospectively collected between December 2007 and February 2023.
Deep ensembles of 5 models were built with 10-fold cross-validation using the pretrained ResNeXt-50 (32×4d) network. Score-weighted visual explanations for convolutional neural networks, with a progressive erasing technique, were used for model visualization and quantitative validation. Data analysis was performed between December 2022 and October 2023.
Exposures: Autism Diagnostic Observation Schedule–2nd Edition calibrated severity scores (cutoff of 8) and Social Responsiveness Scale–2nd Edition T scores (cutoff of 76) were used to assess symptom severity.
Main Outcomes & Measures: The main outcomes were participant-level area under the receiver operating characteristic curve (AUROC), sensitivity, and specificity. The 95% CI was estimated through the bootstrapping method with 1,000 resamples.
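For readers unfamiliar with the bootstrap step, here is a minimal sketch of a percentile-bootstrap 95% CI for a participant-level AUROC with 1,000 resamples. The abstract does not say which bootstrap variant was used, so the percentile method, the function names, and the synthetic labels and scores below are all assumptions for illustration:

```python
import random

def auroc(labels, scores):
    """Rank-based AUROC: chance that a positive outscores a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auroc_ci(labels, scores, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI, resampling participants with replacement."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_resamples:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # AUROC needs both classes in the resample
            stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[round((alpha / 2) * n_resamples)]
    upper = stats[round((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper

# ASSUMPTION: synthetic participant-level scores, one per participant.
rng = random.Random(1)
labels = [1] * 50 + [0] * 50
scores = [rng.gauss(0.6, 0.2) for _ in range(50)] + [rng.gauss(0.4, 0.2) for _ in range(50)]

point = auroc(labels, scores)
lower, upper = bootstrap_auroc_ci(labels, scores)
print(f"AUROC {point:.2f} (95% CI, {lower:.2f}-{upper:.2f})")
```

Resampling whole participants rather than individual eyes keeps both photographs of one person together, which matters whenever the unit of analysis is the participant.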
Results: This study included 1,890 eyes of 958 participants. The ASD and TD groups each included 479 participants (945 eyes), had a mean (SD) age of 7.8 (3.2) years, and comprised mostly boys (392 [81.8%]). For ASD screening, the models had a mean AUROC, sensitivity, and specificity of 1.00 (95% CI, 1.00–1.00) on the test set. These models retained a mean AUROC of 1.00 using only 10% of the image containing the optic disc. For symptom severity screening, the models had a mean AUROC of 0.74 (95% CI, 0.67–0.80), sensitivity of 0.58 (95% CI, 0.49–0.66), and specificity of 0.74 (95% CI, 0.67–0.82) on the test set.
Conclusion & Relevance: These findings suggest that retinal photographs may be a viable objective screening tool for ASD and possibly for symptom severity. Retinal photograph use may speed the ASD screening process, which may help improve accessibility to specialized child psychiatry assessments currently strained by limited resources.
…2. Retinal imaging environment: When obtaining retinal photographs of patients with ASD, caregivers accompanied them to ensure comfort and stability. The photography sessions for patients with ASD took place in a space dedicated to their needs, distinct from a general ophthalmology examination room. This space was designed to be warm and welcoming, thus creating a familiar environment for patients. Retinal photographs of typically developing (TD) individuals were obtained in a general ophthalmology examination room.
Each eye required an average of 10–30 s for photography, although some cases involved longer periods to help the patient calm down, sometimes exceeding 5–10 min [!]. All images were captured in a dark room to optimize their quality. Retinal photographs of both patients with ASD and individuals with TD were obtained using non-mydriatic fundus cameras, including EIDON (iCare), Nonmyd 7 (Kowa), TRC-NW8 (Topcon), and Visucam NM/FA (Carl Zeiss Meditec).
Figure 2: Quantitative Validation of the Heat Map With the Progressive Erasing Technique for Autism Spectrum Disorder (ASD) Screening.
(A) Area under the receiver operating characteristic curve (AUROC) with shaded 95% CI obtained from masked images.
(B) Progressive erasing for ASD and typical development (TD).
ADOS-2 indicates Autism Diagnostic Observation Schedule–2nd Edition; DSM-5, Diagnostic and Statistical Manual of Mental Disorders, 5th Edition.
…To screen for symptom severity measured with SRS-2 scores, 556 retinal photographs were used (277 for scores ≥76 and 279 for scores <76). The models failed to screen for SRS-2–based symptom severity, with a mean AUROC of 0.44 (95% CI, 0.38–0.50), sensitivity of 0.52 (95% CI, 0.46–0.59), specificity of 0.44 (95% CI, 0.38–0.51), and accuracy of 0.48 (95% CI, 0.44–0.53) for the test set (Table 2). The classification failed in all split ratios (eTable 1 in Supplement 1). The receiver operating characteristic curves for both tasks are presented in Supplementary Figure 2 in Supplement 1.
Supplementary Figure 2: Receiver Operating Characteristic Curves of Models for ASD Symptom Severity Screening (ADOS-2 and SRS-2).
Note: Shaded areas indicate the 95% CIs. Abbreviations: ADOS-2=Autism Diagnostic Observation Schedule-2, ASD=autism spectrum disorder, AUROC=area under the receiver operating characteristic curve, CI=confidence interval, SRS-2=Social Responsiveness Scale-2.