Variance components analyses focus on estimating the net contribution of an entire group of variables to an outcome, without requiring estimating each variable; this is critical for learning if the haystack of variable contains a needle at all, and yet, this approach is hardly used outside behavioral genetics. That should change.
Where else besides genetics can we use behavioral genetics’s workhorse of variance components analysis to nail down the net contribution of entire classes of effects rather than the usual (and usually futile) approach of attempting to exactly estimate one or a handful of said effects? If power analysis tells you whether you have enough light to find the needles in the haystack, variance components can tell you whether there are even any needles to look for.
This requires some form of ‘distance’ equivalent to genetic relatedness for doing the clustering, which typically doesn’t exist—but how much of that is simply that practitioners in all other areas simply don’t think about this at all? And where there is no natural distance, it may be possible to synthesize a proxy one out of a lot of raw data and, using that as a ‘bar code’ or ‘fingerprint’, cluster individuals that way (cf. hash trick, k-NN/nearest-neighbor interpolation, compressed sensing). We have already seen imaginative applications of it in high-dimensional data like brain imaging or leaf spectral imaging, so perhaps there is far more that can be done:
-
“Phenomic selection: a low-cost and high-throughput alternative to genomic selection”, et al 2018
-
“In-field whole plant maize architecture characterized by Latent Space Phenotyping”, et al 2019 ( et al 2019 ; et al 2021 ); “MegaBayesianAlphabet: Mega-scale Bayesian Regression methods for genome-wide prediction and association studies with thousands of traits”, et al 2022; “Raman2RNA: Live-cell label-free prediction of single-cell RNA expression profiles by Raman microscopy”, Kobayashi-et al 2022
-
“Analysis of variance when both input and output sets are high-dimensional”, de los et al 2020
-
“Using high-throughput phenotypes to enable genomic selection by inferring genotypes”, et al 2020
-
“Interest of phenomic prediction as an alternative to genomic prediction in grapevine”, et al 2021
-
“Exploring the variance in complex traits captured by DNA methylation assays”, et al 2020
-
“Environmental factors dominate over host genetics in shaping human gut microbiota composition”, et al 2017 (“We define the term biome-explainability as the variance of a host phenotype explained by the microbiome after accounting for the contribution of human genetics…biome-explainability levels of 16–33% for body mass index (BMI), fasting glucose, high-density lipoprotein (HDL) cholesterol, waist circumference, waist-hip ratio (WHR), and lactose consumption.”); “Autism-related dietary preferences mediate autism-gut microbiome associations”, et al 2021
-
“Do multiple experimenters improve the reproducibility of animal studies?”, von et al 2022
-
Morphometricity:
-
“Morphometricity as a measure of the neuroanatomical signature of a trait”, et al 2016
-
“The relationship between spatial configuration and functional connectivity of brain regions”, et al 2018
-
“Analyzing Brain Morphology on the Bag-of-Features Manifold”, et al 2019
-
“Resting brain dynamics at different timescales capture distinct aspects of human behavior”, et al 2019
-
“Widespread associations between grey matter structure and the human phenome”, Couvy-et al 2019 ( Figure 1); “A unified framework for association and prediction from vertex-wise grey-matter structure”, Couvy-et al 2020a
-
“Predicting human inhibitory control from brain structural MRI”, et al 2019
-
“A parsimonious model for mass-univariate vertex-wise analysis”, Couvy-et al 2021
-
“General dimensions of human brain morphometry inferred from genome-wide association data”, et al 2021
-
“Identifying imaging genetic associations via regional morphometricity estimation”, et al 2022
-
-
Suggestions (cf. “exposome”):
-
drinking-water chemical spectrums/obesity (to test chemical contamination theories)
-
microplastics contamination theories: variance components could help quantify the burden from plastic load, partition between fat stores vs free circulating blood levels, kinds of plastic, etc. and establish if there is any category of microplastics effects with a total effect worth worrying about
-
neural net face embeddings/human phenome (the perennially-controversial question of “what human traits can be inferred from facial appearance?”)
-
large-scale survey/inventory batteries/human phenome (exploit how everything is correlated to try to bound prediction possibilities of eg. personality inventories; a better way forward for psychology than et al 2021 which argues for the equivalent of paying for large-scale GWASes before a single twin or SNP heritability study has been done)
-
human smell inventory: smells have been correlated with everything from age to diabetes to Parkinson’s, but suffers from the sheer expense of training powerful smell-predictors (typically dogs or machine learning analytical chemistry models) on a trait by trait basis
-
-
air pollution
-
influence of diet1 on phenotypes (such as productivity or longevity or obesity)
-
null{style=“display:none”;}
-
One could use Herculano-Houzel’s trick to easily turn ‘diet’ into a single homogenous sample: blenderize it! One could also try to reuse the Rincent trick of infrared photography. If those don’t work, feces may be acceptable individual-level samples, and if that doesn’t work, perhaps sewage samples?↩︎