Variance Components Beyond Genetics

Gwern

Variance Components Beyond Genetics

Variance components analyses focus on estimating the net contribution of an entire group of variables to an outcome, without requiring estimating each variable; this is critical for learning if the haystack of variable contains a needle at all, and yet, this approach is hardly used outside behavioral genetics. That should change.

2019-03-29 in progress certainty: possible importance: 4 bibliography

Where else besides genetics can we use behavioral genetics’s workhorse of variance components analysis to nail down the net contribution of entire classes of effects rather than the usual (and usually futile) approach of attempting to exactly estimate one or a handful of said effects? If power analysis tells you whether you have enough light to find the needles in the haystack, variance components can tell you whether there are even any needles to look for. (An analogy would be incinerating & grinding up the entire haystack and chemically measuring the ratio of carbon to iron; this tells you how much of the haystack was needles, but not where or how many.) They can also indicate if it would greatly increase statistical power by controlling a particular source of variance, by measurement or construction (see eg. the Lanarkshire milk experiment, where using identical twins would’ve reduce the necessary sample size by 20×).

This requires some form of ‘distance’ equivalent to genetic relatedness for doing the clustering, which typically doesn’t exist—but how much of that is simply that practitioners in all other areas simply don’t think about this at all? And where there is no natural distance, it may be possible to synthesize a proxy one out of a lot of raw data and, using that as a ‘bar code’ or ‘fingerprint’, cluster individuals that way (cf. hash trick, k-NN/nearest-neighbor interpolation, compressed sensing). We have already seen imaginative applications of it in high-dimensional data like brain imaging or leaf spectral imaging, so perhaps there is far more that can be done:

“Phenomic selection: a low-cost and high-throughput alternative to genomic selection”, Rincent et al 2018
“In-field whole plant maize architecture characterized by Latent Space Phenotyping”, Gage et al 2019 (Ubbens et al 2019; Runcie et al 2021); “MegaBayesianAlphabet: Mega-scale Bayesian Regression methods for genome-wide prediction and association studies with thousands of traits”, Qu et al 2022; “Raman2RNA: Live-cell label-free prediction of single-cell RNA expression profiles by Raman microscopy”, Kobayashi-Kirschvink et al 2022
“Analysis of variance when both input and output sets are high-dimensional”, de los Campos et al 2020
“Using high-throughput phenotypes to enable genomic selection by inferring genotypes”, Whalen et al 2020
“Interest of phenomic prediction as an alternative to genomic prediction in grapevine”, Brault et al 2021
“Exploring the variance in complex traits captured by DNA methylation assays”, Battram et al 2020
“Environmental factors dominate over host genetics in shaping human gut microbiota composition”, Rothschild et al 2017 (“We define the term biome-explainability as the variance of a host phenotype explained by the microbiome after accounting for the contribution of human genetics…biome-explainability levels of 16–33% for body mass index (BMI), fasting glucose, high-density lipoprotein (HDL) cholesterol, waist circumference, waist-hip ratio (WHR), and lactose consumption.”); “Autism-related dietary preferences mediate autism-gut microbiome associations”, Yap et al 2021
“Do multiple experimenters improve the reproducibility of animal studies?”, von Kortzfleisch et al 2022
Morphometricity:
- “Morphometricity as a measure of the neuroanatomical signature of a trait”, Sabuncu et al 2016
- “Morphometric Similarity Networks Detect Microscale Cortical Organization and Predict Inter-Individual Cognitive Variation”, Seidlitz et al 2018
- “Intact Connectional Morphometricity Learning Using Multi-view Morphological Brain Networks with Application to Autism Spectrum Disorder”, Bessadok & Rekik 2018
- “The relationship between spatial configuration and functional connectivity of brain regions”, Bijsterbosch et al 2018
- “Analyzing Brain Morphology on the Bag-of-Features Manifold”, Chauvin et al 2019
- “Resting brain dynamics at different timescales capture distinct aspects of human behavior”, Liégeois et al 2019
- “Widespread associations between grey matter structure and the human phenome”, Couvy-Duchesne et al 2019 (Figure 1); “A unified framework for association and prediction from vertex-wise grey-matter structure”, Couvy-Duchesne et al 2020a
- “Predicting human inhibitory control from brain structural MRI”, He et al 2019
- “Global Signal Regression Strengthens Association between Resting-State Functional Connectivity and Behavior”, Li et al 2019
- “Ensemble Learning of Convolutional Neural Network, Support Vector Machine, and Best Linear Unbiased Predictor for Brain Age Prediction: ARAMIS Contribution to the Predictive Analytics Competition 2019 Challenge”, Couvy-Duchesne et al 2020b
- “A parsimonious model for mass-univariate vertex-wise analysis”, Couvy-Duchesne et al 2021
- “General dimensions of human brain morphometry inferred from genome-wide association data”, Fürtjes et al 2021
- “Identifying imaging genetic associations via regional morphometricity estimation”, Bao et al 2022
- “Individual differences in internalizing symptoms in late childhood: A variance decomposition into cortical thickness, genetic and environmental differences”, Tandberg et al 2024
Suggestions (cf. “exposome”):
- drinking-water chemical spectrums/obesity (to test chemical contamination theories)
- microplastics contamination theories: variance components could help quantify the burden from plastic load, partition between fat stores vs free circulating blood levels, kinds of plastic, etc. and establish if there is any category of microplastics effects with a total effect worth worrying about
- neural net face embeddings/human phenome (the perennially-controversial question of “what human traits can be inferred from facial appearance?”)
- large-scale survey/inventory batteries/human phenome (exploit how everything is correlated to try to bound prediction possibilities of eg. personality inventories; a better way forward for psychology than Götz et al 2021 which argues for the equivalent of paying for large-scale GWASes before a single twin or SNP heritability study has been done)
  - human smell inventory: smells have been correlated with everything from age to diabetes to Parkinson’s, but suffers from the sheer expense of training powerful smell-predictors (typically dogs or machine learning analytical chemistry models) on a trait by trait basis
- air pollution
- “gendermetricity”?
- influence of diet¹ on phenotypes (such as productivity or longevity or obesity)
- shotgun sequencing of the whole virome/microbiome
- embedding of all text documents about a person, similar to “Using Sequences of Life-events to Predict Human Lives”, Savcisens et a l2023

One could use Herculano-Houzel’s trick to easily turn ‘diet’ into a single homogenous sample: blenderize it! One could also try to reuse the Rincent trick of infrared photography. If those don’t work, feces may be acceptable individual-level samples, and if that doesn’t work, perhaps sewage samples?↩︎

Error: JavaScript disabled.

Backlinks, similar links, and the bibliography require JS enabled to load.

Bibliography

[Bibliography of links/references used in page]