Variance Components Beyond Genetics
Variance components analyses estimate the net contribution of an entire group of variables to an outcome, without requiring an estimate of each variable’s individual effect; this is critical for learning whether the haystack of variables contains a needle at all, and yet this approach is hardly used outside behavioral genetics. That should change.
This requires some form of ‘distance’ equivalent to genetic relatedness for doing the clustering, which typically doesn’t exist; but how much of that is because practitioners in other areas don’t think about this at all? And where there is no natural distance, it may be possible to synthesize a proxy one out of a lot of raw data and, using that as a ‘bar code’ or ‘fingerprint’, cluster individuals that way (cf. hash trick, k-NN/
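As a rough illustration of the idea above, here is a minimal sketch in Python/NumPy of estimating the total variance explained by a block of raw measurements, without estimating any single variable’s effect. It uses Haseman-Elston regression (a simple method-of-moments estimator) as a stand-in for the REML fits typically used in the literature; the simulated data, the helper names `similarity_matrix` & `he_variance_explained`, and the choice of a standardized cross-product as the ‘distance’ are all assumptions made for the example, not anything prescribed here.

```python
import numpy as np

def similarity_matrix(features):
    """n x n similarity ('relatedness') matrix from an n x p array of raw measurements."""
    # Standardize each column so no single variable dominates the similarity.
    # (Assumes no column is constant, otherwise this divides by zero.)
    z = (features - features.mean(axis=0)) / features.std(axis=0)
    return z @ z.T / features.shape[1]

def he_variance_explained(y, K):
    """Haseman-Elston regression through the origin:
    slope of y_i * y_j on K_ij over all pairs i < j."""
    y = (y - y.mean()) / y.std()
    upper = np.triu_indices(len(y), k=1)   # each pair once, diagonal excluded
    k = K[upper]
    products = np.outer(y, y)[upper]
    return float(k @ products / (k @ k))

# Simulated 'haystack' (hypothetical data): 500 variables, each with a tiny
# effect, jointly accounting for about half of the outcome variance.
rng = np.random.default_rng(0)
n, p = 1000, 500
X = rng.normal(size=(n, p))
beta = rng.normal(scale=np.sqrt(0.5 / p), size=p)
y = X @ beta + rng.normal(scale=np.sqrt(0.5), size=n)
y_null = rng.normal(size=n)                # outcome unrelated to X

K = similarity_matrix(X)
est_signal = he_variance_explained(y, K)
est_null = he_variance_explained(y_null, K)
print(f"variance explained, outcome with signal: {est_signal:.2f}")
print(f"variance explained, pure-noise outcome:  {est_null:.2f}")
```

The point of the sketch is that no individual coefficient is ever estimated: only the pairwise similarity of individuals and the pairwise similarity of their outcomes enter the calculation, so any ‘fingerprint’ that yields a sensible similarity matrix could be substituted for `X`.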
“Phenomic selection: a low-cost and high-throughput alternative to genomic selection”, et al2018
“In-field whole plant maize architecture characterized by Latent Space Phenotyping”, et al2019 (et al2019; et al2021); “MegaBayesianAlphabet: Mega-scale Bayesian Regression methods for genome-wide prediction and association studies with thousands of traits”, et al2022; “Raman2RNA: Live-cell label-free prediction of single-cell RNA expression profiles by Raman microscopy”, Kobayashi-et al2022
“Analysis of variance when both input and output sets are high-dimensional”, de los et al2020
“Using high-throughput phenotypes to enable genomic selection by inferring genotypes”, et al2020
“Interest of phenomic prediction as an alternative to genomic prediction in grapevine”, et al2021
“Exploring the variance in complex traits captured by DNA methylation assays”, et al2020
“Environmental factors dominate over host genetics in shaping human gut microbiota composition”, et al2017 (“We define the term biome-explainability as the variance of a host phenotype explained by the microbiome after accounting for the contribution of human genetics…biome-explainability levels of 16–33% for body mass index (BMI), fasting glucose, high-density lipoprotein (HDL) cholesterol, waist circumference, waist-hip ratio (WHR), and lactose consumption.”); “Autism-related dietary preferences mediate autism-gut microbiome associations”, et al2021
“Do multiple experimenters improve the reproducibility of animal studies?”, von et al2022
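Several of the papers above report the variance explained by one block of variables after accounting for another (eg. ‘biome-explainability’ after host genetics). A minimal sketch of that kind of partition, again using Haseman-Elston regression in Python/NumPy rather than whatever estimator each paper actually used: the outcome cross-products are regressed jointly on two similarity matrices. The two simulated blocks `A` & `B`, their independence from each other, and the helper names are assumptions made for the example.

```python
import numpy as np

def similarity_matrix(features):
    """n x n similarity matrix from an n x p array of standardized measurements."""
    # Assumes no column is constant, otherwise this divides by zero.
    z = (features - features.mean(axis=0)) / features.std(axis=0)
    return z @ z.T / features.shape[1]

def he_partition(y, kernels):
    """Joint Haseman-Elston regression (no intercept): least-squares fit of
    y_i * y_j on the off-diagonal entries of every similarity matrix at once.
    Returns one variance share per matrix."""
    y = (y - y.mean()) / y.std()
    upper = np.triu_indices(len(y), k=1)
    design = np.column_stack([K[upper] for K in kernels])
    products = np.outer(y, y)[upper]
    shares, *_ = np.linalg.lstsq(design, products, rcond=None)
    return shares

# Hypothetical data: block A (eg. genotype-like) carries ~30% of the variance,
# block B (eg. microbiome-like) ~20%, and the remaining ~50% is noise.
rng = np.random.default_rng(1)
n, p_a, p_b = 1000, 300, 300
A = rng.normal(size=(n, p_a))
B = rng.normal(size=(n, p_b))
y = (A @ rng.normal(scale=np.sqrt(0.3 / p_a), size=p_a)
     + B @ rng.normal(scale=np.sqrt(0.2 / p_b), size=p_b)
     + rng.normal(scale=np.sqrt(0.5), size=n))

share_a, share_b = he_partition(y, [similarity_matrix(A), similarity_matrix(B)])
print(f"share attributed to block A: {share_a:.2f}")
print(f"share attributed to block B: {share_b:.2f}")
```

In real data the blocks would usually be correlated with each other, in which case the joint fit splits the shared signal between them and the individual shares are harder to interpret than in this independent-blocks simulation.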
Morphometricity:
“Morphometricity as a measure of the neuroanatomical signature of a trait”, et al2016
“The relationship between spatial configuration and functional connectivity of brain regions”, et al2018
“Analyzing Brain Morphology on the Bag-of-Features Manifold”, et al2019
“Resting brain dynamics at different timescales capture distinct aspects of human behavior”, et al2019
“Widespread associations between grey matter structure and the human phenome”, Couvy-et al2019 (Figure 1); “A unified framework for association and prediction from vertex-wise grey-matter structure”, Couvy-et al2020a
“Predicting human inhibitory control from brain structural MRI”, et al2019
“A parsimonious model for mass-univariate vertex-wise analysis”, Couvy-et al2021
“General dimensions of human brain morphometry inferred from genome-wide association data”, et al2021
“Identifying imaging genetic associations via regional morphometricity estimation”, et al2022
Suggestions (cf. “exposome”):
drinking-water chemical spectrums/obesity (to test chemical contamination theories)
microplastics contamination theories: variance components could help quantify the burden from plastic load, partition it between fat stores vs free-circulating blood levels, kinds of plastic, etc., and establish whether there is any category of microplastics effects with a total effect worth worrying about
neural net face embeddings/human phenome (the perennially-controversial question of “what human traits can be inferred from facial appearance?”)
large-scale survey/inventory batteries/human phenome (exploit how everything is correlated to try to bound prediction possibilities of eg. personality inventories; a better way forward for psychology than et al2021 which argues for the equivalent of paying for large-scale GWASes before a single twin or SNP heritability study has been done)
human smell inventory: smells have been correlated with everything from age to diabetes to Parkinson’s, but this research suffers from the sheer expense of training powerful smell-predictors (typically dogs or machine learning analytical chemistry models) on a trait-by-trait basis
air pollution
influence of diet[1] on phenotypes (such as productivity or longevity or obesity)
shotgun sequencing of the whole virome/microbiome
embedding of all text documents about a person, similar to “Using Sequences of Life-events to Predict Human Lives”, Savcisens et al2023
[1] One could use Herculano-Houzel’s trick to easily turn ‘diet’ into a single homogeneous sample: blenderize it! One could also try to reuse the Rincent trick of infrared photography. If those don’t work, feces may be acceptable individual-level samples, and if that doesn’t work, perhaps sewage samples?