“Clustered Environments and Randomized Genes: A Fundamental Distinction between Conventional and Genetic Epidemiology”, George Davey Smith, Debbie A. Lawlor, Roger Harbord, Nic Timpson, Ian Day, Shah Ebrahim (2007-10-30):

Background: In conventional epidemiology, confounding of the exposure of interest with lifestyle or socioeconomic factors, and reverse causation (whereby disease status influences exposure rather than vice versa), may invalidate causal interpretations of observed associations. Conversely, genetic variants should not be related to the confounding factors that distort associations in conventional observational epidemiological studies. Furthermore, disease onset will not influence genotype. Therefore, it has been suggested that genetic variants known to be associated with a modifiable (nongenetic) risk factor can be used to help determine the causal effect of this modifiable risk factor on disease outcomes. This approach, Mendelian Randomization, is increasingly being applied within epidemiological studies. However, there is debate about the underlying premise that associations between genotypes and disease outcomes are not confounded by other risk factors. We examined the extent to which genetic variants, on the one hand, and nongenetic environmental exposures or phenotypic characteristics, on the other, tend to be associated with each other, to assess the degree of confounding that would exist in conventional epidemiological studies compared with Mendelian Randomization studies.

Methods & Findings: We estimated pairwise correlations between nongenetic baseline variables and genetic variables in a cross-sectional study, comparing the number of correlations that were statistically-significant at the 5%, 1%, and 0.01% levels (α = 0.05, 0.01, and 0.0001, respectively) with the number expected by chance if all variables were in fact uncorrelated, using a two-sided binomial exact test. We demonstrate that behavioral, socioeconomic, and physiological factors are strongly interrelated, with 45% of all possible pairwise associations between 96 nongenetic characteristics (n = 4,560 correlations) being statistically-significant at the p < 0.01 level (the ratio of observed to expected statistically-significant associations was 45; p-value for difference between observed and expected < 0.000001). Similar findings were observed for other levels of statistical-significance. In contrast, genetic variants showed no greater association with each other, or with the 96 behavioral, socioeconomic, and physiological factors, than would be expected by chance.
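
The headline numbers can be checked with a little arithmetic. The sketch below (Python, standard library only) uses the abstract's 96 variables, α = 0.01, and "45%"; the observed count of 2,052 is reconstructed from that rounded percentage, so it is approximate. The two-sided exact binomial test here follows one common convention (sum the probabilities of all outcomes no more likely than the observed one); the paper does not state its convention, so treat that choice as an assumption. Because the p-value is far smaller than a float can hold, the function returns its logarithm.

```python
import math

def log_binom_pmf(k: int, n: int, p: float) -> float:
    """Natural log of the Binomial(n, p) probability of exactly k successes."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))

def log_binom_test_two_sided(k: int, n: int, p: float) -> float:
    """Natural log of an exact two-sided binomial p-value.

    Sums the probabilities of every outcome no more likely than the observed
    one, working in log space so that tiny p-values do not underflow.
    """
    logs = [log_binom_pmf(i, n, p) for i in range(n + 1)]
    cutoff = logs[k] + math.log1p(1e-7)  # small tolerance so exact ties count
    extreme = [lp for lp in logs if lp <= cutoff]
    top = max(extreme)
    log_p = top + math.log(math.fsum(math.exp(lp - top) for lp in extreme))
    return min(log_p, 0.0)  # a probability cannot exceed 1

# Figures from the abstract; the observed count is approximate (from "45%").
n_vars = 96
pairs = n_vars * (n_vars - 1) // 2   # number of pairwise correlations
alpha = 0.01
expected = alpha * pairs              # significant correlations expected by chance
observed = round(0.45 * pairs)        # approximate observed count
ratio = observed / expected           # observed-to-expected ratio
log10_p = log_binom_test_two_sided(observed, pairs, alpha) / math.log(10)

print(f"{pairs} pairs, {expected:.1f} expected, {observed} observed, ratio {ratio:.1f}")
print(f"log10(p) = {log10_p:.0f}")
```

The 96 variables give 4,560 pairs, about 45.6 of which would be significant at α = 0.01 by chance; roughly 2,052 observed significant pairs reproduce the reported ratio of 45, and the log p-value lies far below log10(0.000001) = −6.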

Conclusion: These data illustrate why observational studies have produced misleading claims regarding potentially causal factors for disease. The findings demonstrate the potential power of a methodology that utilizes genetic variants as indicators of exposure level when studying environmentally modifiable risk factors.

In a cross-sectional study, Davey Smith and colleagues show why observational studies can produce misleading claims regarding potential causal factors for disease, and illustrate the use of Mendelian Randomization to study environmentally modifiable risk factors.

Editors’ Summary: Background: Epidemiology is the study of the distribution and causes of human disease. Observational epidemiological studies investigate whether particular modifiable factors (for example, smoking or eating healthily) are associated with the risk of a particular disease. The link between smoking and lung cancer was discovered in this way. Once the modifiable factors associated with a disease are established as causal factors, individuals can reduce their risk of developing that disease by avoiding causative factors or by increasing their exposure to protective factors. Unfortunately, modifiable factors that are associated with risk of a disease in observational studies sometimes turn out not to cause or prevent disease. For example, higher intake of vitamins C and E apparently protected people against heart problems in observational studies, but taking these vitamins did not show any protection against heart disease in randomized controlled trials (studies in which comparable groups of patients are randomly assigned various interventions and their health is then monitored). One explanation for this type of discrepancy is confounding: the distortion of the effect of one factor by the presence of another that is associated both with the exposure under study and with the disease outcome. In this example, people who took vitamin supplements might also have exercised more than people who did not take supplements, and it could have been the exercise rather than the supplements that protected against heart disease.
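
The vitamin-and-exercise example can be made concrete with a small simulation. All numbers are hypothetical: exercise is assumed to raise the chance of taking supplements and to lower the chance of heart disease, while supplements are given no effect at all. Comparing supplement users with non-users overall makes supplements look protective; comparing them within each exercise group (stratifying on the confounder) makes the spurious association disappear.

```python
import random

random.seed(1)

# Hypothetical simulation: supplements have NO effect on heart disease, but
# exercise (the confounder) makes people more likely to take supplements
# and less likely to develop heart disease.
people = []
for _ in range(200_000):
    exercises = random.random() < 0.5
    takes_supplement = random.random() < (0.7 if exercises else 0.3)
    heart_disease = random.random() < (0.05 if exercises else 0.15)  # no supplement term
    people.append((exercises, takes_supplement, heart_disease))

def risk(rows):
    """Proportion of rows with heart disease."""
    return sum(hd for _, _, hd in rows) / len(rows)

users = [r for r in people if r[1]]
non_users = [r for r in people if not r[1]]
crude_ratio = risk(users) / risk(non_users)  # confounded comparison

# Stratify on the confounder: compare users with non-users within each exercise group.
stratum_ratios = {}
for ex in (True, False):
    stratum = [r for r in people if r[0] == ex]
    stratum_ratios[ex] = (risk([r for r in stratum if r[1]])
                          / risk([r for r in stratum if not r[1]]))

print(f"crude risk ratio: {crude_ratio:.2f}")
print("within-stratum ratios:", {k: round(v, 2) for k, v in stratum_ratios.items()})
```

Stratification works here only because the single confounder is known and measured, which is exactly what cannot be taken for granted in real observational data.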

Why Was This Study Done?: It isn’t always possible to check the results of observational studies in randomized controlled trials, so epidemiologists have developed other ways to minimize confounding. One approach is known as Mendelian Randomization. Several gene variants have been identified that affect risk factors. For example, variants in a gene called APOE affect the level of cholesterol in an individual’s blood, a risk factor for heart disease. People inherit gene variants randomly from their parents to build up their own unique genotype (total genetic makeup). Consequently, a study that examines the associations between a gene variant and a disease can indicate whether the risk factor affected by that gene variant causes the disease. There should be no confounding in this type of study, the argument goes, because different genetic variants should not be associated with each other or with the nongenetic variables that typically confound directly assessed associations between risk factors and disease. But is this true? In this study, the researchers tested how strongly nongenetic risk factors are associated with one another, and whether genetic variants are associated with nongenetic risk factors or with other genetic variants.
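
The logic of Mendelian Randomization can be sketched as an instrumental-variable analysis. In this hypothetical simulation, a genotype (0, 1 or 2 copies of an allele) raises an exposure such as blood cholesterol, an unmeasured confounder affects both exposure and outcome, and the genotype is assumed to be independent of the confounder and to affect the outcome only through the exposure. The ordinary regression slope is biased by the confounder, whereas the Wald ratio (gene-outcome covariance divided by gene-exposure covariance), a standard instrumental-variable estimator that the summary itself does not describe, recovers the assumed causal effect.

```python
import math
import random

random.seed(42)

def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = math.fsum(a) / len(a), math.fsum(b) / len(b)
    return math.fsum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

TRUE_EFFECT = 0.3  # hypothetical causal effect of exposure on outcome
N = 100_000

g, x, y = [], [], []
for _ in range(N):
    # 0, 1 or 2 copies of a hypothetical allele, inherited independently of lifestyle
    genotype = sum(random.random() < 0.3 for _ in range(2))
    u = random.gauss(0, 1)  # unmeasured confounder, independent of genotype
    exposure = 1.0 * genotype + u + random.gauss(0, 1)
    outcome = TRUE_EFFECT * exposure + u + random.gauss(0, 1)  # genotype acts only via exposure
    g.append(genotype)
    x.append(exposure)
    y.append(outcome)

naive = cov(x, y) / cov(x, x)  # ordinary regression slope, biased by u
wald = cov(g, y) / cov(g, x)   # instrumental-variable (Wald ratio) estimate

print(f"naive slope {naive:.2f}, Wald ratio {wald:.2f}, truth {TRUE_EFFECT}")
```

The estimate is only as good as the assumption that genotype is unrelated to the confounder, which is the premise this study set out to check empirically.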

What Did the Researchers Do and Find?: Using data collected in the British Women’s Heart and Health Study, the researchers calculated how many pairs of nongenetic variables (for example, frequency of eating meat, alcohol intake) were statistically-significantly correlated with each other, that is, how many pairs showed a correlation too strong to be readily explained by chance. They compared this number with the number of statistically-significant correlations that would occur by chance if all the variables were totally independent. Using a significance level at which 1 in 100 pairs of variables would be expected to appear correlated by chance, statistically-significant correlations were seen 45× more often than expected (a ratio of observed to expected statistically-significant correlations of 45). When the researchers repeated this exercise with genetic variants, the ratio of observed to expected statistically-significant correlations was 1.58, a figure not statistically-significantly different from 1. Similarly, for pairwise combinations of genetic and nongenetic variables, the ratio was 1.22.
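
The observed-to-expected comparison can be illustrated with simulated data. The sketch below uses hypothetical sizes (500 people, 20 variables, α = 0.01), far smaller than the study's, and tests each correlation with Fisher's z approximation rather than whatever test the authors used. Mutually independent variables, standing in for genetic variants, give a ratio that fluctuates around 1; variables sharing one latent factor, standing in for clustered behavioral and socioeconomic characteristics, give a ratio close to its ceiling of 1/α = 100.

```python
import math
import random
from statistics import NormalDist

random.seed(7)

N_PEOPLE, N_VARS, ALPHA = 500, 20, 0.01       # hypothetical sizes
Z_CRIT = NormalDist().inv_cdf(1 - ALPHA / 2)   # two-sided critical value

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)

def significant(r, n):
    """Approximate two-sided test of zero correlation via Fisher's z transform."""
    return abs(math.atanh(r)) * math.sqrt(n - 3) > Z_CRIT

def observed_to_expected(columns):
    """Count significant pairwise correlations and divide by the chance expectation."""
    n_pairs = sig = 0
    for i in range(len(columns)):
        for j in range(i + 1, len(columns)):
            n_pairs += 1
            sig += significant(pearson(columns[i], columns[j]), N_PEOPLE)
    return sig / (ALPHA * n_pairs)

# "Genetic-like" variables: mutually independent, so any correlation is chance.
independent = [[random.gauss(0, 1) for _ in range(N_PEOPLE)] for _ in range(N_VARS)]

# "Clustered" variables: each loads on one shared latent factor.
latent = [random.gauss(0, 1) for _ in range(N_PEOPLE)]
clustered = [[0.6 * s + random.gauss(0, 1) for s in latent] for _ in range(N_VARS)]

print(f"independent O/E ratio: {observed_to_expected(independent):.2f}")
print(f"clustered O/E ratio:   {observed_to_expected(clustered):.2f}")
```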

What Do These Findings Mean?: These findings have two main implications. First, the large excess of observed over expected associations among the nongenetic variables indicates that many nongenetic modifiable factors occur in clusters (for example, people with healthy diets often have other healthy habits). Researchers doing observational studies always try to adjust for confounding, but this result suggests that such adjustment will be hard to do, in part because it will not always be clear which factors are confounders. Second, the lack of a large excess of observed over expected associations among the genetic variables (and also among genetic variables paired with nongenetic variables) indicates that little confounding is likely to occur in studies that use Mendelian Randomization. In other words, this approach is a valid way to identify which environmentally modifiable risk factors cause human disease.