“Against Caring About Subtle Poisons”, Gwern2023-09-30:

The question of ‘what is the prior probability of a new causal claim in epidemiology/nutrition (or correlation-heavy fields in general) being roughly correct for my decision-making?’ is a natural one.

So you’ll be disappointed to hear that for the most part… they have no idea. They just don’t know. Claims and theories and interventions come and go, and some stick, and the vast bulk get forgotten—often for no visible reason (any more than you can explain why, exactly, last season’s clothes fashion is now unfashionable). In most cases, the causal claims never get definitively tested and simply fade away as forgotten fads. It’s quite unusual for any ‘subtle poison’ paper to be tested by a large-scale randomized experiment in humans which can rule out all decision-relevant effect sizes. Something has to be very popular, like multivitamins, before it gets attacked hard enough to prove that, e.g., Vitamin C & multivitamins are useless & all the evidence was either irrelevant or confounded.

Still, what one can piece together statistically suggests that the prior is less than 50%. Much less.


You can think of it as a short two-step pipeline: first, how often does a published correlation (or causal result) replicate as a correlation (or causal result)? This is the standard Replication Crisis sort of result, and you can get reasonable summaries from things like Many Labs, to the effect that a large fraction of results simply do not replicate, and the effect sizes shrink by a large fraction when they do. This is the easy step, as you’re simply asking whether the published result even repeats when redone. Obviously, if it doesn’t, and disappears, you no longer need to care about it. This gives an upper bound on the prior, and it’s already a dire one.

The second step: usually, the published result is not about what you care about, and there are multiple large leaps from the actual claim that the raw data justifies to any kind of decision you might make. If you have some randomized causal result in mice, which is definitely 100% there in mice and replicates as many times as you want, you still don’t care about mice, but about humans. Or if you have a correlational result in humans, it can replicate fine and yet be meaningless, because the causation runs the wrong way. This is the hard step, because the second part is usually unobtainable—if you could have obtained the result you cared about directly and easily, you wouldn’t’ve bothered with the first step! They run these bulls—t studies with giant doses of poisons on yeast in a petri dish because we can’t (or rather, won’t) randomize a million humans to measure all-cause mortality directly. So it’s unsurprising that there are few results of the form ‘we had 100 hits in petri dishes and 6 of them worked out in humans, so you can ignore any headline you see about petri-dish work, as it has a probability of only ~6% of being something you should care about’.
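The two-step pipeline can be sketched as simple multiplication of probabilities. The numbers below are purely illustrative assumptions (a rough Many Labs-style replication rate and the hypothetical 6-of-100 petri-dish transfer rate from above), not measured quantities:

```python
# Illustrative two-step pipeline for a published 'subtle poison' claim:
# (1) does the result replicate at all?
# (2) does it transfer to the decision-relevant setting
#     (petri dish / mouse -> human all-cause mortality)?
# Both rates are ASSUMED for illustration, not measured.

p_replicates = 0.5   # rough replication rate (assumption)
p_transfers = 0.06   # hypothetical '6 of 100 petri-dish hits' rate

p_decision_relevant = p_replicates * p_transfers
print(f"P(claim matters for your decision) ≈ {p_decision_relevant:.0%}")
```

Under these (made-up) rates, the prior on a fresh headline claim being decision-relevant lands around 3%, well under the 50% figure above.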

You can look at Vinay Prasad’s Ending Medical Reversal for one way to try to measure this sort of thing. (Similar to failures to replicate, if doctors think X at one time and not-X at another, they can’t both be right.)

I once tried to make a bibliography of studies measuring the concordance between correlation & causal results on the rare occasion that such comparisons could be done. The results are hard to summarize but not encouraging; you’d probably be most interested in the NICE ones.

Animal clinical & toxicological studies are one of the few areas that you can really be systematic about this because of the later clinical trials in humans, and what systematic reviews & meta-analyses are available suggest that the predictive validity of in vitro & animal experiments is worse than even ‘in mice!’ jokes imply (some links).

You can also look at just pure data mining of correlations in datasets large enough that correlations are not false positives—because “everything is correlated”, if you have any sort of reasonable belief on how causality works, finding a correlation is such a common ordinary thing that it cannot represent much evidence for a very specific causal relationship. (Per Meehl: if everything is either positively or negatively correlated with 50:50 odds, and your cool new causal theory predicts that A & B are positively correlated, and they are, then the theory has done no better than predict the outcome of one coin-flip—which is hardly any evidence at all, no matter how many newspapers trumpet it in headlines.)
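Meehl’s point reduces to arithmetic: predicting the sign of a correlation that was guaranteed to exist is worth exactly one coin flip of evidence, which is far short of what a low prior demands. A minimal sketch (the 5% prior is an assumed stand-in for the sort of prior argued for above):

```python
import math

# Meehl's point: if every pair of variables is correlated, with the sign
# positive or negative at 50:50 odds, then a theory that merely predicts
# the sign correctly has matched one fair coin flip.
p_correct_by_chance = 0.5
bits_of_evidence = -math.log2(p_correct_by_chance)
print(bits_of_evidence)  # 1.0 bit: one coin flip's worth

# Compare: to move an (assumed) 5% prior up to even odds, you need
# log2 of the odds ratio 19:1, i.e. ~4.25 bits, not 1.
prior = 0.05
bits_needed = math.log2((1 - prior) / prior)
print(round(bits_needed, 2))  # 4.25
```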

And then even if you do find something useful, most such claims are weak and of little value to the individual. Epidemiology has already gotten the big wins and the low-hanging fruit of “drinking water that won’t kill you” or “vaccines” or “maybe you shouldn’t smoke”.

All that’s left is small-potatoes epidemiological claims. The “population attributable fraction” of any such association is usually small; when this comes up, the usual public-health defense is that the effect is big in absolute terms across the whole global population indefinitely, or so cheap that it’s cost-effective and you might as well do it—not that it’ll make life expectancy go up 10 years. That’s not going to happen. Not even with breakthroughs like semaglutide. (See also “Epidemiology, genetics and the ‘Gloomy Prospect’: embracing randomness in population health research and practice”, Smith2011.)
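To see how small “small” is, one can plug typical weak-association numbers into the standard Levin formula for the population attributable fraction. The prevalence and relative risk below are assumptions chosen to represent a run-of-the-mill epidemiological finding, not any specific study:

```python
# Population attributable fraction via Levin's formula:
#   PAF = p*(RR - 1) / (1 + p*(RR - 1))
# where p is exposure prevalence and RR the relative risk.
def paf(prevalence, rr):
    excess = prevalence * (rr - 1)
    return excess / (1 + excess)

# Assumed numbers for a typical weak association: 30% of the population
# exposed, relative risk 1.15.
print(f"{paf(0.30, 1.15):.1%}")  # ~4.3% of cases attributable
```

Even granting the causal claim outright, such an association accounts for only a few percent of cases at the population level, and far less for any given individual.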


Personally, after years of reading methodology papers & meta-analyses etc., I’ve pretty much given up on the ‘subtle poison’ genre of science fiction entirely: I choose my food based on more pragmatic criteria, and try to ignore most such research unless it’s unusually interesting.

There are undoubtedly truths of the matter, which matter. There are doubtless many subtle poisons we are eating or breathing, and which we would prefer not to. We could probably boost longevity, or at least reduce many diseases and improve quality of life, by a lot, maybe as much as decades, if we knew them all and could fix them.

But we don’t know them, and we won’t find many (much less most) of them until methods improve to the point where it’s easier to do the right things than the wrong things. (In the same way that the candidate-gene genetics era of ~100% false results was replaced by the GWAS era of real results only because genome sequencing got so insanely cheap that researchers could do the right thing analyzing GWAS PGSes & datasets like UK Biobank as easily as the wrong thing, rather than any sort of moral awakening about p-hacking.)

I’m not too optimistic about that happening anytime soon. Thus, for now, it’s just an unusually boring & expensive genre of science fiction.