“How Correlated Are You?”, Allen Downey2023-08-20 (, ; backlinks)⁠:

Suppose you measure the arm and leg lengths of 4,082 people. You would expect those measurements to be correlated, and you would be right. In the ANSUR-II dataset, among male members of the armed forces, this correlation is about 0.75—people with long arms tend to have long legs.

…So some pairs of measurements are more correlated than others. There are a total of 93 measurements in the ANSUR-II dataset, which means there are 93 × 92 = 8,556 correlations between pairs of measurements. So here’s a question that caught my attention: Are there measurements that are uncorrelated (or only weakly correlated) with the others?

To answer that, I computed the average magnitude (positive or negative) of the correlation between each measurement and the other 92. The most correlated measurement is weight, with an average of 0.56. So if you have to choose one measurement, weight seems to provide the most information about all of the others.

The least correlated measurement turns out to be ear protrusion—its average correlation with the other measurements is only 0.03, which is not just small, it is substantially smaller than the next smallest, which is ear breadth, with an average correlation of 0.13. [Winner’s curse + measurement error?]

Beyond the averages

We can get a better sense of what’s going on by looking at the distribution of correlations for each measurement, rather than just the averages. I’ll use my two favorite data visualization tools: CDFs, which make it easy to identify outliers, and spaghetti plots, which make it easy to spot oddities.

This figure shows the CDF of correlations for each of the 93 measurements.