When a sample of n datapoints is drawn from a normal/Gaussian distribution, the expected value of the largest datapoint depends on n. I implement a variety of exact & approximate calculations from the literature in R and compare their efficiency & accuracy.
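As a minimal sketch of what such calculations can look like (not necessarily the implementations compared here), the expected maximum of n standard normal draws can be obtained by numerically integrating the density of the maximum, approximated with Blom's formula, or estimated by simulation. The function names `exact_max`, `blom_max`, and `mc_max`, the constant 0.375, and the replicate count are illustrative assumptions.

```r
# Expected maximum of n draws from N(0,1), three ways.
# All function names here are hypothetical, chosen for illustration.

# "Exact": integrate x times the density of the maximum, n * phi(x) * Phi(x)^(n-1).
exact_max <- function(n) {
  integrand <- function(x) x * n * dnorm(x) * pnorm(x)^(n - 1)
  integrate(integrand, lower = -Inf, upper = Inf)$value
}

# Blom's approximation for the largest order statistic (alpha = 0.375 assumed).
blom_max <- function(n, alpha = 0.375) {
  qnorm((n - alpha) / (n - 2 * alpha + 1))
}

# Monte Carlo: average the maximum over many simulated samples.
mc_max <- function(n, reps = 10000) {
  mean(replicate(reps, max(rnorm(n))))
}

set.seed(1)
n <- 10
c(exact = exact_max(n), blom = blom_max(n), monte_carlo = mc_max(n))
```

For n = 2 the integral has the closed form 1/sqrt(pi), which gives a quick sanity check on `exact_max`.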
If we assembled a sports league of the greatest counterfactual athletes from human history, had they all lived in ideal circumstances, how many would be superior to the greatest actual athlete? Not as many as one might think.
As the total number of humans who have ever lived is ~120b and billions have had considerable opportunity, this league couldn’t be >120 in size, as that is how many people out of ~120b would match an extreme actual athlete’s rarity (perhaps 1-in-1b).
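The upper-bound arithmetic can be checked in a couple of lines of R. This is only a sketch using the rough figures quoted above; the variable names are illustrative, and converting the rarity to a z-score assumes normality.

```r
# Rough figures from the text; variable names are hypothetical.
humans_ever <- 120e9  # ~120b humans ever born
rarity      <- 1e-9   # assumed rarity of an extreme actual athlete: 1-in-1b

# Ceiling on league size: how many people in ~120b are at least this rare.
league_cap <- humans_ever * rarity  # about 120

# Under a normal distribution, the same rarity as an upper-tail threshold in SDs.
threshold_sd <- qnorm(rarity, lower.tail = FALSE)  # roughly 6 SD

c(league_cap = league_cap, threshold_sd = threshold_sd)
```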
If we make distributional assumptions like normality, we can calculate the expected number past a threshold like 1-in-1b for 120 replicates; the number is ~77.