“Calculating The Gaussian Expected Maximum § Probability of Bivariate Maximum”, Gwern 2016-01-22:

When a sample of n datapoints is drawn from a normal/Gaussian distribution, the expected size of the largest datapoint depends on n. I implement a variety of exact & approximate calculations from the literature in R to compare their efficiency & accuracy.
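As a rough illustration of how the expected maximum grows with n, the sketch below compares a Monte Carlo estimate against Blom's (1958) order-statistic approximation. It is written in Python rather than R and is not the essay's own code. The function names, the choice of Blom's formula with α = 0.375, and the trial count are all assumptions made for this example.

```python
import random
from statistics import NormalDist, fmean


def simulated_expected_max(n, trials=5_000, seed=0):
    """Monte Carlo estimate of the expected maximum of n standard normal draws.

    Hypothetical helper for illustration; `trials` and `seed` are arbitrary.
    """
    rng = random.Random(seed)
    return fmean(
        max(rng.gauss(0.0, 1.0) for _ in range(n)) for _ in range(trials)
    )


def blom_expected_max(n, alpha=0.375):
    """Blom's approximation: E[max] ~ Phi^-1((n - alpha) / (n - 2*alpha + 1)).

    alpha = 0.375 is the conventional constant; treat it as an assumption here.
    """
    return NormalDist().inv_cdf((n - alpha) / (n - 2 * alpha + 1))


# Print n, the simulated estimate, and the approximation side by side.
for n in (2, 10, 100):
    print(n, round(simulated_expected_max(n), 3), round(blom_expected_max(n), 3))
```

The simulation is slow but makes no distributional shortcuts, while the approximation is a single quantile lookup; comparing the two columns gives a feel for the efficiency/accuracy trade-off the essay examines.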

Given a sample of n pairs of 2 normal variables A & B with correlation r, what is the probability Pmax that the datapoint with the maximum on the first variable A also has the maximum on the second variable B? This is analogous to many testing or screening situations, such as employee hiring (“what is the probability the top-scoring applicant on the first exam is the top-scorer on the second as well?”) or athletic contests (“what is the probability the current world champ will win the next championship?”).
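The question can be explored by brute-force simulation. The following is a minimal Python sketch, not the essay's R implementation: it draws n correlated pairs, checks whether the same pair holds the maximum on both variables, and repeats. The function name `simulate_pmax`, the trial count, and the construction of B as `r * a + sqrt(1 - r^2) * noise` are assumptions chosen for this example.

```python
import math
import random


def simulate_pmax(n, r, trials=5_000, seed=0):
    """Estimate the probability that one pair is the maximum on both A and B.

    Hypothetical helper: draws `trials` samples of n bivariate standard
    normal pairs with correlation r and counts how often the argmax agrees.
    """
    rng = random.Random(seed)
    s = math.sqrt(1.0 - r * r)  # weight on the independent noise term
    hits = 0
    for _ in range(trials):
        best_a = best_b = -math.inf
        arg_a = arg_b = -1
        for i in range(n):
            a = rng.gauss(0.0, 1.0)
            b = r * a + s * rng.gauss(0.0, 1.0)  # corr(a, b) = r
            if a > best_a:
                best_a, arg_a = a, i
            if b > best_b:
                best_b, arg_b = b, i
        hits += arg_a == arg_b
    return hits / trials


# Fixed r, growing n: print the estimate next to the 1/n baseline.
for n in (2, 10, 100):
    print(n, simulate_pmax(n, r=0.5), 1 / n)
```

Two sanity checks are available without any simulation: at r = 0 the two maxima are independent, so the estimate should sit near 1⁄n, and at n = 2 the probability is 1⁄2 + arcsin(r)⁄π.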

Order statistics has long proven that asymptotically, Pmax approaches 1⁄n. Exact answers are hard to find, but confirm the asymptotics; the closest that exists is for an approximation & special-case of the Ali-Mikhail-Haq copula, which roughly indicates that r functions as a constant-factor boost in Pmax, and that the boost from r fades out as n increases.

As long as r ≠ 1, “the tails will come apart”. n increases the difficulty too fast for any fixed r to overcome. This has implications for interpreting extremes and test metrics.