“The Wisdom of the Inner Crowd in Three Large Natural Experiments”, Dennie van Dolder, Martijn J. van den Assem, 2017-12-11:

[ACX] The quality of decisions depends on the accuracy of estimates of relevant quantities. According to the wisdom of crowds principle, accurate estimates can be obtained by combining the judgements of different individuals [1, 2]. This principle has been successfully applied to improve, for example, economic forecasts, medical judgements, and meteorological predictions. Unfortunately, there are many situations in which it is infeasible to collect judgements of others.

Recent research proposes that a similar principle applies to repeated judgements from the same person [14]. This paper tests this promising approach on a large scale in a real-world context. Using proprietary data comprising 1.2 million observations from 3 incentivized guessing competitions, we find that:

within-person aggregation indeed improves accuracy and that the method works better when there is a time delay between subsequent judgements.

However, the benefit pales against that of between-person aggregation: the average of a large number of judgements from the same person is barely better than the average of two judgements from different people.

…We show that within-person aggregation indeed improves accuracy, but not as much as between-person aggregation: the average of a large number of judgements from the same person is barely better than the average of two judgements from different people, even if the advantages of time delay between estimations are being exploited.

Our data are from 3 promotional events organized by the Dutch state-owned casino chain Holland Casino. During the last 7 weeks of 2013, 2014 and 2015, anybody who visited one of the casinos received a voucher with a login code. Via a terminal inside the casino and via the Internet, this code granted access to a competition in which participants were asked to estimate the number of objects in a transparent plastic container located just inside the entrance. This container, shaped to represent a champagne glass, was filled with small objects that represented pearls in 2013, pearls and diamonds in 2014 and casino chips in 2015 (Supplementary Figure 1). Both the container and the exact number of objects were the same at every location. There were 12,564 objects in the container in 2013, 23,363 in 2014, and 22,186 in 2015. A prize of €100,000 was shared equally by those whose estimate was closest to the actual value. In 2013, the prize money was awarded to 16 people, and in 2014 and 2015, the entire amount was won by one person. All winners had submitted exactly the right number.

Our pseudonymized data sets contain all entries for the 3 years: a total of 369,260 estimates from 163,719 different players in 2013, 388,352 estimates from 154,790 players in 2014, and 407,622 estimates from 162,275 players in 2015. Many players submitted multiple estimates (Supplementary Figure 2). Across the combined data sets, 60% of the participants were male and the average age was 39 years.

Figure 1: MSE of the inner crowd and the outer crowd as a function of the number of included estimates. The MSE of the inner crowd is shown in black and the outer crowd in dark grey. The graphs also show the MSE of individual consecutive estimates (light grey). The upper graphs use the estimates of players who submitted at least k = 5 estimates in a given year, and the bottom graphs use the estimates of players who submitted at least k = 10 estimates in a given year. The curve for the inner crowd represents the best-fitting hyperbolic function MSE = (a⁄t) + b (using nonlinear least squares); the dotted line represents b. Values for the outer crowd are mathematically determined using the diversity prediction theorem (see Methods); the dashed line represents the limit as the number of included estimates goes to infinity. Error bars represent 95% confidence intervals. N is the number of players.
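
The two curve constructions named in the caption can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code. The function names and the toy guesses are hypothetical. The outer-crowd formula assumes estimates are drawn independently, with replacement, from the pool of guesses, which may differ in detail from the paper's Methods. The fit uses ordinary least squares on 1/t, which gives the same minimizer as an unweighted nonlinear least-squares fit because the model is linear in a and b.

```python
from statistics import fmean


def diversity_decomposition(estimates, truth):
    """Return (crowd_error, mean_individual_error, diversity), all squared.

    Diversity prediction theorem:
        crowd_error = mean_individual_error - diversity
    """
    crowd = fmean(estimates)
    crowd_error = (crowd - truth) ** 2
    mean_individual_error = fmean((x - truth) ** 2 for x in estimates)
    diversity = fmean((x - crowd) ** 2 for x in estimates)
    return crowd_error, mean_individual_error, diversity


def expected_outer_mse(estimates, truth, n):
    """Expected squared error of the mean of n estimates drawn
    independently (with replacement) from the pool. ASSUMPTION: this
    sampling scheme is chosen for illustration only."""
    crowd_error, _, diversity = diversity_decomposition(estimates, truth)
    return crowd_error + diversity / n


def fit_hyperbola(ts, mses):
    """Least-squares fit of MSE = a / t + b, using x = 1 / t as regressor."""
    xs = [1.0 / t for t in ts]
    x_bar, y_bar = fmean(xs), fmean(mses)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, mses))
    a = sxy / sxx
    b = y_bar - a * x_bar
    return a, b


# Toy guesses (made up) against the 2013 true count reported in the text.
guesses = [8000, 10000, 15000, 20000]
truth = 12564
print(diversity_decomposition(guesses, truth))
print([expected_outer_mse(guesses, truth, n) for n in (1, 2, 10)])
```

In this sketch, `expected_outer_mse` with `n = 1` equals the mean individual error, and as `n` grows it approaches the crowd error, which plays the role of the dashed limit line in the figure. The intercept `b` returned by `fit_hyperbola` corresponds to the dotted line.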

…In conclusion, the present study finds that the effectiveness of within-person aggregation is considerably lower than that of between-person aggregation: the average of a large number of judgements from the same person is barely better than the average of two judgements from different people. The efficacy difference is a consequence of the existence of individual-level systematic errors (idiosyncratic bias). The effect of these errors can be eliminated by combining estimates from multiple people, not by combining multiple estimates from a single person.
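
The role of idiosyncratic bias can be illustrated with a toy simulation. The sketch below assumes each estimate's error is a persistent personal bias plus independent noise, both normally distributed with mean zero, and sets the true value to 0. This model, the function name `simulate_mse`, and the parameter values are illustrative assumptions, not quantities estimated from the casino data.

```python
import random
from statistics import fmean


def simulate_mse(n_trials, t, bias_sd, noise_sd, same_person, rng):
    """Monte Carlo MSE of the mean of t estimates (true value = 0).

    same_person=True : all t estimates share one person's bias (inner crowd).
    same_person=False: each estimate comes from a different person, so each
                       carries its own independent bias (outer crowd).
    """
    squared_errors = []
    for _ in range(n_trials):
        if same_person:
            bias = rng.gauss(0, bias_sd)  # drawn once, shared by all t
            estimates = [bias + rng.gauss(0, noise_sd) for _ in range(t)]
        else:
            estimates = [
                rng.gauss(0, bias_sd) + rng.gauss(0, noise_sd)
                for _ in range(t)
            ]
        squared_errors.append(fmean(estimates) ** 2)
    return fmean(squared_errors)


rng = random.Random(0)  # fixed seed for reproducibility
single = simulate_mse(20000, 1, 1.0, 1.0, True, rng)
inner_50 = simulate_mse(20000, 50, 1.0, 1.0, True, rng)
outer_2 = simulate_mse(20000, 2, 1.0, 1.0, False, rng)
print(single, inner_50, outer_2)
```

Under these assumptions the inner-crowd MSE tends to `bias_sd**2 + noise_sd**2 / t`, which cannot fall below `bias_sd**2`, whereas the outer-crowd MSE is `(bias_sd**2 + noise_sd**2) / t` and keeps shrinking. With equal bias and noise variance, as chosen here, averaging 50 estimates from one person does about as well as averaging 2 estimates from different people.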

[Interesting question: given language models’ ability to simulate many different people, such as different age groups (Park et al 2022, Aher et al 2022), can they ‘inner crowd’ with far greater accuracy than humans can?]