“Crowd Prediction Systems: Markets, Polls, and Elite Forecasters”, 2024-01-22:
What systems should we use to elicit and aggregate judgmental forecasts? Who should be asked to make such forecasts? We address these questions by assessing two widely used crowd prediction systems: prediction markets and prediction polls. Our main test compares a prediction market against team-based prediction polls, using data from a large, multi-year forecasting competition. Each system draws inputs from either a large, sub-elite crowd or a small, elite crowd. We find that small, elite crowds outperform larger ones, whereas the two systems are statistically tied.
In addition to this main research question, we examine two complementary questions. First, we compare two market structures—continuous double auction (CDA) markets and logarithmic market scoring rule (LMSR) markets—and find that the LMSR market produces more accurate forecasts than the CDA market, especially on low-activity questions. Second, given the importance of elite forecasters, we compare the talent-spotting properties of the two systems and find that markets and polls are equally effective at identifying elite forecasters.
Overall, the performance benefits of “superforecasting” hold across systems. Managers should move towards identifying and deploying small, select crowds to maximize forecasting performance.
…We are the first to compare the aggregate performance of small, elite forecaster crowds across two prediction systems: logarithmic market scoring rule (LMSR) prediction markets and team prediction polls. Moreover, we compare the aggregate accuracy of elite forecaster crowds to that of larger, sub-elite crowds using the same prediction systems. The comparison of elite crowds is notable because such a study relies on the resource-intensive process of identifying elite forecasters: it involves engaging thousands of forecasters who report on hundreds of questions over multiple years. Studies involving fewer forecasters may need to set lower thresholds for elite status, while studies with fewer questions per season may identify top performers less reliably. This raises the question of whether forming elite forecaster crowds is worth the effort. Our results show that the benefits of employing elite crowds are large and robust across prediction polls and prediction markets, despite the 3× size advantage of sub-elite crowds. Moreover, the advantages of elite over sub-elite crowds are substantially larger than the differences between prediction markets and prediction polls.
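To make the comparison concrete, here is a minimal sketch in Python with invented probabilities. It assumes median aggregation and Brier scoring (the standard accuracy measure in forecasting tournaments); the paper's actual aggregation pipeline may differ. The hypothetical sub-elite crowd is three times the size of the elite crowd, mirroring the size advantage above:

```python
import statistics

def brier_score(p: float, outcome: int) -> float:
    """Squared error of a probability forecast against the realized
    outcome (0 or 1). Lower is better; a constant 0.5 forecast scores 0.25."""
    return (p - outcome) ** 2

def crowd_brier(forecasts_by_question: dict, outcomes: dict) -> float:
    """Aggregate each question's forecasts with the median, then
    average the Brier scores of the aggregates across questions."""
    scores = [
        brier_score(statistics.median(forecasts), outcomes[q])
        for q, forecasts in forecasts_by_question.items()
    ]
    return sum(scores) / len(scores)

# Hypothetical forecasts for two resolved binary questions.
outcomes = {"q1": 1, "q2": 0}
elite = {"q1": [0.80, 0.70, 0.75], "q2": [0.20, 0.15, 0.30]}
sub_elite = {
    "q1": [0.60, 0.50, 0.90, 0.40, 0.70, 0.55, 0.65, 0.50, 0.60],
    "q2": [0.40, 0.50, 0.30, 0.45, 0.60, 0.35, 0.50, 0.40, 0.55],
}

print(f"elite crowd Brier:     {crowd_brier(elite, outcomes):.3f}")
print(f"sub-elite crowd Brier: {crowd_brier(sub_elite, outcomes):.3f}")
```

The point is the shape of the comparison, not the magnitudes: a smaller crowd of well-calibrated forecasters can produce a lower (better) aggregate Brier score than a crowd three times its size.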
Beyond this primary study, we report on two additional studies, each of which complements its findings along a different dimension. In the first, we provide an experimental evaluation of two popular prediction market architectures: CDA and LMSR markets. To the best of our knowledge, we are the first to study these methods in a large, randomized experiment. Prior research reporting on CDA and LMSR market performance did not compare the two designs directly but used separate sets of questions for each (2015). Using data from over 1,300 forecasters and a total of 147 questions, we find that the LMSR market achieves higher accuracy than the CDA market. The LMSR market's outperformance appears particularly pronounced for questions that attracted few traders and in the period soon after a question was posted, when only a few traders had placed orders. Both conditions correspond to thin markets, so our analyses are in line with Hanson's (2003, 2007) main motivation for the design of the LMSR market architecture.
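To see why the LMSR is robust to thin markets: rather than matching buyers with sellers as a CDA does, an LMSR market maker quotes prices from a cost function and always stands ready to trade, so prices stay well-defined even with a single active trader. Below is a minimal sketch of Hanson's standard mechanism for a binary question; the liquidity parameter `b` and the trade sizes are illustrative, not the competition's settings:

```python
import math

class LMSRMarket:
    """Hanson's logarithmic market scoring rule for a binary question.

    The market maker tracks outstanding share quantities q and prices
    trades via the cost function C(q) = b * log(sum_i exp(q_i / b))."""

    def __init__(self, b: float = 100.0):
        self.b = b           # liquidity parameter: larger b = less price impact
        self.q = [0.0, 0.0]  # outstanding shares for YES / NO

    def _cost(self, q: list) -> float:
        return self.b * math.log(sum(math.exp(qi / self.b) for qi in q))

    def price(self, outcome: int) -> float:
        """Instantaneous price of an outcome, interpretable as the market
        probability: exp(q_i / b) / sum_j exp(q_j / b)."""
        z = sum(math.exp(qi / self.b) for qi in self.q)
        return math.exp(self.q[outcome] / self.b) / z

    def buy(self, outcome: int, shares: float) -> float:
        """Buy shares of an outcome; the trader pays C(q') - C(q)."""
        new_q = list(self.q)
        new_q[outcome] += shares
        cost = self._cost(new_q) - self._cost(self.q)
        self.q = new_q
        return cost

market = LMSRMarket(b=100.0)
print(f"initial YES price: {market.price(0):.3f}")  # 0.500
cost = market.buy(0, 50.0)  # one trader is enough to move the price
print(f"cost of 50 YES shares: {cost:.2f}")
print(f"YES price after trade: {market.price(0):.3f}")  # ~0.622
```

The parameter b trades off price responsiveness against the maker's subsidy: the worst-case loss in a binary LMSR market is bounded by b·ln 2, which is the price of guaranteeing that a counterparty is always available.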