2012 election predictions
Compiling academic and media forecasters’ 2012 American Presidential election predictions and statistically judging their correctness; Nate Silver was not the best.
I statistically analyzed in R hundreds of predictions compiled from ~10 forecasters of the 2012 American Presidential election, ranking them by Brier, RMSE, & log scores.
The best overall performance seems to be by Drew Linzer and Wang & Holbrook, while Nate Silver appears somewhat overrated and the famous Intrade prediction market turned in a disappointing showing.
The essay itself lives at “Was Nate Silver the Most Accurate 2012 Election Pundit?”.
Background
This election-prediction judgment is divided into several sections, each dealing with a different category of predictions:
- the overall Presidential race predictions: probability of Obama victory, final electoral vote count, and percentage of popular vote
- the Presidential state-by-state predictions: the percentage Obama will take (vote share/margin/edge), as well as the probability he will win that state at all
- the Senate state-by-state predictions: similar, but normalized for the Democratic candidate
Few forecasters made predictions in all categories, the ones who did make predictions did not always make them fully public, etc. Note that all percentages are normalized as the share going to Obama, the Democrats, or in some cases, Independents.
The point of these calculations is to extract Brier scores (for categorical predictions like the probability of an Obama victory) and RMSE sums (for continuous predictions like electoral-vote counts or vote-share percentages).
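Both scoring rules are simple to compute; here is a minimal Python re-implementation for illustration (the analysis itself is done in R below, and these helper names are mine, not from the original code):

```python
def brier(outcomes, predictions):
    """Brier score: mean squared error between binary outcomes (0/1)
    and forecast probabilities; 0 is perfect, 0.25 is a coin-flip."""
    return sum((o - p) ** 2 for o, p in zip(outcomes, predictions)) / len(outcomes)

def rmse(observed, predicted):
    """Root-mean-square error for continuous predictions (eg. electoral votes)."""
    n = len(observed)
    return (sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n) ** 0.5

# A forecaster giving Obama 90% in a state he wins scores (1-0.9)^2 = 0.01;
# an uninformative 50% guess scores 0.25 whatever happens.
print(round(brier([1, 0], [0.9, 0.1]), 6))   # 0.01
# Predicting 313 electoral votes against the actual 332 gives an RMSE of 19:
print(rmse([332], [313]))                    # 19.0
```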
Presidential
Brier scores for the predicted probability of an Obama victory (0 = perfect):
Reality: 0
2008: 0
Wang: 0
Linzer: 0.0001
Jackman: 0.007396
Silver: 0.008281
DeSart: 0.01295044
Margin: 0.1024
Intrade: 0.116964
Random: 0.25 (50% guess is always 0.25)
# To score electoral-vote and popular-vote predictions, we use RMSE
rmse <- function(obs, pred) sqrt(mean((obs-pred)^2,na.rm=TRUE))
rpe <- function(p) rmse(presidential["Reality",]$electoral, presidential[p,]$electoral)
lapply(rownames(presidential), rpe)
Reality: 0
Linzer: 0
Jackman: 0
Putnam: 0
Silver: 19
DeSart: 29
Margin: 29
Wang: 29
2008: 33
Intrade: 41
Unskewed: 69
rpp <- function(p) rmse(presidential["Reality",]$popular, presidential[p,]$popular)
lapply(rownames(presidential)[c(1:9,11)], rpp)
Reality: 0
Jackman: 0.01
Silver: 0.01
Intrade: 0.04
Wang: 0.31
DeSart: 0.58
Margin: 0.71
Unskewed: 1.91
2008: 2.21
State
State Win Probabilities
Reality: 0
Drew Linzer: 0.00384326
Wang/Ferguson: 0.007615686
Nate Silver: 0.00911372
Simon Jackman: 0.00971369
DeSart/Holbrook: 0.01605542
Intrade: 0.02811906
2008: 0.03921569
Margin of Error: 0.05075311
random (50%) guesser: 0.25000000
Senate
Senate Win Probabilities
senatewin <- read.csv("https://gwern.net/doc/statistics/prediction/election/2012-senatewin.csv", row.names=1)
az ca ct de fl hi indiana me md ma mi
Reality 0.000 1.000 1.000 1.00 1.000 1.00 1.00 1.000 1.00 1.000 1.00
Nate Silver 0.040 1.000 0.960 1.00 1.000 1.00 0.70 0.930 1.00 0.940 1.00
Intrade 0.225 0.998 0.888 0.99 0.859 0.96 0.85 0.957 0.96 0.786 0.95
Wang & Ferguson 0.120 0.950 0.998 0.95 0.950 0.95 0.84 0.950 0.95 0.960 0.96
mn ms mo mt ne nv nj nm ny nd oh pa
Reality 1.00 0.00 1.000 1.000 0.00 0.00 1.00 1.00 1.00 1.000 1.00 1.00
Nate Silver 1.00 0.00 0.980 0.340 0.01 0.17 1.00 0.97 1.00 0.080 0.97 0.99
Intrade 0.95 0.00 0.703 0.371 0.06 0.06 0.96 0.95 1.00 0.155 0.84 0.86
Wang & Ferguson 0.95 0.05 0.960 0.690 0.05 0.27 0.95 0.95 0.95 0.750 0.95 0.95
ri tn tx ut vt va wa wv wi wy
Reality 1.00 0.00 0.000 0.00 0.00 1.00 1.00 1.000 1.000 0.00
Nate Silver 1.00 0.00 0.000 0.00 0.00 0.88 1.00 0.920 0.790 0.00
Intrade 0.99 0.00 0.025 0.00 0.05 0.78 0.96 0.951 0.626 0.00
Wang & Ferguson 0.95 0.05 0.050 0.05 0.05 0.96 0.95 0.950 0.720 0.05
The Senate win predictions (done only by Wang, Silver, & Intrade in this dataset):
brsw <- function (pundit) br(senatewin["Reality",], senatewin[pundit,])
lapply(rownames(senatewin), brsw)
Wang: 0.01246376
Silver: 0.04484545
Intrade: 0.04882958
To combine the state win predictions with the presidency win prediction and also the Senate race win predictions requires data on all 3, so still Wang vs Silver vs Intrade:
combineBinaryForecasts <- function(p) c(statewin[p,],
senatewin[p,],
presidential[p,]$probability)
brpssw <- function(pundit) br(combineBinaryForecasts("Reality"), combineBinaryForecasts(pundit))
lapply(rownames(senatewin), brpssw)
Wang: 0.009408282
Silver: 0.02297625
Intrade: 0.03720485
Log Scores of Win Predictions
logScore <- function(obs, pred) sum(ifelse(obs, log(pred), log(1-pred)), na.rm=TRUE)
Example of the difference between Brier and log score:
# Oops!
brier(0,1,bins=FALSE)$bs
1
# But we can recover by getting the second right
brier(c(0,1),c(1,1),bins=FALSE)$bs
0.5
# Oops!
logScore(1, 0)
-Inf
# Can we recover? ...we're screwed
logScore(c(1,1), c(0,1))
-Inf
Presidency win prediction:
lsp <- function(p) logScore(1, presidential[p,]$probability)
lapply(rownames(presidential), lsp)
Reality: 0
2008: 0
Wang & Ferguson: 0
Linzer: -0.01005034
Jackman: -0.08992471
Silver: -0.09541018
DeSart: -0.1208126
Margin of Error: -0.3856625
Intrade: -0.4185503
Applied to state win predictions:
ls <- function(p) logScore(statewin["Reality",], statewin[p,])
lapply(rownames(statewin), ls)
Reality: 0
Linzer: -0.9327548
Wang & Ferguson: -1.750359
Silver: -2.057887
Jackman: -2.254638
DeSart: -3.30201
Intrade: -5.719922
Margin of Error: -10.20808
2008: -Inf
Now Senate win predictions:
lss <- function(p) logScore(senatewin["Reality",], senatewin[p,])
lapply(rownames(senatewin), lss)
Reality: 0
Wang & Ferguson: -2.89789
Silver: -4.911792
Intrade: -5.813129
And all of them together:
combineBinaryForecasts <- function(p) c(statewin[p,], senatewin[p,], presidential[p,]$probability)
lssp <- function(pundit) logScore(combineBinaryForecasts("Reality"), combineBinaryForecasts(pundit))
lapply(c("Wang & Ferguson", "Nate Silver", "Intrade"), lssp)
Reality: 0
Wang & Ferguson: -4.648249
Silver: -7.06509
Intrade: -11.9516
Summary Tables
RMSEs
Predictor | Presidential electoral | Presidential popular | State margins | S+Pp+Sm² | Senate margins
---|---|---|---|---|---
Silver | 19 | 0.01 | 1.81659 | 20.82659 | 3.272197
Linzer | 0 | | 2.5285 | |
Wang | 29 | 0.31 | 2.79083 | 32.10083 |
Jackman | 0 | 0.01 | 2.25422 | 2.26422 |
DeSart | 29 | 0.58 | 2.414322 | 31.99432 |
Intrade | 41 | 0.04 | | |
2008 | 33 | 2.21 | 3.206457 | 38.41646 |
Margin | 29 | 0.71 | 2.426244 | 32.13624 |
Putnam | 0 | | 2.033683 | |
Unskewed | 69 | 1.91 | 7.245104 | 78.1551 |
Brier Scores
(0 is a perfect Brier score or RMSE.)
Predictor | Presidential win | State win | Senate win | St+Sn+P
---|---|---|---|---
Silver | 0.008281 | 0.00911372 | 0.04484545 | 0.02297625
Linzer | 0.0001 | 0.00384326 | |
Wang | 0 | 0.00761569 | 0.01246376 | 0.009408282
Jackman | 0.007396 | 0.00971369 | |
DeSart | 0.012950 | 0.01605542 | |
Intrade | 0.116964 | 0.02811906 | 0.04882958 | 0.03720485
2008 | 0 | 0.03921569 | |
Margin | 0.1024 | 0.05075311 | |
Random | 0.2500 | 0.25000000 | 0.25000000 | 0.25000000
Log Scores
We mentioned there were other proper scoring rules besides the Brier score; another binary-outcome rule, less used by political forecasters, is the “logarithmic scoring rule” (see Wikipedia or Eliezer Yudkowsky’s “Technical Explanation”); it has some deep connections to areas like information theory, data compression, and Bayesian inference, which makes it invaluable in some contexts. But because a log score ranges between 0 and negative infinity (bigger, ie. closer to 0, is better), it is harder to interpret at a glance than a Brier score.
(One way in which the log score differs from the Brier score is its treatment of 100%-confident predictions: as the example earlier showed, a single certainty that turns out wrong yields a score of negative infinity, with no possibility of recovery.)
Forecaster | State win probabilities
---|---
Reality | 0
Linzer | -0.9327548
Wang & Ferguson | -1.750359
Silver | -2.057887
Jackman | -2.254638
DeSart | -3.30201
Intrade | -5.719922
Margin of Error | -10.20808
2008 | -Infinity
Forecaster | Presidential win probability
---|---
Reality | 0
2008 | 0
Wang & Ferguson | 0
Linzer | -0.01005034
Jackman | -0.08992471
Silver | -0.09541018
DeSart | -0.1208126
Margin of Error | -0.3856625
Intrade | -0.4185503
Note that the 2008 benchmark and Wang & Ferguson took a risk here by giving an outright 100% chance of victory, which the log score rewarded with a 0: if somehow Obama had lost, then the log score of any set of their predictions which included the presidential win probability would automatically be -Infinity, rendering them officially The Worst Predictors In The World. This is why one should allow for the unthinkable by reserving some fraction of a percent; of course, I’m sure Wang & Ferguson don’t mean 100% literally, but rather “it’s so close to 100% that we can’t be bothered to report the tiny remaining possibility”.
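A minimal Python sketch of that fix (the `clamp` helper and the 0.1% floor are my illustration, not anything the forecasters did): clamping reported probabilities away from exactly 0 and 1 costs almost nothing when the confident forecast is right, but keeps the log score finite when it is wrong.

```python
import math

def log_score(outcomes, predictions):
    return sum(math.log(p if o else 1 - p) for o, p in zip(outcomes, predictions))

def clamp(p, eps=0.001):
    # reserve at least a 0.1% sliver of probability for the unthinkable
    return min(max(p, eps), 1 - eps)

# A literal 100% forecast: score 0 if right, -Infinity if wrong.
# Clamped to 99.9%, being right costs only ~0.001...
print(round(log_score([1], [clamp(1.0)]), 4))   # -0.001
# ...and being wrong is very bad (log(0.001)) but finite:
print(round(log_score([0], [clamp(1.0)]), 1))   # -6.9
```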
Forecaster | Senate win probabilities
---|---
Reality | 0
Wang | -2.89789
Silver | -4.911792
Intrade | -5.813129
External Links
“Some Thoughts on Election Forecasting” (Andrew Gelman, 2010)
“Forecasting the 2012 and 2014 Elections Using Bayesian Prediction and Optimization”, et al 2015
“The Polls - Review; Predicting Elections: Considering Tools to Pool the Polls”, 2015
“Disentangling Bias and Variance in Election Polls”, Shirani-Mehr et al 2018
1. I have been told that once Intrade prices have been corrected for this, the new results are comparable to Silver & Wang. This doesn’t necessarily surprise me, but during the original analysis I did not look into doing the long-shot bias correction because hardly anyone does so in discussions of prediction markets, it would’ve been more work, and I’m not sure it’s really legitimate: if Intrade is biased, then it’s biased. If someone produces extreme estimates which can be easily improved by regressing to some relevant mean, it doesn’t seem quite honest to present your corrected version instead as what they “really” meant.
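For concreteness, a long-shot bias correction would mean systematically extremizing the market's prices (pushing them away from 50%, since long-shots tend to be overpriced and favorites underpriced); a hypothetical power-law transform on the odds scale, not anything done in this analysis:

```python
def extremize(p, k=2.0):
    # push a probability away from 0.5 on the odds scale; k > 1 sharpens,
    # k = 1 leaves it unchanged (the exponent here is purely illustrative)
    return p ** k / (p ** k + (1 - p) ** k)

# Intrade's 85.9% on Obama winning Florida becomes a more confident ~97%:
print(round(extremize(0.859), 3))   # 0.974
```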
2. Summing RMSEs from different metrics is statistically illegitimate & misleading, since the sum will reflect almost entirely the electoral-vote performance, which is on a scale much larger than the other metrics. I include it for curiosity only.
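The dominance is easy to verify with Silver's row from the RMSE table above: the electoral-vote error contributes over 90% of the naive sum, so the combined column mostly just restates the electoral-vote column.

```python
# Silver's RMSEs: electoral votes, popular-vote %, state-margin %
errors = {"electoral": 19.0, "popular": 0.01, "state_margins": 1.81659}
naive_sum = sum(errors.values())
print(round(naive_sum, 5))                        # 20.82659
# share of the naive sum contributed by the electoral-vote term alone:
print(round(errors["electoral"] / naive_sum, 2))  # 0.91
```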