Compiling academic and media forecasters’ 2012 American Presidential election predictions and statistically judging correctness; Nate Silver was not the best.
I statistically analyzed in R hundreds of predictions compiled for ~10 forecasters of the 2012 American Presidential election, and ranked them by Brier, RMSE, & log scores.
The best overall performance seems to be by Drew Linzer and Wang & Ferguson, while Nate Silver appears somewhat overrated and the famous Intrade prediction market turned in a disappointing overall performance.
In November 2012, I was hired by CFAR to compile an extensive dataset of pundits, modelers, hobbyists, and academics who had attempted to statistically forecast the 2012 American presidential race and other minor races; the results were interesting in that they contradicted the lionization of Nate Silver’s forecasts in The New York Times. This page is a full listing of the R source code I used to produce my analysis for the CFAR essay; notes on the derivation of each dataset are stored at 2012-gwern-notes.txt.
The essay itself lives at “Was Nate Silver the Most Accurate 2012 Election Pundit?”.
Background
This election prediction judgment is divided into several sections dealing with different categories of predictions:
- the overall Presidential race predictions: probability of Obama victory, final electoral vote count, and percentage of popular vote
- the Presidential state-by-state predictions: the percentage Obama will take (vote share/margin/edge), as well as the probability he will win that state at all
- the Senate state-by-state predictions: similar, but normalized for the Democratic candidate
Few forecasters made predictions in all categories, and those who did did not always make their full predictions public. Note that all percentages are normalized in terms of the share going to Obama, Democrats, or in some cases, Independents/Greens. The “Reality” ‘forecaster’ is the ground truth; these figures were all updated 23 November in what is hopefully a final update.
The point of these calculations is to extract Brier scores (for categorical predictions like the probability of an Obama victory) and RMSE sums (for continuous/quantitative predictions like vote share). Intrade prices were interpreted as straightforward probabilities without any correction for Intrade’s long-shot bias.[1]
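As a minimal sketch of what a Brier score measures, assuming only base R: it is the mean squared difference between the forecast probabilities and the 0/1 outcomes. The helper name `brier_score` and the toy numbers are hypothetical and used only for illustration; the analysis itself relies on the `verification` package.

```r
# Hypothetical base-R helper (illustrative; the analysis itself calls verification::brier)
brier_score <- function(obs, pred) mean((pred - obs)^2, na.rm = TRUE)

# A single forecast: 90.9% for an event that happened (outcome coded as 1)
single <- brier_score(1, 0.909)

# Several forecasts at once: outcomes are 0/1, predictions are probabilities
several <- brier_score(obs = c(1, 0, 1), pred = c(0.9, 0.2, 0.6))

# Always answering 50% scores 0.25 whatever the outcomes are
coin_flip <- brier_score(obs = c(1, 0, 1, 1), pred = rep(0.5, 4))

print(c(single = single, several = several, coin_flip = coin_flip))
```

Lower is better, and the 50% baseline gives a fixed reference point against which any forecaster can be compared.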
Presidential
presidential <- read.csv("https://gwern.net/doc/statistics/prediction/election/2012-presidential.csv", row.names=1)
# Reality=2012 result; 2008=2008 results
presidential
probability electoral popular
Reality 1.0000 332 50.79
2008 1.0000 365 53.00
Nate Silver 0.9090 313 50.80
Drew Linzer 0.9900 332 NA
Simon Jackman 0.9140 332 50.80
DeSart 0.8862 303 51.37
Margin of Error 0.6800 303 51.50
Wang & Ferguson 1.0000 303 51.10
Intrade 0.6580 291 50.75
Josh Putnam NA 332 NA
Unskewed Polls NA 263 48.88
# probability can be scored as a Brier score; available in 'verification' library
install.packages("verification")
library(verification)
# handle lists & vectors for later
br <- function(obs, pred) brier(unlist(obs),
unlist(pred),
bins=FALSE)$bs # bins=FALSE avoids rounding
# convenience function
brp <- function(p) brier(presidential["Reality",]$probability,
presidential[p,]$probability,
bins=FALSE)$bs
lapply(rownames(presidential)[1:9], brp)
- Reality: 0
- 2008: 0
- Wang: 0
- Linzer: 0.0001
- Jackman: 0.007396
- Silver: 0.008281
- DeSart: 0.01295044
- Margin: 0.1024
- Intrade: 0.116964
- Random: 0.25 (50% guess is always 0.25)
# To score electorals and populars, we use RMSE
rmse <- function(obs, pred) sqrt(mean((obs-pred)^2,na.rm=TRUE))
rpe <- function(p) rmse(presidential["Reality",]$electoral, presidential[p,]$electoral)
lapply(rownames(presidential), rpe)
- Reality: 0
- Linzer: 0
- Jackman: 0
- Putnam: 0
- Silver: 19
- DeSart: 29
- Margin: 29
- Wang: 29
- 2008: 33
- Intrade: 41
- Unskewed: 69
rpp <- function(p) rmse(presidential["Reality",]$popular, presidential[p,]$popular)
lapply(rownames(presidential)[c(1:9,11)], rpp)
- Reality: 0
- Jackman: 0.01
- Silver: 0.01
- Intrade: 0.04
- Wang: 0.31
- DeSart: 0.58
- Margin: 0.71
- Unskewed: 1.91
- 2008: 2.21
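Because each forecaster supplies only one electoral-vote number and one popular-vote number, the RMSE here collapses to the plain absolute error. A small sketch, using a few electoral-vote values copied from the table above (the subset chosen is arbitrary):

```r
rmse <- function(obs, pred) sqrt(mean((obs - pred)^2, na.rm = TRUE))

# Electoral-vote forecasts copied from the table above; 332 was the actual result
electoral <- c("Nate Silver" = 313, "Drew Linzer" = 332,
               "Intrade" = 291, "Unskewed Polls" = 263)
reality <- 332

# RMSE computed one forecaster at a time, as in the analysis
rmse_each <- sapply(electoral, function(pred) rmse(reality, pred))

# With a single observation, sqrt(mean(x^2)) is just abs(x)
abs_error <- abs(electoral - reality)

print(rbind(rmse_each, abs_error))
```

The RMSE only starts to differ from the mean absolute error when a forecaster is scored on many values at once, as with the state-by-state margins.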
State
State Win Probabilities
# Reality=final 2012 result: 0 for Romney states, 1 for Obama states
# 2008=2008 state results (same as Reality, except Indiana & North Carolina, which Obama won in 2008 but lost in 2012)
statewin <- read.csv("https://gwern.net/doc/statistics/prediction/election/2012-statewin.csv", row.names=1)
statewin
al ak az ar ca co ct de
Reality 0.0000 0.000000 0.0000 0.0000 1.0000 1.00000 1.0000 1.00000
2008 0.0000 0.000000 0.0000 0.0000 1.0000 1.00000 1.0000 1.00000
Nate Silver 0.0000 0.000000 0.0200 0.0000 1.0000 0.80000 1.0000 1.00000
Drew Linzer 0.0000 0.000086 0.0000 0.0000 1.0000 0.98333 1.0000 0.98333
Margin of Error 0.0269 0.099800 0.4388 0.0451 0.9443 0.64710 0.9125 0.93770
Intrade 0.0000 0.000000 0.0600 0.0000 0.9500 0.55600 0.9900 0.96000
DeSart 0.0000 0.090000 0.0390 0.0000 1.0000 0.52300 0.9990 1.00000
Simon Jackman 0.0052 0.000000 0.0050 0.0000 1.0000 0.76520 1.0000 1.00000
Wang & Ferguson 0.0000 0.000000 0.0000 0.0000 1.0000 0.84000 1.0000 1.00000
Josh Putnam NA NA NA NA NA NA NA NA
Unskewed Polls NA NA NA NA NA NA NA NA
dc fl ga hi id il indiana ia ks
Reality 1.000 1.0000 0.0000 1.0000 0.0000 1.0000 0.0000 1.0000 0.00000
2008 1.000 1.0000 0.0000 1.0000 0.0000 1.0000 1.0000 1.0000 0.00000
Nate Silver 1.000 0.5000 0.0000 1.0000 0.0000 1.0000 0.0000 0.8400 0.00000
Drew Linzer NA 0.6040 0.0000 1.0000 0.0000 1.0000 0.0000 0.9966 0.03866
Margin of Error 1.000 0.4575 0.1972 0.9987 0.0086 0.9569 0.3273 0.6467 0.09710
Intrade 0.975 0.3300 0.0300 0.9750 0.0000 0.9890 0.0200 0.6630 0.00000
DeSart 1.000 0.4910 0.0300 1.0000 0.0000 1.0000 0.0020 0.7700 0.00000
Simon Jackman 1.000 0.5216 0.0014 1.0000 0.0000 1.0000 0.0000 0.8376 0.00000
Wang & Ferguson 1.000 0.5000 0.0000 1.0000 0.0000 1.0000 0.0000 0.8400 0.00000
Josh Putnam NA NA NA NA NA NA NA NA NA
Unskewed Polls NA NA NA NA NA NA NA NA NA
ky la me md ma mi mn ms
Reality 0.000000 0.0000 1.0000 1.00000 1.0000 1.0000 1.0000 0.00000000
2008 0.000000 0.0000 1.0000 1.00000 1.0000 1.0000 1.0000 0.00000000
Nate Silver 0.000000 0.0000 1.0000 1.00000 1.0000 0.9900 1.0000 0.00000000
Drew Linzer 0.000000 0.0000 1.0000 1.00000 1.0000 1.0000 1.0000 0.07866667
Margin of Error 0.019100 0.0442 0.8403 0.96837 0.8988 0.6837 0.7149 0.13470000
Intrade 0.000000 0.0000 0.9300 0.94000 0.9950 0.8840 0.8490 0.00000000
DeSart 0.000000 0.0000 0.9930 1.00000 1.0000 0.9350 0.9610 0.00000000
Simon Jackman 0.000004 0.0000 1.0000 1.00000 1.0000 0.9998 0.9992 0.00000000
Wang & Ferguson 0.000000 0.0200 1.0000 1.00000 1.0000 1.0000 1.0000 0.00000000
Josh Putnam NA NA NA NA NA NA NA NA
Unskewed Polls NA NA NA NA NA NA NA NA
mo mt ne nv nh nj nm ny
Reality 0.000000 0.0000 0.0000 1.0000000 1.0000 1.0000 1.0000 1.0000
2008 0.000000 0.0000 0.0000 1.0000000 1.0000 1.0000 1.0000 1.0000
Nate Silver 0.000000 0.0200 0.0000 0.9300000 0.8500 1.0000 0.9900 1.0000
Drew Linzer 0.000000 0.0000 0.0000 0.9993333 0.9980 1.0000 1.0000 1.0000
Margin of Error 0.447300 0.2436 0.0562 0.7710000 0.6886 0.8647 0.8579 0.9697
Intrade 0.050000 0.0500 0.0000 0.8370000 0.6490 0.9790 0.9390 0.9500
DeSart 0.052000 0.0080 0.0000 0.7680000 0.7560 0.9980 0.9740 1.0000
Simon Jackman 0.000004 0.0032 0.0000 0.9120000 0.8324 0.9998 0.9968 1.0000
Wang & Ferguson 0.000000 0.0000 0.0000 0.9900000 0.8400 1.0000 1.0000 1.0000
Josh Putnam NA NA NA NA NA NA NA NA
Unskewed Polls NA NA NA NA NA NA NA NA
nc nd oh ok or pa ri
Reality 0.00000000 0.0000 1.0000000 0.0000 1.0000000 1.0000 1.0000
2008 1.00000000 0.0000 1.0000000 0.0000 1.0000000 1.0000 1.0000
Nate Silver 0.26000000 0.0000 0.9100000 0.0000 1.0000000 0.9900 1.0000
Drew Linzer 0.08533333 0.0000 0.9986667 0.0000 0.9986667 1.0000 1.0000
Margin of Error 0.50030000 0.1284 0.6038000 0.0029 0.7886000 0.7562 0.9684
Intrade 0.23000000 0.0030 0.6550000 0.0010 0.9590000 0.8200 0.9500
DeSart 0.06600000 0.0000 0.7040000 0.0000 0.9430000 0.8810 1.0000
Simon Jackman 0.28120000 0.0000 0.9298000 0.0000 0.9726000 0.9910 1.0000
Wang & Ferguson 0.16000000 0.0000 0.9300000 0.0000 1.0000000 0.9300 1.0000
Josh Putnam NA NA NA NA NA NA NA
Unskewed Polls NA NA NA NA NA NA NA
sc sd tn tx ut vt va wa
Reality 0.0000000 0.0000 0.000000 0.0000 0.0000 1.0000 1.0000 1.0000
2008 0.0000000 0.0000 0.000000 0.0000 0.0000 1.0000 1.0000 1.0000
Nate Silver 0.0000000 0.0000 0.000000 0.0000 0.0000 1.0000 0.7900 1.0000
Drew Linzer 0.1386667 0.0000 0.000000 0.0000 0.0000 1.0000 0.9760 1.0000
Margin of Error 0.1345000 0.1665 0.053100 0.0545 0.0035 0.9846 0.5046 0.8473
Intrade 0.0400000 0.0500 0.020000 0.0200 0.0450 0.9800 0.5800 0.9750
DeSart 0.0030000 0.0010 0.000000 0.0000 0.0000 1.0000 1.0000 0.9980
Simon Jackman 0.1290000 0.0068 0.000004 0.0000 0.0000 1.0000 0.7840 1.0000
Wang & Ferguson 0.0000000 0.0000 0.000000 0.0000 0.0000 1.0000 0.8400 1.0000
Josh Putnam NA NA NA NA NA NA NA NA
Unskewed Polls NA NA NA NA NA NA NA NA
wv wi wy
Reality 0.000000000 1.0000 0.000000000
2008 0.000000000 1.0000 0.000000000
Nate Silver 0.000000000 0.9700 0.000000000
Drew Linzer 0.001333333 1.0000 0.000666667
Margin of Error 0.042700000 0.6448 0.006900000
Intrade 0.020000000 0.7460 0.000000000
DeSart 0.000000000 0.8560 0.000000000
Simon Jackman 0.005400000 0.9698 0.000000000
Wang & Ferguson 0.000000000 0.9900 0.000000000
Josh Putnam NA NA NA
Unskewed Polls NA NA NA
brstate <- function(p) br(statewin["Reality",], statewin[p,])
lapply(rownames(statewin)[1:9], brstate)
- Reality: 0
- Drew Linzer: 0.00384326
- Wang/Ferguson: 0.007615686
- Nate Silver: 0.00911372
- Simon Jackman: 0.00971369
- DeSart/Holbrook: 0.01605542
- Intrade: 0.02811906
- 2008: 0.03921569
- Margin of Error: 0.05075311
- Random (50%) guesser: 0.25000000
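The “2008” benchmark’s score can be checked by hand, which is a useful sanity test of the whole pipeline. A sketch assuming only what the table above shows: 51 contests, every benchmark forecast exactly 0 or 1, and two states (Indiana and North Carolina) that flipped.

```r
# 51 contests (50 states plus DC); the "2008" benchmark forecasts each as 0 or 1
n_contests <- 51
n_misses   <- 2   # Indiana and North Carolina flipped between 2008 and 2012

# A fully confident miss contributes (1 - 0)^2 = 1; a fully confident hit contributes 0
squared_errors <- c(rep(1, n_misses), rep(0, n_contests - n_misses))

brier_2008 <- mean(squared_errors)
print(round(brier_2008, 8))
```

So for an all-or-nothing forecaster the Brier score is simply the fraction of contests called wrong.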
Senate
Senate Win Probabilities
senatewin <- read.csv("https://gwern.net/doc/statistics/prediction/election/2012-senatewin.csv", row.names=1)
senatewin
az ca ct de fl hi indiana me md ma mi
Reality 0.000 1.000 1.000 1.00 1.000 1.00 1.00 1.000 1.00 1.000 1.00
Nate Silver 0.040 1.000 0.960 1.00 1.000 1.00 0.70 0.930 1.00 0.940 1.00
Intrade 0.225 0.998 0.888 0.99 0.859 0.96 0.85 0.957 0.96 0.786 0.95
Wang & Ferguson 0.120 0.950 0.998 0.95 0.950 0.95 0.84 0.950 0.95 0.960 0.96
mn ms mo mt ne nv nj nm ny nd oh pa
Reality 1.00 0.00 1.000 1.000 0.00 0.00 1.00 1.00 1.00 1.000 1.00 1.00
Nate Silver 1.00 0.00 0.980 0.340 0.01 0.17 1.00 0.97 1.00 0.080 0.97 0.99
Intrade 0.95 0.00 0.703 0.371 0.06 0.06 0.96 0.95 1.00 0.155 0.84 0.86
Wang & Ferguson 0.95 0.05 0.960 0.690 0.05 0.27 0.95 0.95 0.95 0.750 0.95 0.95
ri tn tx ut vt va wa wv wi wy
Reality 1.00 0.00 0.000 0.00 0.00 1.00 1.00 1.000 1.000 0.00
Nate Silver 1.00 0.00 0.000 0.00 0.00 0.88 1.00 0.920 0.790 0.00
Intrade 0.99 0.00 0.025 0.00 0.05 0.78 0.96 0.951 0.626 0.00
Wang & Ferguson 0.95 0.05 0.050 0.05 0.05 0.96 0.95 0.950 0.720 0.05
The Senate win predictions (done only by Wang, Silver, & Intrade in this dataset):
brsw <- function (pundit) br(senatewin["Reality",], senatewin[pundit,])
lapply(rownames(senatewin), brsw)
- Wang: 0.01246376
- Silver: 0.04484545
- Intrade: 0.04882958
To combine the state win predictions with the presidency win prediction and also the Senate race win predictions requires data on all 3, so still Wang vs Silver vs Intrade:
combineBinaryForecasts <- function(p) c(statewin[p,],
senatewin[p,],
presidential[p,]$probability)
brpssw <- function(pundit) br(combineBinaryForecasts("Reality"), combineBinaryForecasts(pundit))
lapply(rownames(senatewin), brpssw)
- Wang: 0.009408282
- Silver: 0.02297625
- Intrade: 0.03720485
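Since a Brier score is a mean, pooling the three sets of forecasts is equivalent to averaging the three component scores weighted by how many forecasts each contains. A sketch using Wang & Ferguson’s scores copied from the results above; the counts (51 state contests, 33 Senate races, 1 presidency) are read off the tables.

```r
# Wang & Ferguson's component Brier scores, copied from the results above
component <- c(state = 0.007615686, senate = 0.01246376, presidency = 0)

# Number of forecasts in each component: 51 state contests, 33 Senate races, 1 presidency
n <- c(state = 51, senate = 33, presidency = 1)

# Count-weighted average of the component scores
pooled <- sum(component * n) / sum(n)
print(pooled)
```

One consequence is that the single presidential forecast carries only 1/85 of the weight in the combined score.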
Log Scores of Win Predictions
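# unlist() lets this accept data-frame rows & lists as well as vectors, like br() above
logScore <- function(obs, pred) { o <- unlist(obs); p <- unlist(pred); sum(ifelse(o == 1, log(p), log(1-p)), na.rm=TRUE) }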
Example of the difference between Brier and log score:
# Oops!
brier(0,1,bins=FALSE)$bs
1
# But we can recover by getting the second right
brier(c(0,1),c(1,1),bins=FALSE)$bs
0.5
# Oops!
logScore(1, 0)
-Inf
# Can we recover? ...we're screwed
logScore(c(1,1), c(0,1))
-Inf
Presidency win prediction:
lsp <- function(p) logScore(1, presidential[p,]$probability)
lapply(rownames(presidential), lsp)
- Reality: 0
- 2008: 0
- Wang & Ferguson: 0
- Linzer: -0.01005034
- Jackman: -0.08992471
- Silver: -0.09541018
- DeSart: -0.1208126
- Margin of Error: -0.3856625
- Intrade: -0.4185503
Applied to state win predictions:
# named lsw rather than ls to avoid masking base R's ls()
lsw <- function(p) logScore(statewin["Reality",], statewin[p,])
lapply(rownames(statewin), lsw)
- Reality: 0
- Linzer: -0.9327548
- Wang & Ferguson: -1.750359
- Silver: -2.057887
- Jackman: -2.254638
- DeSart: -3.30201
- Intrade: -5.719922
- Margin of Error: -10.20808
- 2008: -Inf
Now Senate win predictions:
lss <- function(p) logScore(senatewin["Reality",], senatewin[p,])
lapply(rownames(senatewin), lss)
- Reality: 0
- Wang & Ferguson: -2.89789
- Silver: -4.911792
- Intrade: -5.813129
And all of them together:
combineBinaryForecasts <- function(p) c(statewin[p,], senatewin[p,], presidential[p,]$probability)
lssp <- function(pundit) logScore(combineBinaryForecasts("Reality"), combineBinaryForecasts(pundit))
lapply(c("Reality", "Wang & Ferguson", "Nate Silver", "Intrade"), lssp)
- Reality: 0
- Wang & Ferguson: -4.648249
- Silver: -7.06509
- Intrade: -11.9516
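Unlike the Brier score, the log score as defined here is a sum rather than a mean, so the combined score is just the component scores added together. A quick check with the numbers reported above:

```r
# Component log scores copied from the results above (rows: category, columns: forecaster)
scores <- matrix(
  c(-1.750359, -2.057887,   -5.719922,    # state win probabilities
    -2.89789,  -4.911792,   -5.813129,    # Senate win probabilities
     0,        -0.09541018, -0.4185503),  # presidential win probability
  nrow = 3, byrow = TRUE,
  dimnames = list(c("state", "senate", "presidency"),
                  c("Wang & Ferguson", "Silver", "Intrade"))
)

# Adding the components reproduces the combined log scores
combined <- colSums(scores)
print(combined)
```

This also means that categories with more forecasts contribute more to the total, so totals are only comparable between forecasters who were scored on the same set of predictions.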
Summary Tables
RMSEs
| Predictor | Presidential electoral | Presidential popular | State margins | S+Pp+Sm[2] | Senate margins |
|---|---|---|---|---|---|
| Silver | 19 | 0.01 | 1.81659 | 20.82659 | 3.272197 |
| Linzer | 0 | | 2.5285 | | |
| Wang | 29 | 0.31 | 2.79083 | 32.10083 | |
| Jackman | 0 | 0.01 | 2.25422 | 2.26422 | |
| DeSart | 29 | 0.58 | 2.414322 | 31.99432 | |
| Intrade | 41 | 0.04 | | | |
| 2008 | 33 | 2.21 | 3.206457 | 38.41646 | |
| Margin | 29 | 0.71 | 2.426244 | 32.13624 | |
| Putnam | 0 | | 2.033683 | | |
| Unskewed | 69 | 1.91 | 7.245104 | 78.1551 | |
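The combined S+Pp+Sm column should be read with caution, because the three RMSEs are on very different scales. The following sketch, using rows copied from the table above (Jackman is left out since his electoral error is 0), shows how much of each total comes from the electoral-vote term alone:

```r
# RMSE components copied from the table above
rmse_parts <- data.frame(
  electoral = c(19, 29, 29, 33, 29, 69),
  popular   = c(0.01, 0.31, 0.58, 2.21, 0.71, 1.91),
  state     = c(1.81659, 2.79083, 2.414322, 3.206457, 2.426244, 7.245104),
  row.names = c("Silver", "Wang", "DeSart", "2008", "Margin", "Unskewed")
)

# Reproduce the combined column, then the fraction of it due to electoral votes
total           <- rowSums(rmse_parts)
electoral_share <- rmse_parts$electoral / total

print(round(data.frame(total, electoral_share), 3))
```

For every forecaster shown, the electoral-vote error accounts for the large majority of the sum, so the combined ranking is close to a ranking by electoral-vote error.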
Brier Scores
(0 is a perfect Brier score or RMSE.)
| Predictor | Presidential win | State win | Senate win | St+Sn+P |
|---|---|---|---|---|
| Silver | 0.008281 | 0.00911372 | 0.04484545 | 0.02297625 |
| Linzer | 0.0001 | 0.00384326 | | |
| Wang | 0 | 0.00761569 | 0.01246376 | 0.009408282 |
| Jackman | 0.007396 | 0.00971369 | | |
| DeSart | 0.012950 | 0.01605542 | | |
| Intrade | 0.116964 | 0.02811906 | 0.04882958 | 0.03720485 |
| 2008 | 0 | 0.03921569 | | |
| Margin | 0.1024 | 0.05075311 | | |
| Random | 0.2500 | 0.25000000 | 0.25000000 | 0.25000000 |
Log Scores
There are other proper scoring rules besides the Brier score; another binary-outcome rule, less used by political forecasters, is the “logarithmic scoring rule” (see Wikipedia or Eliezer Yudkowsky’s “Technical Explanation”); it has some deep connections to areas like information theory, data compression, and Bayesian inference, which makes it invaluable in some contexts. But because a log score ranges between 0 and negative Infinity (bigger is better/smaller worse) rather than 0 and 1 (smaller better) and has some different behaviors, it’s a bit harder to understand than a Brier score.
(One way in which the log score differs from the Brier score is treatment of 100/0% predictions: the log score of a 100% prediction which is wrong is negative Infinity, while in Brier it’d simply be 1 and one can recover; hence if you say 100% twice and are wrong once, your Brier score would recover to 0.5 but your log score will still be negative Infinity! This is what happens with the “2008” benchmark.)
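One way to make a log score easier to read, offered here as an illustrative sketch rather than as part of the original analysis, is to convert it back into a per-forecast quantity. Dividing by the number of forecasts and exponentiating gives the geometric mean of the probabilities the forecaster assigned to what actually happened; dividing by log(2) expresses the total as bits. The scores are copied from the state-win results above, and the counts assume Linzer’s missing DC forecast was dropped by `na.rm=TRUE`.

```r
# State-win log scores (natural log) copied from the results above.
# n is the number of forecasts actually scored: Linzer gave no DC forecast (NA), so 50 not 51.
log_score <- c("Linzer" = -0.9327548, "Wang & Ferguson" = -1.750359, "Intrade" = -5.719922)
n         <- c("Linzer" = 50,         "Wang & Ferguson" = 51,        "Intrade" = 51)

# Geometric mean of the probabilities assigned to the outcomes that occurred
geo_mean_prob <- exp(log_score / n)

# The same totals expressed in bits (nats divided by log 2)
bits <- -log_score / log(2)

print(round(data.frame(geo_mean_prob, bits), 3))
```

A geometric mean near 1 means the forecaster was both confident and right almost everywhere; a single confident miss drags it down sharply, and a 100% miss sends it to 0.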
| Forecaster | State win probabilities |
|---|---|
| Reality | 0 |
| Linzer | -0.9327548 |
| Wang & Ferguson | -1.750359 |
| Silver | -2.057887 |
| Jackman | -2.254638 |
| DeSart | -3.30201 |
| Intrade | -5.719922 |
| Margin of Error | -10.20808 |
| 2008 | -Infinity |
| Forecaster | Presidential win probability |
|---|---|
| Reality | 0 |
| 2008 | 0 |
| Wang & Ferguson | 0 |
| Linzer | -0.01005034 |
| Jackman | -0.08992471 |
| Silver | -0.09541018 |
| DeSart | -0.1208126 |
| Margin of Error | -0.3856625 |
| Intrade | -0.4185503 |
Note that the 2008 benchmark and Wang & Ferguson took a risk here by giving an outright 100% chance of victory, which the log score rewarded with a 0: if somehow Obama had lost, then the log score of any set of their predictions which included the presidential win probability would automatically be -Infinity, rendering them officially The Worst Predictors In The World. This is why one should allow for the unthinkable by including some fraction of a percent; of course, I’m sure Wang & Ferguson don’t mean 100% literally but more like “it’s so close to 100% we can’t be bothered to report the tiny remaining possibility”.
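A minimal sketch of what “including some fraction of a percent” could look like in practice: pull every probability slightly away from exactly 0 and 1 before scoring. The helper `clamp` and the value `eps = 0.001` are hypothetical choices for illustration, not something used in the analysis above.

```r
log_score <- function(obs, pred) sum(ifelse(obs == 1, log(pred), log(1 - pred)), na.rm = TRUE)

# Hypothetical helper: keep probabilities inside [eps, 1 - eps].
# eps = 0.001 is an arbitrary illustrative value, not a recommendation.
clamp <- function(pred, eps = 0.001) pmin(pmax(pred, eps), 1 - eps)

obs  <- c(1, 1, 0)
pred <- c(1, 0, 0)   # three fully confident forecasts; the second one is wrong

raw     <- log_score(obs, pred)          # the confident miss makes this -Inf
clamped <- log_score(obs, clamp(pred))   # heavily penalized, but finite

print(c(raw = raw, clamped = clamped))
```

The clamped forecaster still pays a large penalty for the miss, but the penalty no longer swamps every other prediction in the set.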
| Forecaster | Senate win probabilities |
|---|---|
| Reality | 0 |
| Wang | -2.89789 |
| Silver | -4.911792 |
| Intrade | -5.813129 |
External Links
- Some Thoughts on Election Forecasting (Andrew Gelman, 2010)
- “Forecasting the 2012 and 2014 Elections Using Bayesian Prediction and Optimization”, et al 2015
- “The Polls - Review; Predicting Elections: Considering Tools to Pool the Polls”, 2015
- “Disentangling Bias and Variance in Election Polls”, Shirani-et al 2018
1. I have been told that once Intrade prices have been corrected for this, the new results are comparable to Silver & Wang. This doesn’t necessarily surprise me, but during the original analysis I did not look into doing the long-shot bias correction because: hardly anyone does in discussions of prediction markets; it would’ve been more work; and I’m not sure it’s really legitimate, since if Intrade is biased, then it’s biased. If someone produces extreme estimates which can be easily improved by regressing to some relevant mean, it doesn’t seem quite honest to present your corrected version instead as what they “really” meant.
2. Summing together RMSEs from different metrics is statistically illegitimate & misleading, since the summation will reflect almost entirely the electoral vote performance, which is on a scale much bigger than the other metrics. I include it for curiosity only.