Friday, November 9, 2012

How Did Nate Silver Get The Election Odds So Wrong?

Something seems fishy about superstar statistician Nate Silver's predictions for the 2012 US elections.

Nate Silver, founder and chief blogger at fivethirtyeight, is an amazing political analyst who appears to have the ability to predict election outcomes almost perfectly. Despite the praise being heaped on him by so many people who, like me, spent hours and hours reading his posts and studying his plots, I have to tell you I'm shocked at just how wrong his probabilities seem to have turned out to be in the end.

Bear in mind that I'm not saying Silver was wrong about who would win the election. If anything, he was more right than his own numbers said he should have been. And being more right than he should be means there's something odd, or interesting, about his statistics.

The point that many people seem to be missing is that Silver was not simply predicting who would win in each state. He was publishing the odds that one or the other candidate would win in each statewide race. That's an important difference. It's precisely this data, which Silver presented so clearly and blogged about so eloquently, that makes it easy to check on how well he actually did. Unfortunately, these very numbers also suggest that his model most likely blew it by paradoxically underestimating the odds of President Obama's reelection while at the same time correctly predicting the outcomes of 82 of 83 contests (50 state presidential tallies and 32 of 33 Senate races).

Look at it this way: if a meteorologist says there's a 90% chance of rain where you live and it doesn't rain, the forecast wasn't necessarily wrong, because 10% of the time it shouldn't rain - otherwise the odds would be something other than a 90% chance of rain. One way a meteorologist could be wrong, however, is by using a predictive model that consistently gives the incorrect probabilities of rain. Only by looking at the odds the meteorologist gave and comparing them to actual data could you tell in hindsight if there was something fishy with the prediction.

It's easy enough to check how good a series of weather predictions was. First, consider the percentage chance that it will rain on any given day; let's say it's a number like 79.7%. It will probably rain on that day, but it wouldn't be too shocking if it didn't, because there's a 20.3% chance that it won't. If we record the predictions of rain for seven days in a row, we can calculate the odds that it would have rained every one of those days by multiplying the chances of rain together. (It makes the math simpler if we express the percentage as a decimal number like 0.797 rather than 79.7%, although it means the same thing.) Consider these forecasts of rain for one week.

Monday       0.846
Tuesday      0.794
Wednesday    0.843
Thursday     0.906
Friday       0.797
Saturday     0.967
Sunday       0.503

The probability that it rained every day in this hypothetical week is

P = 0.846 * 0.794 * 0.843 * 0.906 * 0.797 * 0.967 * 0.503

(Where '*' is the symbol for multiplication.)

The result is P = 0.199, which means there's a 19.9% chance that it rained every day that week. In other words, there's an 80.1% chance it didn't rain on at least one day of the week. If it did in fact rain every day, you could say it was the result of a little bit of luck. After all, 19.9% isn't that small a chance of something happening.
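If you'd rather not multiply the numbers by hand, here's a minimal sketch in Python that checks the arithmetic. (The code and variable names are mine, not anything from Silver's model, and it assumes the days are independent of one another.)

from math import prod

# Forecast probabilities of rain for each day of the hypothetical week
rain_odds = {
    "Monday":    0.846,
    "Tuesday":   0.794,
    "Wednesday": 0.843,
    "Thursday":  0.906,
    "Friday":    0.797,
    "Saturday":  0.967,
    "Sunday":    0.503,
}

# Chance it rains every single day = product of the daily chances
p_all_rain = prod(rain_odds.values())

print(f"Chance of rain every day:       {p_all_rain:.3f}")      # about 0.199
print(f"Chance of at least one dry day: {1 - p_all_rain:.3f}")  # about 0.801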

We only have one 2012 general election, of course, but it's made up of many separate elections. So instead of looking at rain on consecutive days, we can consider separate elections that all happened on the same day, but in different states. Either way, it's mathematically the same.

By the way, the odds of rain I typed in above are the very same odds Nate Silver quoted when predicting the chances of Obama winning in seven of the battleground states in the 2012 presidential election.

New Hampshire    0.846
Virginia         0.794
Iowa             0.843
Ohio             0.906
Colorado         0.797
Wisconsin        0.967
Florida          0.503

Because the numbers are identical to the ones I chose for the chances-of-rain example, the odds of Obama winning every one of these states should have been the same as the prediction of seven consecutive days of rain, 19.9%, assuming Silver's "chance of winning" predictions work the same way as a "chance of rain" prediction.

"Good for Nate Silver," you might say, "he made a moderately risky bet and won."

What if you look at all the battleground states? What are the chances that Obama would win every one of the states he was favored in, and Romney would win all the ones he was favored in? That is, what are the chances that the most likely thing happened in every battleground state? The answer, it turns out, is 14.5%. On the flip side, there was an 85.5% chance that at least one of the battleground states should have gone to the less favored candidate. That means Silver only had a one in seven chance of getting them all right, if by 'right' we mean that the candidate the model favored, by even the slimmest of margins, actually won.
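To be concrete about what "the most likely thing happened everywhere" means, here's another small Python sketch, again assuming the races are independent: for each state you take the probability of whichever outcome is favored - an Obama win if his number is above 50%, a Romney win otherwise - and multiply them all together. The list below is just the seven states quoted above plus one made-up, Romney-leaning entry for illustration; feeding in Silver's full battleground list is what produces the 14.5% figure.

from math import prod

def prob_every_favorite_wins(obama_win_probs):
    # For each race, the favored outcome has probability max(p, 1 - p);
    # multiply those together for the chance that no race produces an upset.
    return prod(max(p, 1 - p) for p in obama_win_probs)

# Seven quoted battlegrounds plus one hypothetical Romney-leaning state (0.30)
example = [0.846, 0.794, 0.843, 0.906, 0.797, 0.967, 0.503, 0.30]

print(f"Every favorite wins: {prob_every_favorite_wins(example):.3f}")
print(f"At least one upset:  {1 - prob_every_favorite_wins(example):.3f}")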

This is where things start to get suspicious. In the 2012 presidential election, in every state in the nation (assuming Florida goes for Obama as expected), the candidate that the model favored won. Not even once did either candidate beat the odds. That's why this map of nationwide results as reported by CNN . . .

looks identical to the one that Silver's model generated the day before the presidential election took place.


But if you go by the individual state predictions that Silver himself published, the odds of that happening are only about 12%.

If you consider the outcomes of the Senate races, this is what the final map looked like as presented by CNN . . .


And this is what Silver's model predicted (green in Silver's map is the same thing as fuchsia in CNN's map).

Every race went to the favored candidate except one. The odds of that happening, again using Silver's predicted chances of winning in each case, combined with the results in the presidential race, amount to only 6%. That means there was a 94% chance that at least one more of those races should have gone the other way around. Silver's model itself would say the likelihood of making this good a prediction is substantially lower than Romney's chances were of winning the election.

Impossible? Not at all. Unlikely? I would say yes. And things would get even more uncanny if you include numbers Silver calculated for races in previous years, where he was just as good at predicting which way the races would go.

So what's going on? One possibility some folks have offered me is that Silver's "chance of winning" number is not analogous to the "chance of rain" in weather forecasts. What exactly it might be, they have yet to get through my thick head. I don't buy it. If it's something other than what the words imply, then he should (and most likely would) use different words. So unless I get Silver himself to tell me "chance of winning" doesn't mean what the words say, I have to assume the phrase is what it is - I can read English, but I can't read minds.

Perhaps Silver's model is too good and he's adjusted the numbers so they look a bit more modest (by writing, say, 80% where the model predicts 90%), but never switch sides (i.e. 52% in favor of an outcome never crosses the line to 48%, which would make the opposite outcome the favorite, and therefore mandatory). Of course, it's hard to imagine why anyone would do such a thing, so I tend to think this must not be what happened, despite the fact that it would be a tidy solution. More probable, in my opinion, is the possibility that the model really is better than Silver expected and that the data from this election will allow him to ensure that it reports the odds more accurately at the next election.

Another potential explanation I can imagine is that unlikely things, such as nearly-flawless strings of election predictions, happen sometimes and that Silver has simply had an extraordinary run of good luck. The downside of that is it means he will typically do much worse over the long run, and fivethirtyeight will be just one more moderately-reliable election blog in years to come.

Some of the bright people here at the American Center for Physics have suggested that I'm overlooking the possibility that individual states' elections can affect each other, or may be mutually affected by some outside influence, in a way that makes them move together. That might be true, but it implies that Silver should sometimes call the entire race perfectly (as he seems to have done again this year) and at other times miss on almost every battleground state, but he would be unlikely to miss by a little. If this is the case, any one state's supposed "chance of winning" seems to me to be meaningless, and Silver should only be able to predict how blocks of states move together. That would make his model a pretty blunt instrument, despite the fact that most of his fans act like it's the equivalent of a statistical scalpel. In fact, Silver points out in the description of his methodology that such interactions among states are accounted for in the model, which means they're included in the calculations that produce the model's output, including the chances of winning he posted for each race. So this is a pretty plausible explanation, although I don't see how I could check it easily.
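Here's a toy simulation that shows why the correlation would matter; it's my own sketch, not Silver's model, and it assumes each state's result is driven partly by a shared national swing and partly by its own noise. Each state's individual chance of an Obama win stays exactly at the quoted value, but when the shared swing is strong, either every favorite wins or several upsets happen together, so sweeping the board becomes much more likely than simple multiplication of the individual odds would suggest.

import random
from math import sqrt
from statistics import NormalDist

norm = NormalDist()

# Silver's quoted probabilities of an Obama win in the seven battlegrounds
probs = [0.846, 0.794, 0.843, 0.906, 0.797, 0.967, 0.503]

def chance_all_favorites_win(probs, rho, trials=200_000):
    # Monte Carlo estimate of the chance that the favored candidate wins every
    # state when the states share a common "national swing" with correlation
    # rho (rho = 0 reproduces the independent-multiplication case).
    thresholds = [norm.inv_cdf(p) for p in probs]  # keeps each state's marginal chance at p
    hits = 0
    for _ in range(trials):
        z = random.gauss(0, 1)  # the shared national swing for this simulated election
        swept = True
        for p, t in zip(probs, thresholds):
            latent = sqrt(rho) * z + sqrt(1 - rho) * random.gauss(0, 1)
            obama_wins = latent < t
            if obama_wins != (p >= 0.5):  # the favored candidate lost this state
                swept = False
                break
        if swept:
            hits += 1
    return hits / trials

print(f"Independent states (rho = 0.0): {chance_all_favorites_win(probs, 0.0):.3f}")  # about 0.20
print(f"Correlated states  (rho = 0.6): {chance_all_favorites_win(probs, 0.6):.3f}")  # noticeably higher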

In any case, it at least looks to me like the odds Silver published are probably too low and should in fact have given Obama and Romney much higher probabilities of carrying the states they each won, and therefore should have implied a much higher likelihood of Obama taking a second term as president.

Why does it matter? A 90.6% probability of an overall Obama victory was a pretty clear indicator of the ultimate outcome. Would 95% have been any better? Not at all. I'm not as concerned about the specific numbers as I am about having confidence in the methodology.

Silver approached the business of analyzing the election in what appeared to be a very open, analytical and sensible manner. On his blog, reason and statistics prevailed, to the chagrin of pundits and numerically-challenged political reporters. He made lots of fans among people fond of math and science, and I'm one of them. If there's some disconnect between the predictions his model makes and the numbers he reported, then Silver goes from being one of us to one of them. If it's not a scientific model that's responsible for his predictions, then they must be motivated by some personal agenda hidden behind a facade of mathiness. That, of course, would be pure evil. I highly doubt that's the case, but I'm not going to get over my discomfort until I understand why Silver is so good at predicting the outcomes of elections but apparently so bad at reporting the actual odds that are supposed to underlie those predictions.

In the end, I will leave you with this bit of advice: if Silver tells you there's a 50.1% chance of rain, then you'd better get out your umbrella. If he says there's a 49.9% chance of rain, then your picnic or wedding or parade is just about guaranteed to be dry. According to Silver's amazing model, the most likely outcome will happen every time (or at least 82 out of 83 times), even if it's only minutely more likely than the alternative.
