Field Gulls: An SB Nation Community


Assessing the Accuracy of WP

Between John and me, there have been a couple of posts at FG related to the Advanced NFL Stats Win Probability charts. If you've paid any attention, you know just how neat they are. If you are a generally curious person, you have probably wondered about the accuracy of the WP charts. Wonder no more. If you have no interest in probabilistic models, you should probably just stare at a wall for a few minutes.

For readers who are accustomed to linear regression models, you'd expect to see a goodness-of-fit statistic known as r-squared. And for those familiar with logistic models, you'd expect to see some other measure, such as the percent of cases predicted correctly. But the win probability model I've built is a complex custom-built model, using multiple smoothing and estimation methods. There isn't a handy goodness-of-fit statistic to cite.

We can still test how accurate the model is by measuring the proportion of observations that correctly favor the ultimate winner. For example, if the model says the home team has a 0.80 WP, and they go on to win, then the model would be "correct."

But it's not that simple. I don't want the model to be correct 100% of the time when it says a team has a 0.80 WP. I want it to be wrong sometimes. Specifically, in this case I'd want it to be wrong 20% of the time. If so, that's a good feature of any probability model. This is what's known as model calibration.
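For anyone who wants to see what a calibration check looks like in practice, here is a minimal sketch. It is not Burke's code or his method: the function name `calibration_table` is made up for illustration, and the data is synthetic (each outcome is drawn from its stated probability), so the table comes out well calibrated by construction. The idea is just to group predictions into WP bins and compare the average predicted WP in each bin with how often that team actually won.

```python
import random

def calibration_table(predictions, outcomes, n_bins=10):
    """Group predictions into equal-width WP bins and compare the mean
    predicted WP in each bin with the observed win rate."""
    bins = [[] for _ in range(n_bins)]
    for p, won in zip(predictions, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the top bin
        bins[idx].append((p, won))
    table = []
    for rows in bins:
        if not rows:
            continue  # skip empty bins
        mean_pred = sum(p for p, _ in rows) / len(rows)
        win_rate = sum(won for _, won in rows) / len(rows)
        table.append((mean_pred, win_rate, len(rows)))
    return table

# Synthetic data, NOT NFL data: outcomes are drawn from the stated probability.
rng = random.Random(0)
preds = [rng.random() for _ in range(50_000)]
wins = [1 if rng.random() < p else 0 for p in preds]

for mean_pred, win_rate, n in calibration_table(preds, wins):
    print(f"predicted {mean_pred:.2f}  observed {win_rate:.2f}  n={n}")
```

In a well-calibrated model the two columns track each other: teams given roughly a 0.80 WP win roughly 80% of the time and lose the other 20%.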

Right, so that's it for the wall of text. If you are still reading, now go check out the charts in the article. The first shows a very nice relationship between actual results and the predicted results. There's a problem, however: The top chart shows the relationship between the 2000-2007 data and the model, which was built off of the 2000-2007 data. When building and subsequently testing a model, it's important to split the data into what's known as test and training data sets, part of the cross-validation process. If you test a model with the same data you created it with, the results will almost certainly show a very good model*. To ensure that this wasn't the case, Burke tested his model against the 2008 data, and the results look pretty good for a single-season sample.
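Here is a rough sketch of the train/test idea, holding out one season the way Burke held out 2008. Everything in it is an assumption for illustration: the plays are synthetic, the "model" is a crude lookup of win rate by point lead, and the Brier score (mean squared gap between predicted WP and the 0/1 result) is a metric I'm swapping in, not one the article reports.

```python
import math
import random
from collections import defaultdict

def make_plays(seasons, per_season, rng):
    """Synthetic game states: 'lead' is the home team's point margin."""
    plays = []
    for season in seasons:
        for _ in range(per_season):
            lead = rng.randint(-14, 14)
            true_wp = 1 / (1 + math.exp(-lead / 7))  # made-up relationship
            plays.append({"season": season, "lead": lead,
                          "won": 1 if rng.random() < true_wp else 0})
    return plays

def split_by_season(plays, test_seasons):
    """Keep whole seasons together so no test game leaks into training."""
    train = [p for p in plays if p["season"] not in test_seasons]
    test = [p for p in plays if p["season"] in test_seasons]
    return train, test

def fit_lookup(train):
    """Stand-in model: empirical home win rate for each lead value."""
    tally = defaultdict(lambda: [0, 0])  # lead -> [wins, observations]
    for p in train:
        tally[p["lead"]][0] += p["won"]
        tally[p["lead"]][1] += 1
    return {lead: wins / n for lead, (wins, n) in tally.items()}

def brier_score(model, plays):
    """Mean squared error of predicted WP; leads never seen fall back to 0.5."""
    return sum((model.get(p["lead"], 0.5) - p["won"]) ** 2 for p in plays) / len(plays)

rng = random.Random(1)
plays = make_plays(range(2000, 2009), 2000, rng)
train, test = split_by_season(plays, test_seasons={2008})
model = fit_lookup(train)
print(f"in-sample Brier:     {brier_score(model, train):.4f}")
print(f"out-of-sample Brier: {brier_score(model, test):.4f}")
```

The out-of-sample number is the one worth trusting. The in-sample score is computed on the same games the lookup table was built from, so it flatters the model.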

Also, Burke charted out the model confidence, which is pretty interesting to look at from a fan perspective. It makes sense that the average game should start out with a roughly 50/50 split at kickoff. As late as ten minutes from the end, however, there is generally only about 80% confidence in a winner. That leaves a lot up for grabs.
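One way to read that 80% figure, assuming "confidence" means how often the team with the higher WP goes on to win (my reading, not something the article spells out): even a model that knew every WP exactly would only back the winner max(p, 1 - p) of the time. The sketch below uses made-up game states and hypothetical function names to show that the simulated hit rate lands on that ceiling.

```python
import random

def favorite_hit_rate(true_wps, rng):
    """Simulate one outcome per game state and report how often the team
    with the higher WP ends up winning."""
    hits = 0
    for p in true_wps:  # p is the home team's true win probability
        home_wins = rng.random() < p
        home_favored = p >= 0.5
        hits += (home_wins == home_favored)
    return hits / len(true_wps)

def expected_hit_rate(true_wps):
    """Ceiling for a perfect model: the favorite wins max(p, 1 - p) of the time."""
    return sum(max(p, 1 - p) for p in true_wps) / len(true_wps)

# Made-up states: half the time the home team is an 80% favorite,
# half the time a 20% underdog.
rng = random.Random(2)
states = [0.8] * 10_000 + [0.2] * 10_000

print(f"simulated hit rate: {favorite_hit_rate(states, rng):.3f}")
print(f"expected ceiling:   {expected_hit_rate(states):.3f}")
```

So a confidence of about 80% late in a game says as much about football as about the model: no amount of model tuning removes the chance of a late lead change.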


*Unless you screwed up.


Comments


I'm not a statistician

But I know a little bit about developing a testable model. This defense of WP sounds like a whole lot of hand-waving and “just trust me guys”. Especially this:

But the win probability model I’ve built is a complex custom-built model, using multiple smoothing and estimation methods. There isn’t a handy goodness-of-fit statistic to cite.

Using the right “multiple smoothing and estimation methods” you can make any piece of experimental data say anything you want it to. The fact that the developer of WP tailored his results to fit the original experimental data and that “There isn’t a handy goodness-of-fit statistic to cite” seems really fishy.

If you test a model with the same data you created it with, the results will almost certainly show a very good model*.

Totally true. While it is great that the results agree with the 2008 season, I think some more testing is necessary before WP can be considered more than a novelty act.

by ninjasocks on Jul 8, 2009 10:12 AM PDT

You can't expect to develop a model for WP with any single regression.

Using the right "multiple smoothing and estimation methods" you can make any piece of experimental data say anything you want it to.

You’re off base here. Accuse him of not properly validating the model, fine, but this isn’t a case of lying with statistics. The point of a predictive model is not to “make any piece of experimental data say anything you want it to”, but rather to use that wealth of data to learn about trends and distributions within the data for use in, well, predicting. If Burke needed to use several estimation methods to fit various aspects of game modeling, then his model should be more accurate on account of him having done so. If you read into his comments, he mentioned problems modeling “going for it on 4th down” situations. This is a detailed model. He didn’t just run a linear regression on point differential and time remaining.

by abender20 on Jul 8, 2009 10:36 AM PDT

I read it more as

there are only so many situations when it’s 3rd and 23 with 1:34 to go on your own 35 yard line and down by 4, so this WP is not as simple as something like baseball, where there are only so many scenarios that are possible. There needs to be a bit of estimation involved.
Also, that there is a reasonable degree of estimation in his calculations, but it might take 50+ pages to explain what he did, why he did it, and how he went about it, so just trust him when he says he tested its predictive value on games played in the past and that it seemed to ‘predict’ the results accurately.

by LantermanC on Jul 8, 2009 10:51 AM PDT

The model performance looks good to me

The first calibration graph (based on 2000-2007 data) is basically useless, but the model looks to be performing pretty well on 2008 data.

The “Confidence” plot is a little hard to interpret, but I think it’s answering the question: “If you bet on the team with the higher WP at time X, what proportion of the time will you win your bet?” Unfortunately, this mixes two kinds of variability: variability due to errors in calculating the win probability via the model, and variability due to the nature of the game (i.e. last-minute lead changes do happen). This second kind of variability cannot be reduced by building a better model; even if you estimated win probability without any error, at every time before the end of the game there is some chance that betting on the team with the higher WP will ultimately be a losing bet.

In the end, I interpret WP at any given state of the game as the proportion of times teams in similar situations have gone on to win the game. And the model seems to be doing a good job at estimating this proportion.

by cyberwulf on Jul 9, 2009 9:45 AM PDT
