Assessing the Accuracy of WP

Between John and me, there have been a couple of posts at FG related to the Advanced NFL Stats Win Probability charts. If you've paid any attention, you know just how neat they are. If you are a generally curious person, you have probably wondered about the accuracy of the WP charts. Wonder no more. If you have no interest in probabilistic models, you should probably just stare at a wall for a few minutes.

For readers who are accustomed to linear regression models, you'd expect to see a goodness-of-fit statistic known as r-squared. And for those familiar with logistic models, you'd expect to see some other measure, such as the percent of cases predicted correctly. But the win probability model I've built is a complex custom-built model, using multiple smoothing and estimation methods. There isn't a handy goodness-of-fit statistic to cite.

We can still test how accurate the model is by measuring the proportion of observations that correctly favor the ultimate winner. For example, if the model says the home team has a 0.80 WP, and they go on to win, then the model would be "correct."

But it's not that simple. I don't want the model to be correct 100% of the time when it says a team has a 0.80 WP. I want it to be wrong sometimes. Specifically, in this case I'd want it to be wrong 20% of the time. If so, that's a good feature of any probability model. This is what's known as model calibration.
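For the curious, here is a minimal sketch of what a calibration check could look like in Python. This is not Burke's code; the function name, the ten equal-width bins, and the sample numbers are all assumptions made up for illustration. The idea is simply to group predictions by their WP and compare each group's average prediction with how often those teams actually won.

```python
from collections import defaultdict

def calibration_table(predicted_wp, won, n_bins=10):
    """Bin predictions by WP and compare predicted vs. observed win rates.

    predicted_wp: iterable of probabilities in [0, 1] (hypothetical input)
    won: iterable of 1 (team won) or 0 (team lost), same length
    Returns rows of (bin_low, bin_high, count, mean_predicted, observed_rate).
    """
    bins = defaultdict(list)
    for p, w in zip(predicted_wp, won):
        # Clamp so that a WP of exactly 1.0 lands in the top bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, w))

    table = []
    for idx in sorted(bins):
        rows = bins[idx]
        mean_pred = sum(p for p, _ in rows) / len(rows)
        win_rate = sum(w for _, w in rows) / len(rows)
        table.append((idx / n_bins, (idx + 1) / n_bins, len(rows), mean_pred, win_rate))
    return table

# Made-up data: five situations where the model said 0.80 and the team won four.
predicted = [0.80, 0.80, 0.80, 0.80, 0.80]
outcomes = [1, 1, 1, 1, 0]
for low, high, n, mean_pred, win_rate in calibration_table(predicted, outcomes):
    print(f"WP {low:.1f}-{high:.1f}: n={n}, predicted={mean_pred:.2f}, observed={win_rate:.2f}")
```

A well-calibrated model has predicted and observed values that roughly match in every bin, which is the same comparison the calibration charts make visually.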

Right, so that's it for the wall of text. If you are still reading, now go check out the charts in the article. The first shows a very nice relationship between actual results and the predicted results. There's a problem, however: The top chart shows the relationship between the 2000-2007 data and the model, which was built off of the 2000-2007 data. When building and subsequently testing a model, it's important to split the data into what's known as test and training data sets, part of the cross-validation process. If you test a model with the same data you created it with, the results will almost certainly show a very good model*. To ensure that this wasn't the case, Burke tested his model against the 2008 data, and the results look pretty good for a single-season sample.
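To make the training/test idea concrete, here is a rough sketch of a split by season. The record layout (a dict with a "season" key) and the function name are hypothetical, and I have no idea how Burke actually stores his data. The point is only that the seasons used to build the model and the season used to grade it never overlap.

```python
def split_by_season(plays, train_seasons, test_seasons):
    """Split play records into training and test sets by season.

    plays: list of dicts, each with a "season" key (hypothetical layout)
    """
    train_seasons = set(train_seasons)
    test_seasons = set(test_seasons)
    # Refuse to continue if a season would be used for both fitting and grading.
    overlap = train_seasons & test_seasons
    if overlap:
        raise ValueError(f"seasons in both sets: {sorted(overlap)}")
    train = [p for p in plays if p["season"] in train_seasons]
    test = [p for p in plays if p["season"] in test_seasons]
    return train, test

# Made-up records, just to show the shape of the split.
plays = [
    {"season": 2000, "home_wp": 0.55, "home_won": 1},
    {"season": 2007, "home_wp": 0.30, "home_won": 0},
    {"season": 2008, "home_wp": 0.80, "home_won": 1},
]
train, test = split_by_season(plays, range(2000, 2008), [2008])
print(len(train), "training plays,", len(test), "test plays")
```

Splitting on whole seasons, rather than on randomly chosen plays, keeps plays from the same game from ending up on both sides of the split.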

Burke also charted the model's confidence, which is pretty interesting to look at from a fan's perspective. It makes sense that the average game should start out at a roughly 50/50 split at kickoff. Even with ten minutes left to play, however, the model is generally only about 80% confident in a winner. That leaves a lot up for grabs.
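One plausible way to compute a confidence curve like that is sketched below. I am assuming "confidence" means the favored team's WP, i.e. max(WP, 1 - WP), averaged over all observations with the same time remaining. That definition, the function name, and the sample numbers are my own guesses, not something taken from Burke's article.

```python
from collections import defaultdict

def mean_confidence_by_minute(observations):
    """Average the favored team's WP at each point on the game clock.

    observations: iterable of (minutes_remaining, home_wp) pairs
    (hypothetical layout). Confidence is assumed to be max(wp, 1 - wp).
    """
    totals = defaultdict(lambda: [0.0, 0])  # minutes -> [sum, count]
    for minutes, wp in observations:
        confidence = max(wp, 1.0 - wp)
        totals[minutes][0] += confidence
        totals[minutes][1] += 1
    return {minutes: s / n for minutes, (s, n) in totals.items()}

# Made-up observations: two at kickoff, two with ten minutes left.
obs = [(60, 0.50), (60, 0.50), (10, 0.80), (10, 0.20)]
curve = mean_confidence_by_minute(obs)
for minutes in sorted(curve, reverse=True):
    print(f"{minutes} min left: mean confidence {curve[minutes]:.2f}")
```

Under this definition the curve can never drop below 0.50, since one team is always at least a coin flip to win.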


*Unless you screwed up.
