From the start of the season, I said I wanted to take a look at how hard it is to actually kick a field goal, not from a "go out and kick one myself" point of view, but for professional kickers, by looking at the publicly available data from the NFL. For a stats nerd such as myself, it's an interesting problem, and one that I didn't know at the outset whether I could solve. But armed with RStudio, a couple of laptops, and plenty of coffee, a grad school classmate and I set out to see if we could predict a kick.
Some of you may be familiar with MIT's research into whether the outcome of a kick can be predicted with any level of accuracy. Several years ago, this group looked to see if they could provide a solid answer as to what makes a field goal easier or harder, beyond the obvious factor of distance. Does wind make it harder? If so, by how much? What about precipitation? Icing the kicker? What about score differential? The data used in the original study ran from 2000 to 2011.
These are all legitimate questions that could yield interesting results. Does interpretability matter? If I give you a black box that tells you the likely outcome and is accurate, do you care how I reach the solution?
These are questions that I, along with my partner Weronika Swiechowicz, wanted to answer. While MIT produced a model some years ago, we wondered whether trends had changed since, or whether we could do better. We scraped the NFL play-by-play database using NFLscrapR and then combined that data with weather information, giving us every play, its result, and the weather conditions from 2009 to 2016. Initially, we used logistic regression to see if we could closely replicate MIT's results. While looking at kickers individually would be interesting, we kept the model kicker-agnostic. Why? Weronika and I felt the model would have more longevity that way: players come and go, but trends stick around a bit longer.
Predicting a make is easy; predicting a miss is hard.
We used several classification models to try to produce usable results, then compared the models against one another to find the best one using observable factors such as distance and weather. Early in our analysis, the primary problem with the data set emerged: it is orders of magnitude easier to predict a kick that is made than one that is missed.
If you've been reading my Kicker Consideration series, you will recognize this histogram. NFL kickers as a group make many, many more field goal attempts than they miss, especially from 35 yards and in, where they almost never miss. The imbalance is so pronounced that simply predicting that every kick is good yields a fairly accurate model, since the majority of kicks occur from 40 yards and shorter; of course, that's not exactly useful. Like the MIT team, we found that of the model types we evaluated, logistic regression worked best for this problem, though the significant factors in our model differed substantially from theirs.
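To see why "predict every kick is good" looks deceptively strong, here is a minimal sketch of the class-imbalance problem. The attempt and make counts below are hypothetical round numbers for illustration, not the actual figures from our data set:

```python
# Hypothetical attempt counts by distance bucket -- illustrative only,
# not the real numbers from the NFL play-by-play data.
attempts = {
    "under 30 yd": (400, 392),   # (attempts, makes)
    "30-39 yd":    (350, 322),
    "40-49 yd":    (300, 240),
    "50+ yd":      (150, 95),
}

total = sum(n for n, _ in attempts.values())
made = sum(m for _, m in attempts.values())

# A "model" that calls every kick good scores the overall make rate
# as its accuracy -- impressively high, yet it never flags a single miss.
baseline_accuracy = made / total
print(f"Predict-all-good accuracy: {baseline_accuracy:.1%}")
```

Any candidate model has to beat that lopsided baseline on the misses, not just on overall accuracy, which is what made the miss side of the problem the hard part.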
Icing the kicker isn’t significant, but what is?
First, we tested whether icing the kicker has an effect. Again, if you've been following my articles this year, you probably already know the answer: no. You can find more on this topic in my earlier article here.
Next, we regressed the field goal result on season, drive, quarter, time, distance, score differential, and weather conditions (rain, snow, fog, and wind). We found that distance, distance interacting with season, drive interacting with precipitation (snow), distance interacting with time, distance interacting with temperature, and precipitation (snow) interacting with score differential were all statistically significant. What does this mean? Kicking from long distance is obviously hard, but snow, temperature, score differential, and time remaining in the game are the factors most likely to tip a kick prediction toward a miss.
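To make the shape of such a model concrete, here is a sketch of a logistic regression with interaction terms. Every coefficient below is invented purely for illustration; they are not the fitted values from our model:

```python
import math

# Invented coefficients, chosen only to show how interaction terms
# enter a logistic model -- these are NOT our fitted values.
COEF = {
    "intercept": 5.0,
    "distance": -0.09,                 # longer kicks are harder
    "snow": -0.4,                      # snow hurts on its own...
    "distance:temperature": 0.0004,    # ...and cold hurts more from deep
    "snow:score_diff": -0.02,          # trailing in the snow hurts too
}

def p_good(distance, temperature, snow, score_diff):
    """Estimated probability a kick is good under the illustrative model."""
    z = (COEF["intercept"]
         + COEF["distance"] * distance
         + COEF["snow"] * snow
         + COEF["distance:temperature"] * distance * temperature
         + COEF["snow:score_diff"] * snow * score_diff)
    return 1.0 / (1.0 + math.exp(-z))  # logistic link

# A chip shot on a mild day vs. a long kick in the snow while trailing.
print(round(p_good(35, 60, snow=0, score_diff=0), 2))
print(round(p_good(52, 20, snow=1, score_diff=-3), 2))
```

The point of the interaction terms is exactly what the regression found: the same yard of distance costs more when it is cold, and snow costs more when you are behind on the scoreboard.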
Surprises in the model, more surprises on what was left out
What surprised me most was that wind strength did not emerge as a significant factor, but there is a fairly straightforward explanation: we did not account for the direction of the wind relative to the kicking direction of each attempt. Doing so might make it possible to better quantify wind's effect on the result of a field goal, but accounting for direction relative to the goal posts was one of the many tasks we wanted to include and simply ran out of time to integrate. Another surprise was that rain was not a significant contributing effect. I suspect that rain is either not always recorded accurately by our source, http://nflweather.com, or that rain's effect is better captured by accounting for a wet ball, which can occur if it rained before the game but not during it. If the latter is the case, then even if our weather source was correct that it wasn't raining during the game, we would be missing the impact of a wet ball.
What you want to predict is just as important as how you predict it.
Using a tuning parameter, we were able to change our model to analyze different results. What the heck is a tuning parameter? Think of it as a knob we turn to get the model to tune its results differently, much like you would for a machine. For example, if you are more interested in where and in what conditions a kicker is likely to miss, we can tune the model to focus on that result. In general, we focused on predicting misses, for the reason outlined above.
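One common version of that knob is the classification threshold: instead of calling a kick a miss only when the model's probability of "good" drops below 0.5, you raise the cutoff so more borderline kicks get flagged. A minimal sketch, with made-up probabilities and outcomes:

```python
# Hypothetical (probability-of-good, actual outcome) pairs; 1 = made, 0 = missed.
preds = [(0.97, 1), (0.91, 1), (0.88, 0), (0.72, 1),
         (0.64, 0), (0.55, 1), (0.41, 0), (0.35, 0)]

def miss_recall(threshold):
    """Fraction of actual misses flagged when every kick with
    p(good) below `threshold` is called a miss."""
    misses = [p for p, outcome in preds if outcome == 0]
    return sum(1 for p in misses if p < threshold) / len(misses)

# Turning the knob up catches more of the real misses, at the cost
# of also calling some makeable kicks misses.
print(miss_recall(0.50))
print(miss_recall(0.75))
```

The trade-off is the whole game: a higher threshold finds more misses but produces more false alarms, so where you set the knob depends on which mistake costs you more.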
What does this all mean? Without knowing who the kicker is, where they are kicking, or which side of the field they are kicking from, we can estimate the outcome of a kick with good accuracy. This analysis, however, is not a statement of causal inference. There are too many interactions and confounding factors remaining to declare any "if, then" relationships. Rather, this was about predicting the outcome given the available information.
An interesting verification
Speaking of prediction estimates, the NFL produces its own estimate of the probability that a kick will be made, though it doesn't publish exactly what factors go into it. Again using the public database, we wanted to see just how closely we could model this prediction percentage. Using mostly the same factors listed above, we were able to capture the NFL's published percentages with an accuracy of 92%. While this is certainly less useful than predicting whether a kick will be good, it was an interesting exercise in validation.
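One simple way to score that kind of agreement between two sets of probabilities is to ask how often they land within a small tolerance of each other. The five probability pairs below are invented for illustration, not real kicks from our comparison:

```python
# Hypothetical probabilities: our estimates vs. the league's published
# numbers for the same five kicks (illustrative values only).
ours = [0.93, 0.71, 0.65, 0.88, 0.52]
nfl  = [0.95, 0.68, 0.65, 0.84, 0.58]

# Agreement metric: share of kicks where the two estimates land
# within five percentage points of each other.
within = sum(abs(a - b) <= 0.05 for a, b in zip(ours, nfl)) / len(ours)
print(f"{within:.0%} of estimates agree within 5 points")
```

Whatever tolerance you pick, the exercise is the same: the league's number becomes the benchmark, and the metric tells you how often your model lands close to it.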
For such a model there is no need to ask the question "Is the model true?" If "truth" is to be the "whole truth" the answer must be "No". The only question of interest is "Is the model illuminating and useful?" (George E. P. Box)
While no model, as far as I am aware, can predict a field goal with 100% accuracy, this was a fun exercise to toy with as the last academic quarter rolled on. If you know of or develop such a model, I would encourage you to go running to the nearest NFL team and demand a handsome reward. And while I would've liked to spend more time on it, this project ended up being a very small part of our overall grade. An annoyingly small part, in fact, but that's hardly a topic for here and now. Nevertheless, I wanted to share some of the lessons learned and information found as we swam through 825 lines of R code and thousands of kick attempts in the NFL database.
A real world example and comparison of results
Too long; didn't read? Kicking from distance is hard. Kicking in the snow is hard, especially when it's late in the game and you're trailing on the scoreboard (just ask Navy about this last Army-Navy game; Go Army). Field goals are also more likely to be successful today than ever before, confirming what Benjamin Morris wrote at FiveThirtyEight in 2016: kickers are improving over time.
Alright, that was a lot of math stuff so let’s do some examples and get to the real so-what of all this.
It's 46 degrees out, the kick is from 52 yards, there's no precipitation, seven seconds remain in the 4th quarter, and the kicking team is down by three. What does the model predict? The model, which returns only "Good" or "Missed," returns "Good" here.
Jon Gruden: “I personally would let Walsh kick it.”
Me too, coach. In fact, the NFL's own numbers agree, giving the kick roughly a 65% probability of success, so our model is in line with what the NFL says: the kick was most likely to be good. But, unfortunately for Seattle Seahawks fans, the result was not what either the model or the NFL predicted. Such is life in the world of randomness and prediction; sometimes things just don't go your way.
I hope you enjoyed reading about this little side project of ours. I know it wasn't perfect, but it was an enjoyable little excursion into the play-by-play database. I will probably take up this project again this upcoming summer after graduation to see what else I can find, though I will probably have to carry on without my partner; I'm quite sure she's tired of NFL kicking data by this point.