Interceptions are a problem for football analysis. They look like a skill, in that we see bad QBs throw a lot of them and good QBs avoid them, but then they behave like luck. What I mean is that, for basically any number of attempts in a QB's history, league average is a better predictor of future interception rate than the QB's individual rate.
If you like baseball stats, interception rate is the BABIP of football. It seems like something a player could control, and there is evidence particular players can, but you basically need to wait for them to retire to be sure. (Every registered member of Lookout Landing is already typing out a comment explaining how I am subtly but unforgivably wrong on that point. I concede in advance.)
In general I handle interceptions by assuming league-average rates across the board, but I did two things recently that made me want to take a look at the interception/sack relationship: I made a QB injury model which is heavily dependent on sacks, and I proposed a QB rating model that is very harsh to high-sack QBs.
QBX hates that Russell Wilson gets sacked a lot and I want to know how fair that is. Also, I want to know how often he should be aiming to be sacked. I can't answer that without laying some groundwork, and that's what this article does.
The questions I intend to answer in this analysis are:
- Does sack rate affect interception rate?
- Can sack rate be used to predict interception rate more reliably than assuming league average?
- Can we determine an ideal sack rate?
- Is it possible to write an article while TweetDeck is open?
I've said multiple times on this blog that I believe sack rate affects interception rate. Actually, I think it's a pretty trivial assertion. Here is a flow chart of my idealized passing down in terms of QB events:
In this idealized world it is trivial to show that if you take more sacks you have fewer interceptions per dropback. But you also have fewer completions and incompletions, so the whole thing is more academic than practically useful. In fact, the model needs probabilities on the arrows. It makes intuitive sense that pressure raises the probability of an interception by forcing early, uncalled, and awkward throws. If that's the case, then sacks will have an even greater effect on interception probability:
Reading hint for the next chart: the lower case p stands for "under pressure"
The above flow chart represents a purely hypothetical QB whom we'll call Danny.
It's trivial to calculate Danny's interception rate:
Danny's interception rate = (.4 × .85 × .05) + (.6 × .01) = .023
Throughout the rest of the article I may refer to various terms in that equation with greek letters. Here's a legend:
Foreshadowing: It is important to note that in this equation σ is not sack rate! Instead it is the rate at which pressures turn into sacks. I will not be using the two terms interchangeably and neither should you.
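To make the arithmetic concrete, here's the flow-chart calculation as a tiny Python function. The symbol readings for ψ, τ, and ι are my interpretation of the legend (σ, per the note above, is the pressure-to-sack rate), so treat the parameter names as assumptions:

```python
# Interception rate for the idealized passing-down model.
# Symbol readings (my assumptions, per the legend): psi = pressure rate,
# sigma = pressure-to-sack rate, tau = INT rate under pressure,
# iota = INT rate without pressure.
def interception_rate(psi, sigma, tau, iota):
    # Pressured, not sacked, picked off -- plus unpressured picks.
    return psi * (1 - sigma) * tau + (1 - psi) * iota

# Danny: 40% pressure rate, 15% of pressures become sacks,
# 5% INT rate under pressure, 1% otherwise.
print(round(interception_rate(psi=0.4, sigma=0.15, tau=0.05, iota=0.01), 3))  # 0.023
```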
Now with that out of the way I should just be able to wander over to the play-by-play data, confirm that sack rate has a negative relationship with interception rate, and we can all go home, right?
Wrong. You're so stupid. Unless you already figured it out from the foreshadowing, in which case good for you. Showoff.
There are actually a few problems. First, sacks are only related to some interceptions - those that occur under pressure - and interceptions are already rare, so the relationship is going to be very weak to begin with. Second, and this is the big one that I foreshadowed, sack rate and interception rate are both dependent on the pressure rate. As the rate of pressure goes up, so too do the sack and interception rates. With realistic value ranges for the variables, the effect of pressure rate drowns out the relationship of interest in real data. That means we end up getting a small positive correlation between sack rate and interception rate in a naive analysis.
This is a great example of why blind correlation/regression is a bad idea. Make a model first!
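To see the confound in action, here's a toy Monte Carlo sketch where every rate is invented and the only mechanism tying sacks to interceptions is the negative one from the flow chart - yet the naive season-level correlation comes out positive, because pressure rate drives both numbers up together:

```python
import random

random.seed(0)

def season(psi, sigma=0.15, tau=0.05, iota=0.01, dropbacks=500):
    """Simulate one QB season under the flow-chart model; all rates invented."""
    sacks = ints = 0
    for _ in range(dropbacks):
        if random.random() < psi:          # pressured dropback
            if random.random() < sigma:    # pressure becomes a sack
                sacks += 1
            elif random.random() < tau:    # throws anyway, picked off
                ints += 1
        elif random.random() < iota:       # clean pocket, still picked off
            ints += 1
    return sacks / dropbacks, ints / dropbacks

# Pressure rate varies across seasons; sigma/tau/iota stay fixed.
data = [season(psi=random.uniform(0.2, 0.5)) for _ in range(200)]
xs, ys = zip(*data)
mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
cov = sum((x - mx) * (y - my) for x, y in data)
r = cov / (sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys)) ** 0.5
print(f"naive corr(sack rate, INT rate): r = {r:+.2f}")  # positive, despite the model
```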
But I do need to determine if my model is accurate, or at least support the claim that it is. For that, unfortunately, I have to turn to proprietary stats behind the paywall of a site that I'm going to leave unnamed.
I'm not going to name the site because I think it might help start a rumor that I'm involved in some sort of crazy stat feud with a much more popular site/writer which will increase my notoriety/exposure.
Honestly though, I don't like paywalls at all. I don't like the idea that readers can't check my work for free; I don't like that they're antithetical to what I consider the purpose of stats - elevating discourse; and I don't like that I have to pay.
At any rate, this site (it's Pro Football Focus) has season pressure numbers (how accurate are they? I have no idea!) so I can get real values for ψ and σ and see if they support my posited relationship.
What I can't do is just run a correlation of σ and Ι for the full population of QB seasons. That ends up being weak and positive as well. Why? Because sucking in one facet of your game means you probably suck in others. A QB with a high pressure-to-sack rate is more likely to have a higher true interception-rate talent than one who avoids sacks.
So I'll do some fudging.
Basically, I took all the QBs with more than two seasons in the dataset (31 of them) and then, within each QB's seasons, rescaled σ and Ι so that the highest value was 1, the lowest was 0, and those in between fell proportionally between 0 and 1.
Using those numbers I can run a correlation (r = -.17, incidentally) free from the tyranny of individual QB talent or, even better, test the null hypothesis that there is no negative relationship between sacks and interceptions. Doing just that, my computer says:
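The rescaling step looks something like this sketch. The QB names and rates below are made up for illustration; the real run used the 31-QB dataset:

```python
from collections import defaultdict

# Fake seasons: (qb, pressure-to-sack rate sigma, interception rate).
seasons = [
    ("QB_A", 0.18, 0.020), ("QB_A", 0.22, 0.015), ("QB_A", 0.20, 0.018),
    ("QB_B", 0.10, 0.030), ("QB_B", 0.14, 0.024), ("QB_B", 0.12, 0.027),
]

def rescale(values):
    """Min-max scale one QB's seasons to [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

by_qb = defaultdict(list)
for qb, sigma, int_rate in seasons:
    by_qb[qb].append((sigma, int_rate))

# Rescale within each QB, then pool -- individual talent washes out.
pooled = []
for rows in by_qb.values():
    sigmas, ints = zip(*rows)
    pooled.extend(zip(rescale(sigmas), rescale(ints)))

print(pooled)
```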
And even better than that, I can show you a watermelon-colored chart allowing you to visualize the conclusion that high-sack seasons are more likely to result in low-interception seasons than otherwise (and vice versa!). Also, it will show you that this really deserves a bigger dataset.
Now I feel confident in saying that sacks result in fewer interceptions. Admittedly this doesn't prove the usefulness of the model, but it does at least lend it some credence.
On to the second question then!
The model says yes but the reality is no. There are just too many moving parts for such a small relationship to be useful for predictive analysis.
That was short! But the third question:
is actually the interesting one anyway. It breaks down into 1) Is there an ideal sack rate greater than 0? and 2) Can it be found in the real world?
The answer to the first part is an unequivocal yes. To show why that's the case, I'm going to bring back Danny. Using a simplified version of QBX (QBXP) that takes pressure into account but drops non-sack fumbles, I'll show you how his value varies as I change his σ and τ:
Note that somewhere between a 20% and 30% interception under pressure rate Danny starts to get better by allowing the opposing defense to slam his body into the ground. Now, if instead of visualizing this as adjusting Danny's skill level we think of it as representing individual plays we can say that Danny should take the sack when he thinks there is a greater than 25% chance of throwing an interception.
We can visualize that by taking the slopes of the linear regressions of the lines above and modeling them. The x-intercept is where Danny should be indifferent about whether to take the sack or throw, and it occurs at τ = .275.
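As a sketch of that last step: take the regression slope of QBXP value against σ at each τ, then interpolate to find where the slope crosses zero. The slope numbers below are invented to illustrate; only the interpolation logic is real:

```python
# (tau, slope of QBXP value vs. sigma) pairs -- slopes are made up
# for illustration; the real values come from the QBXP regressions.
points = [(0.10, -0.35), (0.20, -0.15), (0.30, 0.05), (0.40, 0.25)]

def breakeven_tau(points):
    """Linearly interpolate the tau where the slope crosses zero."""
    for (t0, s0), (t1, s1) in zip(points, points[1:]):
        if s0 <= 0 <= s1:
            return t0 + (0 - s0) * (t1 - t0) / (s1 - s0)
    return None

print(round(breakeven_tau(points), 3))  # 0.275 with these fake slopes
```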
I know that 27.5% seems unrealistically high. In fact it's so high that it says Danny is better off putting the ball up in the air when the defense is more likely to catch it than the offense - about 65% to 35%! That's not a real conclusion though. Because of significant limitations to the QBXP model all it demonstrates is that there is a point where sacks provide value. Which is actually kind of a trivial conclusion. Sorry.
What are those limitations? Well, for one, the yards-per-completion distribution is the same under pressure as it is for non-pressured plays. I suspect that QBs can't throw as far, or as well, under pressure, but I don't have play-by-play data with pressure notes to prove it. Second, the incompletion rate is held static as everything else changes. That's insane. And third, it drops the interception-risk-versus-attempt-yardage model.
Also there's the no scrambles thing. Feel free to discuss that.
As near as I can figure, each one of those limitations makes chucking it more attractive. That means the tradeoff point is actually going to be substantially lower than 27.5%. It is clear from this that QBX punishes sacks too much.
In the end, the answer to subquestion #1, "Is there an ideal sack rate greater than 0?", is an obvious and unequivocal yes. I suspect that the answer to #2, "Can we find it in the real world?", is yes too, but without play-by-play data on QBs under pressure to build an appropriate simulation, any answer would just be a guess.
Answering subquestion #2 without guessing is for the next article in this all-new, all-sexy series.
I did all this because I suspect that Russell Wilson takes too many sacks per pressure. I think it's his greatest weakness. I'm really going to try to build on this analysis to see if I'm right. But none of what I've done so far gives us a better idea than scouting and the analysis of raw stats. I do think that it gets us closer to starting the analysis though.
At least for now, the interesting question, "how many sacks should QB x be taking per pressure?", is something exclusively for coaches, scouts, and players to answer based on experience and their meat computer. And in the end there will always be the QB on the field taking into account the game state, the play call, field position, his footing, and everything else from weather to noise when he decides whether to throw the ball or get throttled to the ground by a very big man.
I know that I'd throw it (wobbling to the ground ten yards in front of me like half cooked french toast after a failed flip) so I'd like to take this moment to thank Russell Wilson for being fucking awesome at extending plays.