I'm on vacation.
It's fantastic.
At least it would be if sports radio were any good in the Lake Tahoe area. I have listened to a grand total of five minutes of radio in the past week and I have learned that 1) I probably shouldn't mix wood chips into my garden soil, 2) It's okay to leave loosely soiled bulb flowers out for a day, and 3) I should consider QBR when discussing QB quality.
Here's the thing though, you really shouldn't. The wood chips will encourage cellulose-eating microbes, which are very nitrogen hungry, and doing anything that leaves you relying on nitrogen-rich fertilizers, especially if you're in an area like the Puget Sound, is just plain irresponsible.
Also, QBR is just really bad. Like worse than using fertilizer in place of homemade loam compost bad.
The problems with QBR and passer rating are well documented. My antipathy is well documented as well. But I'll just go ahead and throw a few non-mathematical reasons out there:
- Obscurity: Although the passer rating formula isn't proprietary like QBR, both "stats" (like those scare quotes? They let you know what the author thinks of those so-called "stats") have an obscure enough basis that it is difficult to intuitively understand where they might be useful (nowhere) and where they're terrible (everywhere).
- Unit-less: This is related to obscurity, but advanced stats that lack units are obnoxious because individual stats are impossible to understand without being presented with league-wide context every time. Sure, I know that a passer rating over 114 is going to be good but I'd rather be informed of that by my understanding of the game of football than by continued exposure to basis-less numbers. I don't care if the units are arrived at artificially, that's still better than rescaling from 0 to 100 and saying "it's for ranking!"
- Gumbo Approach: Football is very complicated. There are two ways to react to that as a stat lover: 1) Get all divine machine on the thing and try to find each gear, how many teeth it has, and how fast it's spinning so that you can reconstruct it. 2) Find the simple things in the chaos by stripping away as much motion as you can while it still looks like football and see if you can at least describe the uncertainty. I call the first the gumbo approach and I don't like it. The second I call the sushi approach and I like it.
A stat can fall afoul of any of these and still be good. It can fall afoul of all three and I'll still like it if it does a good enough job. A lot of baseball stats fall in that category. Football is not baseball. For me, in football, a stat should be both quantitatively accurate and qualitatively usable. QBR and passer rating fail on both fronts.
So you'll understand then why, before I can have a good second and final week of my vacation, I have to invent a new stat for QB comparison.
The first step is to get rid of rotten ingredients. And by that I mean anything that is far enough out of the QB's control that it introduces more noise than signal. So interception rate is out, touchdowns are out, and, god help me, clutch is really fucking out.
Note that that will tell you what I think of TD/Int ratio. What the hell is it even supposed to represent? It's not like there is a red-zone lightning round where a QB plays with one receiver and against 11 DBs. Though, if the Pro Bowl committee is reading this...
The next step is to see what good ingredients are left. I have sack rate, completion rate, and yards per attempt. Good analysts have used these three as their go-to for a long time and there is nothing to be added to the conversation by amalgamating them into a franken-stat. NY/A already exists.
But the other day I logged into Twitter and I happened to see some stupid person with internet access yelling at Arif Hassan for using Y/A (or maybe NY/A) to point out that Ponder probably isn't good. The tweeter seemed like an idiot but the point he accidentally raised while being a homer was valid - Y/A has flaws. The foremost is that it describes a mean when what we really want is a whole distribution. The problem with distributions is that they don't mean much to look at. Here are the distributions of completion yardages (total, not air) for each QB with over 400 plays (not attempts) last year:
Looking at that doesn't tell you very much. Y/A is still more valuable.
But here's what I can do to fix that: play eXtreme Aaron Ball.
In eXtreme Aaron Ball you put together the greatest offense ever seen around your QB of interest and then you call in a dream team of defensive players past and present - all miraculously in their prime. The two teams assemble at some hallowed stadium, let's just say Lambeau. The teams warm up and the crowd's anticipation pulses to a crescendo. Then, just as everyone begins to wonder if the game will really happen - if it isn't just a dream - I walk out onto the field with a fold-up table, set it up, put down my clipboard, and hand both captains a 10^16-sided die.
Obviously at this point most of those in attendance faint, some may even expire, but that's of little matter to the hardier fans who are chanting my name at once reverentially and feverishly.
Then we roll out the first play and most of Green Bay is destroyed in the ensuing riots.
Basically the QB rolls to see if he gets a completion, sack, or incomplete with the likelihood of each play outcome determined by his stats. If he rolls incomplete the play is done. If he rolls sack he does a fumble check based on the league average fumble rate. If he fumbles the defense checks for recovery - again league average.
In the event of a completion he rolls to see the yardage, weighted by his completion yardage distribution, at which point the defense does an interception check weighted against yardage (interceptions occur more frequently farther from the LOS). If the throw survives the interception check the QB rolls to see if the WR fumbles and the defense checks for recovery. If the QB makes it 100 yards before punting he gets 7 points, for 60-99 yards he gets 3, and for anything less he gets yardage times .05. The down rules are like football except you can't go for it on fourth.
After several experiments, and as many destroyed stadium neighborhoods, I decided I'd just simulate eXtreme Aaron Ball on my computer.
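For the code-minded, here is a minimal Python sketch of one drive under the rules above. It is not the code behind the numbers in this article, and it has to guess at things the rules don't spell out: the `QBProfile` fields, every league-rate constant, the flat sack loss, the shape of `interception_prob`, and the choice to score a turnover as 0 points are all placeholder assumptions.

```python
import random
from dataclasses import dataclass


@dataclass
class QBProfile:
    """Per-play inputs for one QB. Field names are hypothetical."""
    p_completion: float      # completions / plays
    p_sack: float            # sacks / plays
    completion_yards: list   # observed yardage of each completion


# Placeholder league-wide rates, NOT measured values.
SACK_FUMBLE_RATE = 0.10
RECEIVER_FUMBLE_RATE = 0.01
DEFENSE_RECOVERY_RATE = 0.50
SACK_YARDS = -6  # assumed flat loss per sack


def interception_prob(yards):
    """Placeholder model: interceptions get likelier farther from the LOS."""
    return min(0.02 + 0.002 * max(yards, 0), 0.25)


def drive_points(yards):
    """100+ yards scores 7, 60-99 scores 3, anything less is yards * .05."""
    if yards >= 100:
        return 7.0
    if yards >= 60:
        return 3.0
    return 0.05 * yards


def lost_fumble(rng, fumble_rate):
    """Fumble check followed by the defense's recovery check."""
    return rng.random() < fumble_rate and rng.random() < DEFENSE_RECOVERY_RATE


def simulate_drive(qb, rng):
    """Play one drive and return its points. No going for it on fourth."""
    yards, down, to_go = 0, 1, 10
    while down <= 3:
        roll = rng.random()
        if roll < qb.p_sack:
            gain = SACK_YARDS
            if lost_fumble(rng, SACK_FUMBLE_RATE):
                return 0.0  # assumption: a turnover scores nothing
        elif roll < qb.p_sack + qb.p_completion:
            # Draw yardage from the QB's own completion distribution.
            gain = rng.choice(qb.completion_yards)
            if rng.random() < interception_prob(gain):
                return 0.0
            if lost_fumble(rng, RECEIVER_FUMBLE_RATE):
                return 0.0
        else:
            gain = 0  # incomplete
        yards += gain
        if yards >= 100:
            return drive_points(yards)
        to_go -= gain
        if to_go <= 0:
            down, to_go = 1, 10  # first down
        else:
            down += 1
    return drive_points(yards)  # fourth down: punt and score the yardage
```

Drawing completion yardage with `rng.choice` from the observed list is what keeps the whole distribution in play instead of collapsing it to a mean.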
As I simulate eXtreme Aaron Ball a QB will eventually approach a stable mean points per drive - this is his QBX.
Unfortunately, because I don't use simple distributions there is no closed form solution to arrive at QBX. Instead I actually have to simulate about 20,000 drives (actually I do 40 seasons of 500 drives) to arrive at a reasonable confidence range for each QB's QBX. In this article I show the ranges - in the future I will likely just present the mean.
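The 40-seasons-of-500-drives loop might look something like the Python sketch below. Treat it as an illustration only: `toy_drive` is a made-up stand-in for a real drive simulation, and the range reported here (mean plus or minus 1.96 standard errors of the season means) is an assumption, since the article doesn't say exactly how its ranges are computed.

```python
import math
import random
import statistics


def toy_drive(rng):
    """Hypothetical stand-in for one simulated drive; returns drive points."""
    return rng.choice([0.0, 0.0, 0.5, 1.0, 3.0, 7.0])


def estimate_qbx(drive_fn, seasons=40, drives_per_season=500, seed=0):
    """Return (mean, low, high) points per drive over all simulated seasons."""
    rng = random.Random(seed)
    season_means = []
    for _ in range(seasons):
        points = [drive_fn(rng) for _ in range(drives_per_season)]
        season_means.append(statistics.fmean(points))
    mean = statistics.fmean(season_means)
    # Assumed range: normal-approximation interval on the season means.
    half_width = 1.96 * statistics.stdev(season_means) / math.sqrt(seasons)
    return mean, mean - half_width, mean + half_width


if __name__ == "__main__":
    mean, low, high = estimate_qbx(toy_drive)
    print(f"QBX {mean:.2f} (range {low:.2f} to {high:.2f})")
```

Splitting the 20,000 drives into seasons is what makes a range possible at all: each season produces its own mean, and the spread of those means shows how much the estimate still wobbles.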
Why should you like QBX?
- It stabilizes quickly. Here's the split-half correlation showing QBX has a .5 correlation at just 225-250 plays (225 random plays from a QB have an r-value of .5 to another 225 random plays from the same QB):
- It gives you the expected QB distribution - lots of average, a few great, and a couple just really bad (this chart also gives you an idea of how accurate the stat is at 20,000 drives)
- It gives you something that other stats don't - the interplay of attempts and risk.
- You want to be cool and liking things with X in the name is cool.
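The split-half check in the first bullet can be sketched in Python as follows. This is a generic illustration, not the code used for the chart: `plays_by_qb` and `stat_fn` are hypothetical names, and any per-sample stat (a QBX estimate, or something as simple as a mean) can be plugged in as `stat_fn`.

```python
import math
import random


def pearson_r(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def split_half_r(plays_by_qb, stat_fn, n_plays, rng):
    """Correlate a stat across QBs on two disjoint random samples of plays.

    plays_by_qb maps a QB name to a list of play outcomes; every QB needs
    at least 2 * n_plays plays.
    """
    first_half, second_half = [], []
    for plays in plays_by_qb.values():
        # One draw without replacement, cut in two, so the halves never overlap.
        sample = rng.sample(plays, 2 * n_plays)
        first_half.append(stat_fn(sample[:n_plays]))
        second_half.append(stat_fn(sample[n_plays:]))
    return pearson_r(first_half, second_half)
```

Repeating this for a range of `n_plays` values and noting where r crosses .5 gives the stabilization point.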
Why shouldn't you like QBX?
- It's mostly unit-less. QBX points don't have a real bearing on real football points.
- It seems possible that different QBs have different fumble skills so the league average assumption may make some guys look worse. I doubt that this is the case though. Either way the sample size of sack fumbles is so low per QB that it would feel dirty to make individual adjustments.
- You're a Green Bay fan who refuses to believe that Aaron Rodgers takes too many sacks.
- Although there is a lot of evidence that much of sack rate is QB controlled, it is not completely QB controlled. Playing in San Diego or Arizona will make you look worse.
- It does not reward rushing QBs. In fact it unfairly penalizes them for sacks without crediting them for rush yards. That means it isn't going to be kind to Russell Wilson but neither are other stats and that's a stupid reason to not like a stat anyways.
- You don't want to be cool.
Without further ado here are the QBX tables for 2012 (over 400 plays [not attempts]) and 2009-2012 (same cutoff):
Give Ponder a chance!
Note that the values are a bit different for 2012-only QBs in the two charts - that's because the interception model is different in 2012 than in 2009-2012. I haven't completely decided how to deal with the interception model over time since it is as yet unclear to me how much is noise and how much is real change in the league. For now I'm using the data from just the time period of the QBX and the difference appears to be minimal at any rate. We'll call it the Sherman bump.
Also h.Hill is Shaun Hill. The play-by-play lists him as Sh.Hill and my code pulls out just the first letter. I refuse to do anything special for a QB who played in Europe.
League average QBX is about 1.27 points. That makes Wilson marginally better than average and that may upset you. But like I said, it is purely an air stat and, at any rate, Russell Wilson improved his QBX dramatically from the beginning of the season. Here is his moving QBX for each set of 200 plays over the season:
He went from John Skelton to Aaron Rodgers in a single season. I think QBX may end up liking him quite a bit.
Well I feel better now! I'll try to be available in the comments but internet access is rare.