A new football season is upon us. Happy parents are sending unhappy children to school. The autumn sun glows with promise. Raging optimism clashes with unexpected ankle injuries. And pundits everywhere are declaiming tiers of "elite" quarterbacks, united by their passion for the game and an awe-inspiring talent matched by few amateurs, namely, the ability to type.
As it happens, I've figured out a way to determine exactly who the best NFL quarterbacks are. The method is objective and accurate. Indeed, my biggest concern is the risk of putting so many hard-typing sportswriters out of work (by which I mean, I'm afraid it will do nothing of the sort). But first, I must warn you about how difficult it will be to understand the whole thing:
Not difficult at all.
Seriously, a twelve-year-old could follow it. And I don't just mean the kid with the thick glasses and a slide rule sitting at the front of the class, no, I mean the kid sitting in the back who's going to get a 'D' on the test because he's drawing breasts and penises instead of taking notes during the lecture (yeah, I see you). Honestly, most of the work was done by other people a long time ago, and all I've done is add a few things together.
Did I say "add"? Yes, there's a little bit of math. Because...
Statistics aren't everything. They are the only thing.
I understand that sports are kind of a geeky pastime. Some of you are probably more into cool stuff that involves subjective judgment like, I don't know, rock music, dog shows, or hairstyling. So I'll explain how football works:
Every game is overseen by a group of people called "referees" who are responsible for determining the winner. And they are nerdy, math-loving tyrants. They keep track of each team's accomplishments on the field, things like a "touchdown" or "field goal", and then assign a fixed number of points for each accomplishment. At the end of the game, the team with the most points is declared the winner. At the end of the season, the team that won all of its playoff games is declared the champion.
And that's it. There are no points awarded for style, good looks, or creativity. There is no appeals process. Kinda stifling, right? If you think that's a stupid way to play a game, then you should just close this tab now and go watch some funny cat videos. You won't like football.
Still with me? Well, it kinda gets worse. To score a touchdown, the team with "possession" of the football has to advance the line of scrimmage all the way to the goal line, with those referee/tyrants breathing down their necks every step of the way and measuring how much distance (in yards) each play gains or loses. And just to irritate us even more, the tyrants have a schedule of "downs", and should a team fail to advance the ball at least 10 yards over the course of four "downs", the referees will declare a change of possession. So teams have to get as many yards as they can on every play.
You may have heard a tidbit here and there about individual players being "good" because they have strength, speed, coordination, intelligence, etc. But the sad truth is that these players and teams are all toiling under the iron fist of math. Having a guard who can bench press 400 pounds or a receiver who can run 40 yards in 4.3 seconds gives you, by itself, zero points. Such qualities are only valued as a means to an end, namely, helping the team gain yards towards the goal line. I don't blame you if you hate this, but it's been that way for more than 80 years and is not likely to change anytime soon.
One last quirk: Ever watch a baseball game? Each team gets nine innings to score points, and each inning can go on indefinitely while a team keeps scoring. Football ain't like that. The number of "innings" (possessions) is simply however many you get through before the game clock runs out. It could be seven one game and fourteen the next. And when you score in football, the only reward is the points-- the tyrants seem to think it's more fair if the losers who just gave up a score get rewarded with possession afterwards. The upshot of all this is that scoring a lot of points and gaining a lot of yards over the course of a game doesn't always translate to a good performance. Obvious, right? An eight-year-old could understand that. In order to win, you have to score as much as possible with each drive.
Individual (Quarterback) Statistics
Figuring out which teams have a good offense is going to be pretty easy. In fact, given the supreme importance of quarterback play, you could probably just look at team points per drive, put all the quarterbacks in that order, and call it a day.
But I can offer more-- thanks to a lot of hard work and original research done by people other than me. Several stats have been devised to measure quarterback performance.
The NFL's official passer rating was devised in 1973 as an omnibus measure of completion percentage, yardage, touchdowns, and interceptions. It was a little haphazard and inaccurate at the time, but thanks to the changing nature of the game it's gotten ~~better~~ much worse.
A quarterback who completes 4 of 6 passes for a spectacular 200 yards will have a rating of 109.7, which isn't that spectacular on a scale that goes to 158.3. A quarterback who goes 6/6 with 300 yards will still only make it to 118.75 because he hasn't met his touchdown quota. In some cases, quarterbacks can improve their rating by throwing complete passes for negative yards, and in all cases quarterbacks improve their rating by getting sacked instead of throwing the ball away.
What I'm saying is, it's got some quirks.
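If you want to poke at those quirks yourself, here is a minimal sketch of the standard passer rating formula. The function and variable names are my own; the four capped components are the publicly documented ones. Note that sacks never enter the formula at all, which is why taking a sack costs nothing while a throwaway counts as an incompletion.

```python
def passer_rating(completions, attempts, yards, touchdowns, interceptions):
    """NFL passer rating on its 0 to 158.3 scale."""

    def clamp(value):
        # Every component is capped between 0 and 2.375.
        return max(0.0, min(value, 2.375))

    completion_part = clamp((completions / attempts - 0.3) * 5)
    yardage_part = clamp((yards / attempts - 3) * 0.25)
    touchdown_part = clamp(touchdowns / attempts * 20)
    interception_part = clamp(2.375 - interceptions / attempts * 25)

    total = completion_part + yardage_part + touchdown_part + interception_part
    return total / 6 * 100


# The two stat lines from the paragraph above.
print(round(passer_rating(4, 6, 200, 0, 0), 1))  # 109.7
print(passer_rating(6, 6, 300, 0, 0))            # 118.75

# A made-up stat line: a throwaway adds an incomplete attempt, a sack adds nothing.
after_sack = passer_rating(20, 30, 250, 2, 1)
after_throwaway = passer_rating(20, 31, 250, 2, 1)
```

With the made-up stat line, `after_throwaway` comes out lower than `after_sack`, which is the sack quirk in action.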
ESPN's Total Quarterback Rating (QBR)
This is a proprietary formula. Is keeping secrets a good idea?
It can be. If you're manufacturing, say, nuclear bombs, you don't want everyone knowing how it's done. But if you are literally giving away nuclear bombs to every schmuck who wants one (ESPN freely distributes its QBR ratings), it's kinda pointless. On the other hand, if you are making sausages with a lot of rat shit in them, the fact that you are freely distributing said sausages absolutely gives you a reason to keep the formula secret.
So we can only speculate as to the exact nature of the rodent droppings contained in QBR. For example, QBR purports to separate quarterback and receiver contributions by separating air yards and yards-after-the-catch, and also by determining which passes are on target (near the receiver's hands) but nonetheless dropped. We can guess that this distinction would overrate quarterbacks who heave it up to the best jump-ball receiving targets (e.g., Calvin Johnson) and underrate quarterbacks with precise ball placement, timing, and smaller receivers (e.g., Brady/Edelman/Amendola).
Speculation aside, QBR has an obvious, fundamental flaw: Its entire purpose is to isolate the quarterback's contribution, yet it fails to isolate it from anything. There is no baseline measurement for the total value of the play. Metaphorically, this is like trying to isolate the fuel efficiency contribution of an engine from an automobile's aerodynamic drag by looking closely at the spark plugs, yet not bothering to see if those spark plugs are actually attached to a moving vehicle.
Expected Points Models
Okay, enough ranting and raving. Call your eight-year-old back into the room, because it's going to get really simple.
We already know that the math tyrants reward you for gaining yards, so yards/play is pretty much an ideal way to measure how good plays are. All we need to do is figure out how to include touchdowns and turnovers.
So, let's say your team has a first-and-goal at the opponent's 1-yard-line. A touchdown isn't guaranteed, but you'll get one most of the time (and can settle for a field goal if not). So on average you expect to get about 6 points in that situation, and an "expected points" model measures first-and-goal from the one as being worth six "expected points". I mean, geez, a chimpanzee could understand that, right?
Thanks to lots of historical research, then, we know that a non-scoring gain of 20 yards means you can expect about 1.8 more points than you could before the play. And the average touchdown, separated from the yardage gained, likewise increases your expected points by about 1.8.
Voila! A touchdown is worth the same as 20 yards. The same model shows that an interception is worth -45 yards. You and the chimp have just navigated the hardest math you'll have to do all day.
The expected points model is used to calculate Adjusted Net Yards per Attempt (ANY/A), which is used by Pro Football Reference and many others as a one-off measure of passing efficiency. EP modeling is also used by Football Outsiders in their DVOA calculations, which are chock full of complicated details but based on the same simple concept.
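For the chimp's reference, here is a rough sketch of ANY/A using the yardage equivalents above (+20 per touchdown, -45 per interception) and the usual convention that sacks count against both yardage and attempts. The function name and the sample stat line are invented for illustration.

```python
TD_BONUS_YARDS = 20      # a touchdown is worth about 20 extra yards
INT_PENALTY_YARDS = 45   # an interception costs about 45 yards


def any_per_attempt(pass_yards, pass_tds, interceptions, sack_yards, attempts, sacks):
    """Adjusted Net Yards per Attempt for a passing stat line."""
    adjusted_yards = (
        pass_yards
        + TD_BONUS_YARDS * pass_tds
        - INT_PENALTY_YARDS * interceptions
        - sack_yards
    )
    # Sacks are dropbacks too, so they go in the denominator.
    return adjusted_yards / (attempts + sacks)


# Hypothetical season: 4000 yards, 30 TD, 10 INT, 30 sacks for 200 yards, 550 attempts.
example_any_a = any_per_attempt(4000, 30, 10, 200, 550, 30)
print(round(example_any_a, 2))
```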
How Good Are These Models, And What Are They Missing?
We already know that the main goal of an offense is to get as many points as possible on each drive. Quarterback play is an important factor in that, so let's see how well some of these stats correlate with the end goal:
Team Quarterback Performance and Overall Offense Correlations, 2014-2015:
|STAT||Pts/Drive||Adjusted Pts/Drive|
|Total Passing Yards||0.566||0.58|
It's clearly better to have more passing yards than less, but on the whole it's a pretty sloppy guess for identifying the best offenses.
ESPN's QBR isn't totally worthless, but considering that it doesn't even correlate with end results as well as the decidedly sketchy passer rating, it is relatively worthless. And considering that QBR is supposed to count sacks and quarterback runs (which passer rating ignores), it's relatively embarrassing.
Adjusted Net Yards per Attempt aligns very well with end results, and is bested only by Offensive DVOA (which has the advantage of measuring more plays). The strength of these last two correlations also gives us a very convincing confirmation of the underlying "expected points" model used to devise them.
So, are we done? Can we just look at ANY/A?
Hmm, not quite.
Y'see, every now and then a quarterback will pretend like he's dropping back to pass the ball, or maybe he even intends to pass the ball, but then he takes off and runs down the field instead. A lot of people think this is cheating. But I'll remind you again that the outcome of a football game is subject to the judgment of those obstinate referees, and they've been letting them get away with it for 80 years. You may as well complain that teams are cheating by making plans in the huddle before the play. That's how the game is played, and a quarterback's rushing yards are going to have the exact same effect on the team's ability to score as his passing yards.
This probably wouldn't be a big deal if only a couple of quarterbacks were doing it, but it's become an epidemic. Cam Newton, Russell Wilson, Aaron Rodgers, Alex Smith, Andrew Luck, Jay Cutler, Matthew Stafford, and Matt Ryan (to name a few) run a lot. Guys like Tom Brady even have another sneaky trick where they don't even drop back, but simply take the snap and plunge forward. So we better figure this out.
Football Outsiders measures individual quarterback DVOA for both passing and running. However, despite the name similarity, these scales are completely independent from each other and from team DVOA, so it is impossible to combine them in a meaningful way.
But why make it hard? We can simply look up a quarterback's rushing yards and add them in with his passing yards, along with the requisite +20 yard bonus for touchdowns and -45 yard penalty for a turnover (lost fumble). That's so easy an ant colony could do it.
The end result we'll call TANY/A, or "Total Adjusted Net Yards per Attempt". This term, incidentally, was coined by a Field Gulls contributor whose identity I have forgotten, but his/her name will appear in these brackets [pqlqi] as soon as a reliable source reminds me. The formula, which your ants have probably already written out with their little bodies, looks like this:
TANY/A = (Passing Yards + Rushing Yards - Sack Yards + 20 × Touchdowns - 45 × Turnovers) / (Pass Attempts + Rush Attempts + Sacks)
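In case the ants are busy, the same formula can be sketched in a few lines of Python. The function name, argument names, and the sample stat line are all made up for illustration; the arithmetic simply follows the formula above.

```python
def tany_per_attempt(pass_yards, rush_yards, sack_yards, touchdowns, turnovers,
                     pass_attempts, rush_attempts, sacks):
    """Total Adjusted Net Yards per Attempt, passing and rushing combined."""
    total_adjusted_net_yards = (
        pass_yards
        + rush_yards
        - sack_yards
        + 20 * touchdowns   # passing and rushing touchdowns
        - 45 * turnovers    # interceptions and lost fumbles
    )
    quarterback_plays = pass_attempts + rush_attempts + sacks
    return total_adjusted_net_yards / quarterback_plays


# Hypothetical season: 4000 passing yards, 500 rushing yards, 200 sack yards,
# 35 total touchdowns, 12 turnovers, 550 passes, 100 rushes, 30 sacks.
example_tany_a = tany_per_attempt(4000, 500, 200, 35, 12, 550, 100, 30)
print(round(example_tany_a, 2))
```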
A Few Refinements
I said it was simple, not short. Sorry. If you want to take a break and have a cookie (drop some crumbs for the ants, please), go ahead. The article will still be here when you get back.
A Few Refinements on a Full Stomach
Workload and Run/Pass Balance
A common complaint with any stat that ends in "per play" is that it somehow fails to account for a quarterback's "total production" and/or his ability to "shoulder the burden" of carrying the team.
At first blush, this sounds stupid. The more effective a quarterback is, the fewer plays it will take to get his team down the field for a touchdown. And there is no advantage to having more offensive drives in a game-- quite the opposite, in fact, because it exhausts your defense on the other side of the ball. 
But by dumb luck, there is an element of truth to this. If an offense hands the ball off to running backs a lot, it will force the opposition defense to adjust and make it easier to pass. If the offense is constantly passing the ball, the predictability will make its own job harder.
To adjust for this, I once-upon-a-time invented a modified TANY/A stat which I called "Quarterback Quality at Quantity", or QQQ.  This may be a little beyond the ants, but I bet a Portia spider could follow it just fine. 
First, we measure a quarterback's TANY (total adjusted net yards) per each team play. This is then normalized to the same average as TANY/A, using a league-average quarterback play percentage of 61% (that will make sense when you read the chart below). Finally, QQQ is given as the geometric mean of TANY/A and TANY/team play. Tricksy? This will help:
|Player||Attempts||Completions||Yards||Touchdowns||Handoffs||Team Plays||TANY/A||QB Play%||TANY/Team Play||Normalized TANY/Team Play|
First, note that The Emissary is responsible for 61% of his team's offensive plays, so his "normalized TANY/team play" is exactly equal to his TANY/A. Because he is responsible for a typical percentage of plays, his QQQ is also identical to his TANY/A.
Second, we see that James Tiberius has exactly the same TANY/team play (and exactly the same total production) as The Emissary, but his team had to sacrifice 10 handoffs to get that, so he cannot possibly be considered as effective as The Emissary. In other words, TANY/team play would not be a very good final stat. Nonetheless, his QQQ is better than his own raw efficiency (TANY/A) to account for the extra difficulty he faced against an opponent expecting more passes.
Third, Jean-Luc has almost exactly the same efficiency stats (TANY/A) as The Emissary. But because his team handed off the ball so often, we can guess that Jean-Luc had an easier time of it thanks to defensive adjustments, and so should not be rated as highly as The Emissary.
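For the Portia spiders in the audience, here is a minimal sketch of the original QQQ calculation as described above. The function name and the three sample quarterbacks are invented for illustration (they are not the players in the chart), and the sketch assumes a positive TANY, since a geometric mean gets weird with negative numbers.

```python
import math

LEAGUE_AVG_QB_PLAY_SHARE = 0.61  # league-average share of team plays run by the QB


def qqq_original(tany, qb_plays, team_plays):
    """Geometric mean of TANY/A and normalized TANY per team play."""
    tany_per_attempt = tany / qb_plays
    # Dividing by 0.61 puts TANY/team play on the same scale as TANY/A.
    normalized_per_team_play = (tany / team_plays) / LEAGUE_AVG_QB_PLAY_SHARE
    return math.sqrt(tany_per_attempt * normalized_per_team_play)


# Three hypothetical quarterbacks, each with a TANY/A of exactly 7.0 on 100 team plays.
typical = qqq_original(427, 61, 100)      # 61% of team plays: QQQ equals TANY/A
pass_heavy = qqq_original(497, 71, 100)   # 71% of team plays: QQQ above TANY/A
run_heavy = qqq_original(357, 51, 100)    # 51% of team plays: QQQ below TANY/A
print(round(typical, 2), round(pass_heavy, 2), round(run_heavy, 2))
```

Algebraically this collapses to TANY/A times the square root of (QB play share / 0.61), so the workload adjustment is a bonus for pass-heavy quarterbacks and a discount for run-heavy ones.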
More recently, however, I decided I didn't trust my past self's arbitrary choice of the geometric mean and decided to run a little test ("Damn me!" "That's what you get for being sloppy!"). This required a lot of manual work, so I used just two quarterbacks, the run-first poster boy Russell Wilson and the pass-first poster boy Andrew Luck (apologies if this makes you keen to see an actual poster, that was just a figure of speech). I calculated game-by-game TANY/A and QQQ scores for each quarterback to see exactly how much their efficiency changed based on passing frequency.
I was completely not surprised to see that the basic premise held up. Both quarterbacks enjoyed higher passing (and QB running) efficiency numbers when they handed the ball off more often.
I was a little less not surprised that Wilson's actual drop off was only about one third of that given by the old QQQ model. Well, that makes sense. Even the Seahawks consistently call more passing plays than running plays (if you think the stats say otherwise, you aren't counting scrambles and sacks as passing plays). With a legitimate quarterback under center and receivers split wide, there is a limit to how much the opposition will commit to stopping the run.
On the other side of Nebraska, Andrew Luck actually had a steeper drop off than shown by the QQQ model. However, we can logically infer (and anecdotally confirm) that poor quarterback play early in the game (e.g., multiple interceptions in the first half) forces more passing attempts later in an effort to even the score. When that happens, it is poor quarterback play that causes an increase in passing frequency, not the other way around, so we don't need to over-adjust.
In the end, I decided that for quarterbacks executing less than 61% of team plays, the more accurate QQQ score should weight TANY/A twice and adjusted TANY/team play just once (still using a geometric mean), whereas for quarterbacks executing more than 61% of team plays the adjustment would be steeper, continuing to use equal weight.
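Here is one way to sketch that revised weighting. The function name and signature are my own, and the sketch takes "weight TANY/A twice" to mean a three-term geometric mean (a cube root). That reading is an assumption, though it does reproduce the "Wilson excl R1-8" row in the final chart (TANY/A 7.29, Play% 58.8%, Base 7.20).

```python
LEAGUE_AVG_QB_PLAY_SHARE = 0.61


def qqq_revised(tany_per_attempt, qb_play_share, league_share=LEAGUE_AVG_QB_PLAY_SHARE):
    """Revised QQQ: softer discount below 61% of team plays, equal weights above."""
    # TANY per team play, rescaled so a 61% quarterback matches his TANY/A.
    normalized = tany_per_attempt * qb_play_share / league_share
    if qb_play_share < league_share:
        # Run-heavy offense: TANY/A counted twice, normalized TANY/team play once.
        return (tany_per_attempt ** 2 * normalized) ** (1 / 3)
    # Pass-heavy offense: plain geometric mean, as in the original QQQ.
    return (tany_per_attempt * normalized) ** 0.5


# Sanity check against the Wilson row: TANY/A 7.29 at a 58.8% play share.
wilson_base = qqq_revised(7.29, 0.588)
print(round(wilson_base, 2))  # 7.2
```

At exactly 61% both branches return the unadjusted TANY/A, so there is no jump where the two rules meet.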
Quality of Opposition Defense
All other things being equal, a quarterback is going to have worse statistics when playing against a better opponent. Obvious, right? A strain of Bifidobacterium animalis could have told you that.
You might think an adjustment here is not necessary when looking at multiple quarterback years, because things would average out. Hey, I might even think that. And we'd be wrong together. Because teams play six games every year against the same three divisional opponents, along with two games based on last year's finish in the standings, there are significant differences even over a long time span.
There are lots of ways to measure the quality of an opposition defense. Some are inaccurate, which is a minor problem, and others require a lot of work, which is a deal-breaker. The best way to get an accurate number for each team over the span of multiple years, then, is to let someone else do the work. Thank you, Football Outsiders: Their offensive efficiency stats include the average defensive DVOA of all opponents for the season, and tabulating multiple seasons is a breeze.
Turning DVOA percentages into a yards-per-play adjustment requires just a little more effort. I used 2015 to establish a baseline, and calculated every team's defensive yards per play based on the same EP Model/Yardage Equivalent formula that we've been using for ANY/A and TANY/A: Each TD surrendered counted as +20 yards given up, each turnover as -45 yards given up, and from that a season-long adjusted yards/play could be calculated.
The team-by-team defensive adjusted yards per play (measured as plus/minus league average) was then simply plotted against defensive DVOA, and the plot looks very pretty:
The correlation coefficient is an unambiguous 0.951, which (again) totally confirms the validity of DVOA, ANY/A, and EP modeling. And the regression line equation tells us that 1% of DVOA is consistently equivalent to .05868 yards/play, and that's a number we can slap right onto TANY/A or QQQ. It's so easy an ESPN columnist could understand it (relax guys, it's just a little hyperbole).
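Slapping that number on looks something like the sketch below. The function names are invented, and the sign convention is an assumption based on how defensive DVOA is published: negative means a better-than-average defense, so a quarterback facing a negative average opponent DVOA gets a bonus.

```python
YARDS_PER_DVOA_POINT = 0.05868  # slope of the regression line described above


def opponent_adjustment(avg_opp_def_dvoa_pct):
    """Yards-per-play bonus (or penalty) for the average defense faced."""
    # Negative defensive DVOA means tougher opponents, so flip the sign.
    return -avg_opp_def_dvoa_pct * YARDS_PER_DVOA_POINT


def opponent_adjusted_tany_a(tany_per_attempt, avg_opp_def_dvoa_pct):
    """TANY/A after crediting or debiting strength of opposition."""
    return tany_per_attempt + opponent_adjustment(avg_opp_def_dvoa_pct)


# Average opponent defensive DVOA of -2.86% and a raw TANY/A of 7.29.
bonus = opponent_adjustment(-2.86)
adjusted = opponent_adjusted_tany_a(7.29, -2.86)
print(round(bonus, 3), round(adjusted, 2))  # 0.168 7.46
```

Those inputs are taken from the "Wilson excl R1-8" row in the final chart, and the outputs match its Opp Adjustment and Opp Adjusted columns.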
Speaking of Andrew Luck and Russell Wilson (and some guy in Miami, I think), I've decided to use 2012-2015 as the time frame for comparing quarterbacks. First, let's look at offensive DVOA. Although this incorporates running back carries, it is much more detailed and refined than my QQQ model, and quarterback play (rushing and passing) is the biggest determining factor in offensive production. 
Average Offensive DVOA, 2012-2015
|Oak '12 + Ari '13-'15||-1.4%|
We already know that Peyton Manning is no longer one of the league's elite quarterbacks, so I decided to throw the old guy a bone and show his team's performance excluding the 2015 season. Note that by comparison this underrates Aaron Rodgers (and possibly others) because Rodgers' subpar 2015 season is included, as are the 7 games he missed in 2013.
Dallas and Indianapolis likewise have a number of games included without their star quarterbacks.
Oakland 2012 + Arizona 2013-2015 refers, of course, to Carson Palmer. Palmer put up real MVP numbers in 2015, but his stats over the previous three years were surprisingly mediocre, even when he played a full season.
And now the chart you've been waiting for...
Elite Quarterbacks, QQQ with Various Adjustments 2012-2015 ([Caveats and Details])
|Player||Pass||Rush||ANY/A||TANY/A||Opp Def DVOA||Opp Adjustment||Opp Adjusted TANY/A||Tenure||Play%||Base QQQ||Opp Adj QQQ|
|Wilson excl R1-8||1526||321||7.18||7.29||-2.86%||0.168||7.46||98.0%||58.8%||7.20||7.370|
Those numbers at the top look pretty close, but that's a matter of perspective. One way to think of it is that two-tenths of a yard over the course of 2000+ attempts translates to a difference of 400 yards. Another way to look at it is that all quarterbacks have their ups and downs; so a difference of two-tenths of a yard could mean the difference between six sub-par games with a mere 175 yards passing versus six sub-par games with 242 yards passing. There are several wins to be found in those small margins.
Aaron Rodgers surprises no one at the top of the list, and if you look at his career numbers he could easily be a notch higher. His 2015 outing was not one bad year in four, it was the single worst year out of eight.
Peyton Manning's final QQQ slips behind Drew Brees after limping through 2015, but if you're interested to know, Manning would still lag behind Rodgers even if 2015 were excluded. Manning actually passed the ball a hair less than the league average. On the other hand, you could fairly give him credit for a lot of his team's rushing production because of his on-field play calls. That's why we use DVOA, too.
Drew Brees can't run down the field, he's barely six feet tall, he's asked to pass the ball a lot, and he played 2015 opposite the worst defense... ever. Let's give the guy his due.
Tom Brady's QQQ puts him right near the top, to be sure, but it almost seems too low considering where the Patriots' offense ranks in DVOA. It's not like their running game is giving some huge boost-- the Patriots' rushing average over the past four seasons is 4.26 ypc, not even league average (4.33 ypc). I thought maybe all those 1-yard quarterback sneaks on 3rd down were killing his stat line, but no amount of tweaking for conversion rates made any significant difference. He just seems to have a knack for making plays in high-leverage situations-- almost as if the Patriots knew what the other team was thinking. Given the team offensive success, you could make a strong case for Brady anywhere in the top 5 below Rodgers.
Then there's Russell Wilson. He and Luck are the only four-year starters on the list whose rookie season is included, and there are plenty of Hall-of-Fame caliber quarterbacks who had to learn a bit on the job (John Elway, Troy Aikman, and Peyton Manning all recorded rookie passer ratings under 72). The first eight games of Wilson's career were not some random bad stretch, they were a predictable and non-repeatable period of steady improvement. Excluding those is the most meaningful way of comparing Wilson to more veteran quarterbacks. (Luck, by contrast, started a little better and improved much less; excluding the first half of his rookie season would actually make his career stats worse).
Even slapping on a discount for the Seahawks' frequent rushing (and a bigger bonus to other quarterbacks who pass more than average), Wilson slots in at #2 behind Aaron Rodgers with a combination of efficiency, productivity, and unambiguous team offensive success (see the DVOA list above). Wilson did not have a 'down' year in 2014 when he lost his best receivers and responded with an historic rushing season-- he adapted and carried the team. He is not in the same company as Drew Brees, Peyton Manning, and Tom Brady-- they are in the same company with Russell Wilson. And we can dispense with this nonsense about Wilson "moving up into the elite ranks after 2015". That happened in 2012. Even including his (misleading) entire rookie season numbers, Wilson is, and has been, one of the NFL's top five quarterbacks over his entire career.
 Adjusted points per drive adds one point for every 16.6 yards gained, to account for the value of field position that the offense is gifted from the defense/special teams and what they give back to their defense on non-scoring drives. Turnovers are measured similarly at -45 yards a pop. Because both factors already contribute to scoring, they are then given half weight. Because I made this up, I also included the non-adjusted points per drive.
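Since I made this stat up, a sketch may help pin it down. The version below is only one reading of the description above: net yards (yards gained minus 45 per turnover) are converted to points at 16.6 yards per point, given half weight, and added to actual points before dividing by drives. The function name and sample numbers are hypothetical.

```python
YARDS_PER_POINT = 16.6       # one point of field-position value per 16.6 yards
TURNOVER_PENALTY_YARDS = 45  # same yardage equivalent used for ANY/A and TANY/A
FIELD_POSITION_WEIGHT = 0.5  # half weight, since yards already feed into scoring


def adjusted_points_per_drive(points, yards, turnovers, drives):
    """Points per drive plus a half-weighted credit for net yardage."""
    net_yards = yards - TURNOVER_PENALTY_YARDS * turnovers
    field_position_points = FIELD_POSITION_WEIGHT * net_yards / YARDS_PER_POINT
    return (points + field_position_points) / drives


# Hypothetical game: 24 points, 332 yards, no turnovers, 10 drives.
example_adjusted = adjusted_points_per_drive(24, 332, 0, 10)
print(round(example_adjusted, 2))  # 3.4
```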
You can learn about correlation coefficients by typing "correlation coefficient" into the box at the top of a nifty website called "Google", or by asking one of our resident math geeks in the comment section.
Team QBR is simply a weighted average if there were several QBs with significant playing time. Those with fewer than 50 pass attempts were ignored.
 Doesn't having more drives & running more plays also wear out the opposition defense? Yes, but it's irrelevant, because we are already measuring efficiency stats. If it confers an advantage, that advantage would already be measured. So if two quarterbacks have identical efficiency stats, the one whose team ran fewer offensive drives (against a better-rested opposition defense, and providing more rest to his own defense) is playing better and doing more to help his team.
 Please do not pronounce it "kew kew kew". That sounds like baby talk. It's got to be "kahkik" or "k'kok" or something. To be determined organically.
 Seriously, check out BBC's The Hunt which has a sequence on Portia in their rain forest episode. Those things are freaking geniuses.
 Also, the Colts didn't run very often because they aren't good at it or built for it, not necessarily because they don't want to. But when they played an overall bad defense (such as Jacksonville), they would gladly run the ball more like a normal NFL team, which means there was a correlation (not causation) between passing efficiency (higher because of bad opposition defense) and running frequency. Such games also included more clock-working runs.
 Note that the "adjusted" part of yards/play refers to an adjustment for turnovers and touchdowns, not strength of opposition. But because DVOA is adjusted for opposition offensive quality, we expect the line to be imperfect. A correlation between VOA (no opponent adjustment) and adjusted yards/play would be even tighter.
[Caveats and Details] Tenure is the percentage of team pass attempts taken by said quarterback over the past four seasons. It is not specifically used in these calculations, but where the tenure is low there may be slight inaccuracies (for example, the replacement quarterback may have faced off against the best/worst opposition defenses, but only the 4-season average is used for opponent adjustments).
For quarterbacks who took less than 65% of their team pass attempts from 2012 to 2015, opponent and play percentage adjustments would have been too inaccurate, so only the basic TANY/A is given.
Unlike some previous models, I included lost fumbles (with the same -45 yard penalty as for interceptions). Strip-sack fumbles are incorporated into ANY/A (so that number will differ slightly from official totals) and other fumbles are added into TANY/A.
It has been argued, with evidence, that fumble recovery is a matter of luck and that all fumbles (lost or recovered) should be counted as one half a lost fumble. This probably evens out over the course of four seasons, but I also think that using the 50%-of-all-fumbles method is misleading. Quarterbacks are usually dinged with a fumble on bad snaps, even though the snaps are not their responsibility and the recovery rate is high. Also, most quarterback fumbles happen in the pocket, and there are real differences among players that affect recovery rates. A stationary quarterback who loses the ball during a sack is probably pinned down and unable to assist in the recovery, whereas a scrambling quarterback who gets the ball slapped away is in a much better position. Arguably, the scrambling quarterback made a bigger mistake, but it is results (and repeatability) that count. Also, short, strong-armed quarterbacks are less likely to lose the ball in the pocket and more likely to land on it when they do.
Everything is calculated based on totals over four seasons. A more precise measure would weight pass/rush attempts against each opponent after adjusting for opposition defense, but that level of detail requires a massive undertaking.
If a quarterback had receiving yards, for the sake of simplicity these were added into his rushing yardage if they improved his overall stat line (the assumption is that the quarterback added value by being a receiver on a trick play). If they did not help his overall stat line, they were probably batted passes which already counted against that quarterback as negative-yardage passing plays, and so were left out.
EDIT to add: As a means of incorporating play percentage into the final QQQ, use of the geometric mean is somewhat arbitrary. But my superpower combination of intuition and math skills tells me it's the right way to go.
Furthermore, I can confidently say this: Using any other single-number quarterback-only statistic, be it QBR, passer rating, ANY/A, yards/attempt, net yards/attempt, completion percentage, total yards, and even TANY/A, it's a trivial exercise to devise a pair of stat lines which produces a comparative result that is plainly wrong.
But not with QQQ. Make up any pair of quarterback stat lines you like, and the calculated QQQ might be debatable (in terms of magnitude of difference), but the result will never be counterintuitive.