You Are Measuring QB Performance Wrong

Lately, I have seen a lot of discussion on Field Gulls trying to assess just how good a QB Russell Wilson is, and the range of opinions seems to have settled somewhere between "garbage" and "maybe the best QB of all time".

Rating QBs is an inherently complex process, but it should always be rooted primarily in objective measures. The problem is that using analytics to form an informed opinion requires understanding what the underlying numbers actually measure.

I see many people cite numbers without knowing their strengths or weaknesses. So I am going to critique a practice I have seen too often lately: using Passer Rating to compare QBs.

For reference, I calculated career passer rating for the 109 QBs since 1999 with at least 1,000 regular-season passing attempts (footnote 1). The table below shows the rating and rank for Wilson and 4 top-tier QBs.

QB | Passer Rating | Rank
Aaron Rodgers | 105.8 | 3
Russell Wilson | 103.2 | 4
Drew Brees | 99.6 | 8
Peyton Manning | 99.1 | 10
Tom Brady | 98.8 | 12

According to passer rating, Wilson is clearly one of the greatest QBs of all time. For some, that is the end of the conversation, but it is an extremely flawed analysis.


When comparing QBs directly, you have to consider the time frames in which they played. QB production has steadily improved over time as the game and its rules have evolved and as coaches and players have learned from those who came before them. Passer rating is no exception to this evolution. Here is the cumulative league passer rating for starting QBs over the last 22 years.


Comparing QBs whose careers started more than a few years apart inherently biases the measure against the older QB. One way to account for this is to express the measure relative to the league mean (over/under). That way, as the underlying averages rise over time, the distance from the average allows a more apples-to-apples comparison.

Here is what I mean. The following graph shows Russell Wilson's passer rating over his career as weekly z-scores. Each week, his rating is compared to those of the other starting QBs (grey dots) and converted to the number of standard deviations above or below the mean. To smooth the data, I used a 16-game rolling average, so each dot represents a season's worth of games (footnote 2).


The black line in the middle is the average passer rating. Even though we know that value increases over time, the z-scores normalize the data so that the mean for any given week is always 0. This allows values from 2013 to be compared directly with values from 2021.
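To make the weekly z-score and smoothing steps concrete, here is a minimal Python sketch. It assumes each week's ratings for all starters (including the QB being scored) are already available as a list, uses the population standard deviation, and reads footnote 2 as a trailing window that includes the current game. The function names and sample numbers are hypothetical, not the author's code.

```python
from statistics import mean, pstdev


def weekly_z(qb_rating, week_ratings):
    """Standard deviations between one QB's rating and that week's mean.

    week_ratings is assumed to hold every starting QB's rating for the week,
    including the QB being scored. Using the population standard deviation
    is an assumption; the article does not say which variant was used.
    """
    mu = mean(week_ratings)
    sigma = pstdev(week_ratings)
    return (qb_rating - mu) / sigma


def rolling_mean(values, window=16):
    """Average each game with up to window - 1 games before it.

    Early in a career, fewer than `window` games exist, so the average
    uses every game played so far (as described in footnote 2).
    """
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Hypothetical week with three starters; the QB being scored rated 110.0.
z = weekly_z(110.0, [110.0, 100.0, 90.0])
print(round(z, 3))  # 1.225
```

Feeding a QB's sequence of weekly z-scores through `rolling_mean` gives one smoothed value per game, which is the kind of series plotted above.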

Notice that according to passer rating, even though Wilson's recent numbers are down a bit, he is still well above league average and doing better than at many other times in his career (this will be important later). The graph shows that RW's rating has stayed between 0 and 2+ standard deviations above the mean for his whole career. So, is that good?


To compare Wilson directly with other QBs, I have to create a career rating that accounts for the bias of time. So I took my group of 109 QBs and, using the methodology described in footnote 3, calculated a career passer rating z-score. Wilson's score of +1.05 is shown as the blue dashed line in the previous rolling average graph.

To better visualize how that +1.05 ranks against other QBs, I re-normalized all of their career scores to plot them on a standard normal curve. That's right, I took a z-score of a z-score.

For Wilson, the +1.7 shown on the curve means his career z-score (+1.05) lies 1.7 standard deviations above the average QB's career z-score, which is about the 95th percentile. Here is the previous table of QBs updated with their new career z-scores:

QB | Passer Rating (1) | Rating Rank | Career-z (3) | Career-z Rank
Aaron Rodgers | 105.8 | 3 | 1.40 | 1
Peyton Manning | 99.1 | 10 | 1.28 | 2
Russell Wilson | 103.2 | 4 | 1.05 | 7
Drew Brees | 99.6 | 8 | 1.05 | 8
Tom Brady | 98.8 | 12 | 0.97 | 10

Not a huge shake-up, but notice that the older QBs moved up the ranks while Wilson moved down. This is because we have removed some of the bias of time. Wilson finishes a bit higher than Brady and essentially tied with Brees, but there is now a distinct gap between him and Rodgers and Manning.
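The re-normalization ("a z-score of a z-score") and the percentile conversion described above can be sketched in a few lines. This assumes the career z-scores are held in a dict keyed by QB and that the re-normalized scores are treated as standard normal, which is what plotting them on the bell curve implies. The names and sample values are illustrative only.

```python
import math
from statistics import mean, pstdev


def renormalize(career_z):
    """Convert each QB's career z-score into a z-score relative to the group."""
    values = list(career_z.values())
    mu, sigma = mean(values), pstdev(values)
    return {qb: (z - mu) / sigma for qb, z in career_z.items()}


def normal_percentile(z):
    """Share of a standard normal distribution that lies below z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Hypothetical career z-scores for a three-QB group.
print(renormalize({"QB A": 0.0, "QB B": 1.0, "QB C": 2.0}))

# A re-normalized score of +1.7, as reported for Wilson:
print(round(normal_percentile(1.7), 3))  # 0.955
```

The 0.955 result is where the "about the 95th percentile" figure comes from, under the normality assumption.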


Passer rating is a good concept. It takes 4 different QB metrics (completion %, yards per attempt, TD rate and INT rate) and combines them into a single measure. However, it completely ignores QB scrambles and sacks, which make up about 10% of all passing plays. I probably don't have to argue much with Seattle fans that scrambles need to be part of a QB measure, but sacks tend to be a bit more divisive.

Certainly, the talent of an offensive line is a variable in a QB's sack rate, but the data has clearly shown that sacks are more a function of the QB (finding receivers, reading defenses, pocket awareness, etc.) than of the skill of the O-line. In fact, when changing teams, a QB is more likely to retain their sack rate than any of the 4 measures included in passer rating.

Russell Wilson lives for the deep ball. As such, he needs time for plays to develop, and that invites pressure. If you want to credit the yards and TDs his style of play produces, then you need to include the sacks and scrambles that are a direct result of it. To do that, I added them to the 4 components of passer rating as follows:

  • Cmp % = (Completions + Scrambles) / (Attempts + Sacks + Scrambles)
  • YPA = (Pass Yds + Scramble Yds - Sack Yds) / (Attempts + Sacks + Scrambles)
  • TD % = (Pass TDs + Scramble TDs) / (Attempts + Sacks + Scrambles)
  • INT % = (INTs + Sack Fumbles Lost + Scramble Fumbles Lost) / (Attempts + Sacks + Scrambles)
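Here is a rough sketch of the revised components, fed into the standard NFL passer rating formula (each component capped between 0 and 2.375). That the revised components are plugged into the otherwise unchanged formula is my reading of the text, not something it spells out. The field names and the stat line are hypothetical, and sack yards are taken as a positive number of yards lost.

```python
from dataclasses import dataclass


@dataclass
class DropbackTotals:
    completions: int
    attempts: int
    pass_yds: float
    pass_tds: int
    ints: int
    sacks: int
    sack_yds: float  # yards lost on sacks, as a positive number
    sack_fumbles_lost: int
    scrambles: int
    scramble_yds: float
    scramble_tds: int
    scramble_fumbles_lost: int


def revised_components(t):
    """The four passer rating inputs with sacks and scrambles folded in."""
    dropbacks = t.attempts + t.sacks + t.scrambles
    cmp_pct = (t.completions + t.scrambles) / dropbacks
    ypa = (t.pass_yds + t.scramble_yds - t.sack_yds) / dropbacks
    td_pct = (t.pass_tds + t.scramble_tds) / dropbacks
    int_pct = (t.ints + t.sack_fumbles_lost + t.scramble_fumbles_lost) / dropbacks
    return cmp_pct, ypa, td_pct, int_pct


def _cap(x):
    """Clamp a component to the official 0 to 2.375 range."""
    return max(0.0, min(2.375, x))


def passer_rating(cmp_pct, ypa, td_pct, int_pct):
    """Standard NFL passer rating from rates expressed as fractions."""
    a = _cap((cmp_pct - 0.3) * 5)
    b = _cap((ypa - 3) * 0.25)
    c = _cap(td_pct * 20)
    d = _cap(2.375 - int_pct * 25)
    return (a + b + c + d) / 6 * 100


# Hypothetical game: 20/30, 250 yds, 2 TD, 1 INT,
# 3 sacks for 20 yds, 2 scrambles for 15 yds.
game = DropbackTotals(
    completions=20, attempts=30, pass_yds=250, pass_tds=2, ints=1,
    sacks=3, sack_yds=20, sack_fumbles_lost=0,
    scrambles=2, scramble_yds=15, scramble_tds=0, scramble_fumbles_lost=0,
)
print(round(passer_rating(20 / 30, 250 / 30, 2 / 30, 1 / 30), 1))  # 100.7
print(round(passer_rating(*revised_components(game)), 1))  # 90.8
```

In this made-up game, spreading the same production over 35 drop-backs instead of 30 attempts, and subtracting sack yardage, costs about 10 points of rating.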

By this revised passer rating measure, Wilson's rolling average curve drops, though not dramatically:


Applying these changes to the time-adjusted career measure gives the following:


Wilson drops a bit in this measure, while the comparison QBs gain significant ground. This makes sense as they are all QBs that were consistently good at avoiding sacks in their careers, so the gap between Wilson and the best of the best widens.


Another problem with Passer Rating is in how the components are weighted before adding them together. The formula is complex, utilizing minimum and maximum thresholds for each component. However, if we ignore those thresholds, the formula can be simplified to:

  • Passer Rating = 4.17 * YPA + 83.33 * Cmp% + 333.33 * TD% - 416.67 * INT% + 2.083
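The simplification comes from expanding the official NFL formula. With C, Y, T and I standing for completion rate, yards per attempt, TD rate and INT rate (rates as fractions of attempts), and with the 0 to 2.375 cap on each component dropped, the algebra runs:

```latex
\begin{aligned}
a &= 5\,(C - 0.3), \quad b = 0.25\,(Y - 3), \quad c = 20\,T, \quad d = 2.375 - 25\,I \\
\text{Rating} &= \tfrac{100}{6}\,(a + b + c + d) \\
&= \tfrac{100}{6}\,(0.25\,Y + 5\,C + 20\,T - 25\,I + 0.125) \\
&\approx 4.17\,Y + 83.33\,C + 333.33\,T - 416.67\,I + 2.083
\end{aligned}
```

The constant 0.125 is what is left of the offsets: -1.5 from a, -0.75 from b and +2.375 from d.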

Since 4.17 is the weight applied to YPA, dividing through by 4.17 will give us the weights for each measure relative to yards:

  • Passer Rating / 4.17 = YPA + 20 * Cmp% + 80 * TD% - 100 * INT% + 0.5
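A quick numerical check of both steps, as a sketch: the capped official formula matches the linear version while every component stays inside its caps, diverges once one hits a cap, and dividing each coefficient by the YPA coefficient recovers the yardage weights above. The function names are illustrative.

```python
def clamp(x, lo=0.0, hi=2.375):
    return max(lo, min(hi, x))


def passer_rating_official(cmp_pct, ypa, td_pct, int_pct):
    """NFL passer rating; each component is capped between 0 and 2.375."""
    a = clamp((cmp_pct - 0.3) * 5)
    b = clamp((ypa - 3) * 0.25)
    c = clamp(td_pct * 20)
    d = clamp(2.375 - int_pct * 25)
    return (a + b + c + d) / 6 * 100


def passer_rating_linear(cmp_pct, ypa, td_pct, int_pct):
    """Same formula with the caps ignored, i.e. the simplified version."""
    return 100 / 6 * (0.25 * ypa + 5 * cmp_pct + 20 * td_pct - 25 * int_pct + 0.125)


# Inside the caps, the two versions agree.
print(round(passer_rating_official(0.65, 7.5, 0.05, 0.025), 2),
      round(passer_rating_linear(0.65, 7.5, 0.05, 0.025), 2))  # 93.75 93.75

# Yard-equivalent weights: each coefficient divided by the YPA coefficient.
ypa_coef = 100 / 6 * 0.25
yard_weights = {
    "Cmp%": 100 / 6 * 5 / ypa_coef,
    "TD%": 100 / 6 * 20 / ypa_coef,
    "INT%": -100 / 6 * 25 / ypa_coef,
    "constant": 100 / 6 * 0.125 / ypa_coef,
}
print({k: round(v, 2) for k, v in yard_weights.items()})
# {'Cmp%': 20.0, 'TD%': 80.0, 'INT%': -100.0, 'constant': 0.5}
```

A very high TD rate (say 15%) pushes that component past its cap, which is where the official and linear versions part ways.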

These yardage multipliers make no sense. The formula implies a 100-yard penalty for an INT. For that to be accurate, the average INT would have to produce a 100-yard swing in field position. In other words, every INT is assumed to be returned for a TD on a drive that otherwise would have ended in an offensive TD. That is clearly a ludicrous assumption. Similarly, the 20-yard bonus for every completion and the 80-yard bonus for each TD are inflated values as well. So, let's fix them.

Completion rate should have no value whatsoever as it is already accounted for in YPA:

  • YPA = Cmp% * Yards/Completion.

If you want to credit a QB for completion rate then you would have to use yards per completion instead of YPA to avoid double counting. However, it makes no sense to complicate things, so just use YPA and drop Cmp%.
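To illustrate the double counting, here are two hypothetical QBs with the same yards per attempt, TD rate and INT rate: one completes 70% of passes for 10 yards per completion, the other 50% for 14. Passer rating still separates them by about 16.7 points purely on completion rate. The stat lines are invented for illustration.

```python
import math


def _cap(x):
    """Clamp a passer rating component to the official 0 to 2.375 range."""
    return max(0.0, min(2.375, x))


def passer_rating(cmp_pct, ypa, td_pct, int_pct):
    """Standard NFL passer rating from rates expressed as fractions."""
    a = _cap((cmp_pct - 0.3) * 5)
    b = _cap((ypa - 3) * 0.25)
    c = _cap(td_pct * 20)
    d = _cap(2.375 - int_pct * 25)
    return (a + b + c + d) / 6 * 100


# Same YPA two ways: YPA = Cmp% * yards per completion.
qb_a_ypa = 0.70 * 10.0  # 70% completions, 10 yards per completion
qb_b_ypa = 0.50 * 14.0  # 50% completions, 14 yards per completion
assert math.isclose(qb_a_ypa, qb_b_ypa)

# Identical TD and INT rates, so the whole gap comes from Cmp%.
gap = passer_rating(0.70, qb_a_ypa, 0.04, 0.02) - passer_rating(0.50, qb_b_ypa, 0.04, 0.02)
print(round(gap, 2))  # 16.67
```

The gap is exactly the 83.33 completion weight times the 0.20 difference in completion rate, even though both QBs gained the same yards per throw.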

In 1988, the authors of The Hidden Game of Football valued an INT at -45 yards, a figure still commonly used today. They also set the yardage bonus for a TD at +10 yards, although that has since commonly been raised to +20 yards based on a methodology by Chase Stuart. These numbers aren't necessarily "right", but they are far more reasonable than the values used by traditional passer rating.

So, if I apply these new multipliers to my revised passer rating, the components become:

  • Cmp % = 0
  • YPA = (Pass Yds + Scramble Yds - Sack Yds) / (Attempts + Sacks + Scrambles)
  • TD % = 20 * (Pass TDs + Scramble TDs) / (Attempts + Sacks + Scrambles)
  • INT % = -45 * (INTs + Sack Fumbles Lost + Scramble Fumbles Lost) / (Attempts + Sacks + Scrambles)

which can be combined to:

  • (Pass Yds + Scramble Yds - Sack Yds + 20 * (Pass TDs + Scramble TDs) - 45 * (INTs + Sack Fumbles Lost + Scramble Fumbles Lost)) / (Attempts + Sacks + Scrambles)

That formula may look somewhat familiar to some of you as it is simply Adjusted Net Yards per Attempt (ANY/A) with QB scrambles added to it. Let's call it Adjusted Net Yards per Drop-back (ANY/d).
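A minimal sketch of ANY/d next to the standard ANY/A (the Pro Football Reference form, which counts neither scrambles nor fumbles), assuming season or career totals as inputs. The keyword names are hypothetical, and sack yards are passed as positive yards lost.

```python
def any_a(pass_yds, pass_tds, ints, attempts, sacks, sack_yds):
    """Adjusted Net Yards per Attempt (no scrambles, no fumbles)."""
    return (pass_yds + 20 * pass_tds - 45 * ints - sack_yds) / (attempts + sacks)


def any_d(pass_yds, pass_tds, ints, attempts, sacks, sack_yds,
          sack_fumbles_lost, scrambles, scramble_yds, scramble_tds,
          scramble_fumbles_lost):
    """Adjusted Net Yards per Drop-back: ANY/A plus scrambles and lost fumbles."""
    dropbacks = attempts + sacks + scrambles
    yards = pass_yds + scramble_yds - sack_yds
    td_bonus = 20 * (pass_tds + scramble_tds)
    turnover_penalty = 45 * (ints + sack_fumbles_lost + scramble_fumbles_lost)
    return (yards + td_bonus - turnover_penalty) / dropbacks


# Hypothetical stat line.
line = dict(pass_yds=250, pass_tds=2, ints=1, attempts=30, sacks=3, sack_yds=20)
print(round(any_a(**line), 2))  # 6.82
print(round(any_d(**line, sack_fumbles_lost=0, scrambles=2, scramble_yds=15,
                  scramble_tds=0, scramble_fumbles_lost=0), 2))  # 6.86
```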



In ANY/d, Wilson's career number pulls back a bit, but so do those of all the comparison QBs (some more than others). At the end of the day, though, Rodgers and Manning still maintain a big lead over the others. At this point, Wilson is 0.8 standard deviations below their production, which is a significant gap.


Even with all the adjustments made so far, there are still glaring problems. Some valuable events in football are not measured in yards.

Two of those events are TDs and turnovers, which ANY/d attempts to capture, but the formula does not account for first downs. The conversion of a 3rd & 1 has far more value than the 1 yard it took to make it happen. A QB measure should include the ability to throw for first downs and this is not something Wilson has excelled at.

Another problem with ANY/d is that not all yards are equal. A 9-yard gain on 1st & 10 is far more valuable than a 9-yard gain on 1st & 20. A pick-6 is a far worse outcome than a Hail Mary interception at the end of a game. So applying static yardage equivalents to these outcomes is not appropriate.

This is why stat nerds embrace Expected Points Added (EPA). EPA accounts for the situation (down, distance, field position, etc.) to value each play's outcome in points, so the same gain can be worth more or less depending on when it happens. By forgoing fixed yardage values, EPA more realistically values individual play outcomes like TDs, turnovers and first downs. For these reasons, EPA per drop-back (EPA/d) is the gold standard of QB stats. I am not aware of any QB measure that better predicts actual points and wins.
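Computing EPA/d is simple once an expected points (EP) model supplies a value before and after each drop-back; the hard part is the EP model itself, which is not reproduced here. In this sketch the EP values are made up, and on scoring plays or turnovers `ep_after` is assumed to already reflect the points scored or the opponent's resulting EP (negated).

```python
from dataclasses import dataclass


@dataclass
class Dropback:
    ep_before: float  # expected points for the offense at the snap
    ep_after: float  # expected points for the offense after the play


def epa_per_dropback(plays):
    """Mean expected points added across a QB's drop-backs."""
    if not plays:
        raise ValueError("need at least one drop-back")
    return sum(p.ep_after - p.ep_before for p in plays) / len(plays)


# Made-up EP values: a first-down conversion, an incompletion, an interception.
plays = [Dropback(1.0, 2.5), Dropback(2.0, 1.2), Dropback(0.5, -2.0)]
print(round(epa_per_dropback(plays), 2))  # -0.6
```

Because each play's value depends on its situation, a conversion on 3rd & 1 and an interception returned for a score are weighted very differently, which fixed yardage bonuses cannot do.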


You can see that by EPA/d, Wilson's recent efficiency has been the worst of his career, whereas passer rating says he is playing well above league average and much better than at his previous lows. In my opinion, EPA/d better captures a fairly obvious decline in Wilson's play.


Also, EPA/d reveals a large separation to the comparison group. That's not to say that Wilson's career numbers are not good. His efficiency is better than most QBs, but he isn't as comparable to the very best with this measure.

At times, Wilson plays as one of the most EPA-efficient QBs in the league; it's just that he doesn't sustain that level. Here is the career weekly comparison between Wilson and Manning by EPA/d.


The green line is +2 standard deviations, and QBs at that level are easily among the top 3 most efficient QBs in the league. Wilson brushes against that line occasionally, but he doesn't stay there long. Manning, by comparison, spent much of his career at that level. For about 4 years between late 2003 and mid-2007, Manning lived on that green line, then again in 2009, and then for another year and a half in Denver.
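One way to put numbers on how long a QB "lives on" the +2 line is to count the share of games at or above it and the longest unbroken run. This is a hypothetical sketch, not the author's method, applied to a smoothed z-score series like the ones graphed; the sample series is invented.

```python
def time_at_or_above(z_series, threshold=2.0):
    """Return (share of games at or above threshold, longest consecutive run)."""
    if not z_series:
        raise ValueError("empty series")
    longest = current = hits = 0
    for z in z_series:
        if z >= threshold:
            hits += 1
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return hits / len(z_series), longest


# Invented smoothed z-scores, one per game.
series = [1.5, 2.1, 2.3, 1.9, 2.0, 2.2, 2.4, 0.5]
print(time_at_or_above(series))  # (0.625, 3)
```

A QB who only brushes the line would show a low share and short runs, while sustained elite play shows up as long runs.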

Wilson is efficient, but he does not have consistent efficiency like a Brees, Brady, Rodgers or Manning.


We know that Passer Rating over-values TDs, over-penalizes INTs, double counts completion rate and ignores sacks. If I were to design a stat to artificially inflate Russell Wilson's production, this would be it. Passer Rating was an improvement to QB measures when it was first introduced in 1973, but advances in technology and data collection have since led to dramatically better QB measures.

I am not saying that EPA/d is necessarily the "right" stat for judging QBs, and no single stat can capture all the value a QB provides. However, some stats are inherently better at the task than others, and if you choose a poor stat, you will get poor results.


1) Numbers deviate from official numbers because the data excludes QB spikes and includes defensive pass interference penalties as if they were completed passes. Also, the data only goes back to 1999, so any reference to Peyton Manning will exclude the poor performance of his rookie year.

2) Rolling averages were calculated over the prior 16 games played, or over all career games if the QB had not yet played 16.

3) Career z-score: I limited each year's data to only QBs with at least 200 attempts. I then took each QB's passer rating by season and converted that to a z-score by year. I took a weighted average of those yearly z-scores for each QB using their season attempts as the weight. Only QBs with at least 1,000 drop-backs were retained.
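The career z-score method in footnote 3 can be sketched as follows. Assumptions not stated in the footnote: the population standard deviation is used, the 1,000 drop-back cutoff is applied to each QB's drop-backs across all supplied seasons, and years with no rating spread are skipped. The dict keys and sample seasons are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def career_z_scores(seasons, min_season_att=200, min_career_dropbacks=1000):
    """Attempt-weighted average of yearly passer rating z-scores.

    Each season is a dict with keys: qb, year, attempts, dropbacks, rating.
    """
    # Career drop-backs count every supplied season (an assumption).
    career_dropbacks = defaultdict(int)
    for s in seasons:
        career_dropbacks[s["qb"]] += s["dropbacks"]

    # Only seasons with at least min_season_att attempts form each year's pool.
    by_year = defaultdict(list)
    for s in seasons:
        if s["attempts"] >= min_season_att:
            by_year[s["year"]].append(s)

    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for rows in by_year.values():
        ratings = [r["rating"] for r in rows]
        mu, sigma = mean(ratings), pstdev(ratings)
        if sigma == 0:
            continue  # a single QB (or identical ratings) gives no spread
        for r in rows:
            z = (r["rating"] - mu) / sigma
            weighted_sum[r["qb"]] += z * r["attempts"]
            weight_total[r["qb"]] += r["attempts"]

    return {
        qb: weighted_sum[qb] / weight_total[qb]
        for qb in weight_total
        if career_dropbacks[qb] >= min_career_dropbacks
    }


# Hypothetical seasons: QB C falls short on drop-backs, QB D on attempts.
seasons = [
    {"qb": "A", "year": 2000, "attempts": 500, "dropbacks": 560, "rating": 100.0},
    {"qb": "B", "year": 2000, "attempts": 400, "dropbacks": 500, "rating": 80.0},
    {"qb": "A", "year": 2001, "attempts": 300, "dropbacks": 450, "rating": 90.0},
    {"qb": "B", "year": 2001, "attempts": 500, "dropbacks": 560, "rating": 90.0},
    {"qb": "C", "year": 2001, "attempts": 250, "dropbacks": 280, "rating": 60.0},
    {"qb": "D", "year": 2001, "attempts": 100, "dropbacks": 120, "rating": 150.0},
]
print(career_z_scores(seasons))
```

QB C still counts toward the 2001 league mean and spread even though C is dropped from the final list, which mirrors the footnote's order of filtering seasons first and QBs last.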