FanPost

ESPN Has A QBR Problem


ESPN is becoming ever more dependent on its proprietary statistic, Total Quarterback Rating (QBR), and on the driving force behind QBR, Expected Points Added (EPA). In almost any ESPN article that even marginally compares quarterback play, QBR is the stat the writer will cite.

The issue finally came to a head for me when FiveThirtyEight, an ostensibly statistics-driven site that is strongly married to ESPN, ran an article sorting NFL quarterbacks into 10 different tiers. The methodology was relatively straightforward: they plotted each quarterback's QBR scores game by game, analyzed the distribution of those scores, and placed each quarterback into a tier based on some of the distinguishing features of that distribution.

I will not elaborate further on the methodology of this article, largely because I do not believe there were any serious failings in the execution of the analysis. Rather, it was the article's basic assumption, that QBR is a good descriptive statistic, that ultimately doomed it to irrelevance.

As a Seattle Seahawks fan who has personally watched Russell Wilson become perhaps the most valuable player in the NFL, I have a natural distrust of any statistic that purports to capture total quarterback value and continues to rate Wilson as only slightly above average. Here at Field Gulls, we do not pay any attention to QBR because we know that it does not capture the value of our franchise quarterback. We consistently dismiss it as a bad statistic because we know, in our blue and green DNA, that it is bad at rating Russell Wilson.

Yet what intrinsically makes it a bad statistic? The answer is that QBR is subject to significant mathematical failings as well as serious potential bias in its calculation. The rest of this post will attempt to define the major shortcomings of QBR. I will try to be pithy.

(I apologize if there has been a similar post, article, or discussion. If such a piece exists I have not read it.)

On its QBR rankings page, ESPN links to an explanation of QBR. Although QBR is quite the black box, that explanation at least lays out the general process ESPN analysts use to calculate it. The process consists of three distinct steps:

The first step uses historical data to calculate the Expected Points Added (EPA) of each play. The second step divides the EPA of each play among the players involved, assigning each some portion of the total. The third step applies a multiplier to that EPA based on the Win Probability Added (WPA) of the play.
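For readers who think in code, the three steps can be sketched roughly as follows. This is only a sketch of the structure described above, not ESPN's implementation, which is not public: the `Play` class, the `credit_shares` split, and the `clutch_multiplier` value are hypothetical stand-ins for the parts of the black box we cannot see.

```python
from dataclasses import dataclass, field


@dataclass
class Play:
    ep_before: float  # expected points of the situation before the snap
    ep_after: float   # expected points of the situation after the play
    # Hypothetical: player -> share of this play's EPA (ESPN's real split is not public).
    credit_shares: dict[str, float] = field(default_factory=dict)
    # Hypothetical: situational weight standing in for the WPA-based multiplier.
    clutch_multiplier: float = 1.0


def play_epa(play):
    # Step 1: EPA is the change in expected points produced by the play.
    return play.ep_after - play.ep_before


def divide_credit(play):
    # Step 2: split the play's EPA among the players involved.
    epa = play_epa(play)
    return {player: epa * share for player, share in play.credit_shares.items()}


def weighted_credit(play):
    # Step 3: scale each player's share by the situational multiplier.
    return {player: value * play.clutch_multiplier
            for player, value in divide_credit(play).items()}


# A completion that moves the offense from an EP of 2 to an EP of 3, with a
# hypothetical 75/25 quarterback/receiver split and a hypothetical 1.5 multiplier.
completion = Play(ep_before=2, ep_after=3,
                  credit_shares={"QB": 0.75, "WR": 0.25}, clutch_multiplier=1.5)
print(weighted_credit(completion))  # {'QB': 1.125, 'WR': 0.375}
```

In this toy version the play is worth 1 expected point, but what the quarterback finally receives depends entirely on the two hypothetical inputs from steps two and three.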

EPA, in theory, is a great idea, and it was the impetus for the development of QBR. QBR itself was a grand plan to fix the problems with Passer Rating, and there is nothing insidious about its conception. Further, EPA is a good way to analyze the performance of entire teams: good teams outperform the historical expected-points average and bad teams do not.

The flaws arise when EPA is assigned to individual plays and to the players involved in them. This is not about teams being greater than the sum of their parts; it is something mathematically much simpler. Note, too, that this is still an analysis of the first step in the process. We are not yet dealing with the issues surrounding the division of an individual play's EPA among players. For simplicity's sake, let us assume that the EPA of each play is assigned wholly to either the quarterback or the running back. A hypothetical example will perhaps best illustrate the problem.

EPA purports to assign a quantitative value to a player's performance, so let us quantify two hypothetical drives. First, some hypothetical values (it does not matter whether the values are accurate; they are realistic enough to illustrate the point):

1st and 10 from your own 20: 2 (hypothetical EP)

3rd and 10 from your own 20: 1

1st and 10 from your own 40: 3

3rd and 10 from your own 40: 2

1st and 10 from the opponent 40: 4

3rd and 10 from the opponent 40: 3

1st and 10 from the opponent 20: 5

3rd and 10 from the opponent 20: 4

Touchdown: 7

If Russell Wilson were to take over on his own 20 and throw four straight 20-yard passes, his EPA would be 5: the first three passes would each add 1 to his accrued EPA, and the touchdown throw would add 2 (assuming he gets full credit for the EPA of each play). The Seattle Seahawks as a team would likewise be credited with 5 expected points added for the drive.

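If Aaron Rodgers were to take over on his own 20 and drive his team down the field for a touchdown, the Packers as a team would also gain 5 expected points. But let us suppose that each set of downs in this drive consisted of an Eddie Lacy run for no gain on both first and second down, followed by a 20-yard Rodgers pass. Each pair of no-gain runs drops the Packers from the 1st-and-10 value to the lower 3rd-and-10 value at the same spot, so Lacy is charged 1 point per series (4 for the drive), and each Rodgers pass starts from that lower value: 2 + 2 + 2 + 3. In this case, Rodgers would be assigned 9 expected points added, almost twice what Wilson would be assigned for performing exactly the same football plays.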

This example shows that calculating EPA for individual players can produce wildly different results for exactly the same statistical performance. (Yes, one could argue that Rodgers' performance was slightly more clutch in this case, but the clutch factor is weighted in later in the process. For now, we are dealing solely with a counting statistic: in one case four 20-yard passes resulting in a touchdown are worth 5 added points, while in the other they are worth 9, before the clutch index is factored in.)
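The arithmetic of the two drives can be checked with a short Python sketch. It uses only the hypothetical EP values listed above and the same simplifying assumption that each play's EPA goes wholly to one player. Because the table has no 2nd-and-10 values, the two no-gain runs in each Rodgers series are collapsed into a single step credited to Lacy; the missing 2nd-and-10 value would cancel out when the two runs are added together anyway. The names `EP`, `credit`, `wilson_drive`, and `rodgers_drive` are just illustrative.

```python
from collections import defaultdict

# Hypothetical expected-point (EP) values from the table above; not real EP data.
EP = {
    ("1st", "own 20"): 2, ("3rd", "own 20"): 1,
    ("1st", "own 40"): 3, ("3rd", "own 40"): 2,
    ("1st", "opp 40"): 4, ("3rd", "opp 40"): 3,
    ("1st", "opp 20"): 5, ("3rd", "opp 20"): 4,
    "TD": 7,
}
SPOTS = ["own 20", "own 40", "opp 40", "opp 20"]  # each 20-yard pass moves one spot


def next_state(i):
    """Situation after a 20-yard completion from SPOTS[i]: a new first down or a TD."""
    return ("1st", SPOTS[i + 1]) if i + 1 < len(SPOTS) else "TD"


def credit(plays):
    """Sum EPA (EP after minus EP before) per player; each play goes wholly to one player."""
    totals = defaultdict(int)
    for player, before, after in plays:
        totals[player] += EP[after] - EP[before]
    return dict(totals)


def wilson_drive():
    # Four straight 20-yard completions, each on 1st and 10.
    return [("Wilson", ("1st", spot), next_state(i)) for i, spot in enumerate(SPOTS)]


def rodgers_drive():
    plays = []
    for i, spot in enumerate(SPOTS):
        # Two no-gain Lacy runs, collapsed into one step: 1st and 10 becomes 3rd and 10.
        plays.append(("Lacy", ("1st", spot), ("3rd", spot)))
        # Then a 20-yard Rodgers completion on 3rd and 10.
        plays.append(("Rodgers", ("3rd", spot), next_state(i)))
    return plays


for team, drive in [("Seahawks", wilson_drive()), ("Packers", rodgers_drive())]:
    per_player = credit(drive)
    print(team, per_player, "team EPA:", sum(per_player.values()))
# Seahawks {'Wilson': 5} team EPA: 5
# Packers {'Lacy': -4, 'Rodgers': 9} team EPA: 5
```

Both teams finish the drive at +5, but the Packers' +5 is split into +9 for Rodgers and -4 for Lacy, so the quarterback ends up with more EPA than his team.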

The second and third steps exacerbate this problem by further dividing the EPA of each play among different players and then multiplying this dubious result by a clutch index that depends on the game situation.

I do not think I need to belabor the point that the second step remains highly subjective, despite ESPN's extensive efforts to quantify the process objectively. ESPN's response to this is essentially, "You have to trust us. We are the experts. We broke this down as well as it can be done." It is at this stage that the black box becomes truly opaque. We have no real idea how the division is done, but we do know that it is not an automated process driven by statistics alone: there is a human being deciding how much credit Peyton Manning gets for a completed pass versus how much Ryan Tannehill gets.

The third and final step is perhaps the most distorting of them all because it depends on Win Probability Added (WPA). Again, I will illustrate with a Wilson/Rodgers hypothetical:

Assume that Wilson and Rodgers both play against the same team (obviously in different weeks) and that they have identical days: each goes 21 of 28 for 350 yards, 4 touchdowns, and no interceptions. Further assume that every one of their throws came on exactly the same down and distance from exactly the same field position, and so on. You get the idea. In a vacuum that ignores the game situation, their performances are exactly the same.

However, Wilson has the benefit of a strong running game and a dominant defense, and the result is a 43-8 runaway victory, while Rodgers' performance is literally the only competent aspect of the Packers' day. Despite special teams gaffes, awful defensive play, and a running game whose only 'production' is fumbles, Rodgers heroically leads his team to a last-second touchdown and a 28-27 win.

Each play by each quarterback is given exactly the same initial EPA value by the black box at ESPN. But because that value is then scaled by a multiplier based on the win probability added by each play, Wilson's performance is effectively downgraded because of the dominance of the team around him, while Rodgers' EPA is continually bolstered by the horrible play of the rest of his team.
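The effect of the third step can be shown with a toy calculation over five representative plays. The raw EPA values and the multipliers below are invented for illustration; ESPN's actual clutch weighting is not public, and applying it as a simple per-play product is an assumption about how such a weight would work.

```python
def weighted_epa(play_epas, multipliers):
    """Sum of per-play EPA after scaling each play by a situational multiplier."""
    if len(play_epas) != len(multipliers):
        raise ValueError("need exactly one multiplier per play")
    return sum(epa * m for epa, m in zip(play_epas, multipliers))


# Hypothetical raw EPA for five plays, identical for both quarterbacks.
plays = [1.0, 2.0, -0.5, 1.5, 3.0]

# Hypothetical multipliers: Wilson's game turns into a 43-8 rout, so later plays
# barely move the win probability; Rodgers's game stays close to the final whistle.
wilson_multipliers = [1.0, 0.75, 0.5, 0.5, 0.25]
rodgers_multipliers = [1.0, 1.25, 1.25, 1.5, 2.0]

print("raw EPA:", weighted_epa(plays, [1.0] * len(plays)))  # raw EPA: 7.0
print("Wilson: ", weighted_epa(plays, wilson_multipliers))  # Wilson:  3.75
print("Rodgers:", weighted_epa(plays, rodgers_multipliers))  # Rodgers: 11.125
```

Identical plays worth 7 raw points come out at 3.75 for the quarterback in the blowout and 11.125 for the quarterback in the close game, purely because of the multipliers.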

In short, QBR is systematically biased against quarterbacks who have effective running games and dominant defenses. The more a team relies on its quarterback to do the heavy lifting, the higher QBR will rate that quarterback, even if he did nothing quantitatively more than a quarterback on a better-balanced team.

QBR begins with a flawed extrapolation of a sound idea: it breaks team EPA into component parts by individual play. In doing so, it creates situations where a player can have more EPA for a drive or a game than his team does overall. Thus quarterbacks who produce exactly the same statistics, but have teammates performing better around them, will earn less EPA than quarterbacks who use those same statistics to rescue their failing teammates.

It then compounds this flaw with a further, subjective division of each play's value among different players, and multiplies this exceptionally inexact, and possibly highly biased, result by a number that has nothing to do with the quarterback's actual play but is instead generated by the game situation. The result is a multiplier that is higher for quarterbacks whose teams perform just well enough to keep the game close.

QBR is a shite statistic that is further distorted with each step in its calculation. It is being shoved down our throats by an entity that has poured millions of dollars into its design, calculation, and the surrounding analysis concerning its flawed results.

I do not believe ESPN originally intended QBR to be a condescending project, but the final product is an arrogant stat. Even the highly intelligent fan cannot dissect QBR. Passer Rating has its flaws, but compared with QBR it is a much more fan-friendly stat. If Cam Newton racks up 150 yards at the end of a lost game and his passer rating finishes at 95, fans can parse the number and point out that most of it was accrued in garbage time against a prevent defense. It can be discussed, and discussed intelligently, precisely because the formula is known and the quantities involved can be found in the box score.
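Because the NFL passer rating formula is public, anyone can recompute it from box-score totals. A minimal Python version of that formula follows; the function name and argument order are just illustrative.

```python
def passer_rating(completions, attempts, yards, touchdowns, interceptions):
    """NFL passer rating computed from box-score totals."""
    def clamp(x):
        # Each of the four components is capped between 0 and 2.375.
        return max(0.0, min(x, 2.375))

    a = clamp((completions / attempts - 0.3) * 5)     # completion percentage
    b = clamp((yards / attempts - 3) * 0.25)          # yards per attempt
    c = clamp(touchdowns / attempts * 20)             # touchdown rate
    d = clamp(2.375 - interceptions / attempts * 25)  # interception rate
    return (a + b + c + d) / 6 * 100
```

For the hypothetical 21-of-28, 350-yard, 4-touchdown, no-interception line from the Wilson/Rodgers example above, `passer_rating(21, 28, 350, 4, 0)` works out to 156.25, and a fan can see at a glance that three of the four components hit the 2.375 cap.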

QBR robs fans of this ability. QBR asserts that this work has already been done for us, that the work is proprietary and, further, beyond our limited ability to understand it.

Just lie back and accept it.

No means no.