

A sample's a sample, no matter how small

Testing the reliability of four popular QB stats.

This pass is not a reliable sample size.
Streeter Lecka

What I’ve done in this post has been done before. I didn’t invent it. But we know that if something is worth doing, then it’s worth doing yourself. And we also know that to make writing speak true you should really use a cliche or two. I believe that seeing and thinking about these numbers is worth doing regularly, and that knowing how they’re produced is important to deciding their value.

What’s a sample worth?

In the Simpsons episode “Lisa Gets an A” (bet you didn’t see that coming), the family decides to go to Eatie Gourmet’s for food after church. At the store the sample stands say free, but we intelligent observers know that they’re not free at all! In fact, the family is paying for the operating cost of the car over the distance driven.

I think their car is probably a 1980 Plymouth Gran Fury. With the beat-up nature of the car, the harsh roads in Springfield, the aggressive driving style of Homer, and the primary drivers’ history of accidents, I don’t think $1 a mile is an unfair projected operating cost.

The drive home from church is 5 blocks; the additional distance to and from Eatie Gourmet’s on their way home is 19 blocks. We see the Simpsons eat at least 37 samples. Assuming block lengths of an eighth of a mile, the detour adds about 2.4 miles, or roughly $2.38 in operating cost, so we know that, to the Simpsons, an average sample (through sample 37) is worth at least 6 cents.
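For anyone who wants to check the sample math, here is a minimal sketch in Python. The inputs are the figures from the paragraphs above; the variable names are mine.

```python
# Figures taken from the text above
EXTRA_BLOCKS = 19        # extra distance to and from Eatie Gourmet's
MILES_PER_BLOCK = 1 / 8  # assumed block length
COST_PER_MILE = 1.00     # projected operating cost in dollars
SAMPLES_EATEN = 37       # samples we see the family eat

extra_miles = EXTRA_BLOCKS * MILES_PER_BLOCK   # length of the detour
trip_cost = extra_miles * COST_PER_MILE        # what the detour costs
cost_per_sample = trip_cost / SAMPLES_EATEN    # implied value of one sample

print(f"Detour: {extra_miles:.3f} miles, ${trip_cost:.2f}")
print(f"Implied value per sample: {cost_per_sample * 100:.1f} cents")
```

The per-sample figure is a floor, since the family presumably valued the trip at no less than what it cost them.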

If anyone knows a few more episodes where the Simpsons eat free samples, I'll get right on setting up their sample utility curve. The world needs to know.

Not all samples are equal though. At first small samples were fine for Homer but as his hunger was sated he had to go for whole chickens and hams. Decisive evidence that large samples have more utility than small samples.

As with Homer, so too with the football fan.

As with food, so too with football.

In football, some samples are best avoided while others should be gobbled up without a second thought. Many must be approached with caution, like grocery store sushi. A key consideration in determining whether a sample is one or the other is its size. However, we can’t just look at the raw size; we also have to consider what the sample is of. Considering both, we can start to get a sense of whether a sample is robust enough to accept.

One way of determining the adequacy of a sample size is seeing how much of the sample is up to underlying, repeatable talent and how much is noise. A good sample might be one where we expect the numbers to reveal more about an athlete’s talent than they leave obfuscated.

If you're a fan of baseball stats you know where this is going, but before we get there I have a caveat.

The numbers that follow need to be interpreted in the context of football. In football, players can play in the same systems their whole career. Those can be terrible systems or they can be Belichick's offense. QBs can play with great wide receivers or bad ones. Great lines or the 2012 Cardinals. At some level all stats are team stats. That means true talent levels will always be more obscured than in baseball. Much more. That doesn't mean we shouldn't use numbers. It means we must be mindful of our limitations and guard against overreaching them.

Now, with all that out of the way, be warned, for here there be maths.

The following table is arrived at by taking all of the players in my data set who meet a particular sample size cutoff for a particular denominator (in this case everything is attempts), splitting the sample in half by assigning odds to one half and evens to the other (there are better ways to do this part; ask me why I didn’t use them if you're curious), and then running a regression of one half against the other. I did this for fifty sample sizes within each stat and then ran a regression to determine at what size r is expected to equal .5. The resulting number is the size at which a sample tells us about as much as the league average does; anything larger starts to speak louder than the league average.

Stat                 Attempts   Games
Completion %              187       6
Yards per attempt         435      13
Touchdown rate          1,472      45
Interception rate      13,078     394
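The split-half step described above the table can be sketched roughly as follows. This is not the code behind the table. It runs on a simulated league with made-up numbers, it assumes "odds" and "evens" refer to odd- and even-numbered attempts, and it assumes each qualifying QB is truncated to exactly the cutoff. The function names are hypothetical, and the second-stage regression that finds where r equals .5 is left out.

```python
import random

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def split_half_r(players, cutoff):
    """Correlate each QB's odd-attempt rate with his even-attempt rate.

    players: one list of 0/1 outcomes per QB (1 = completion).
    cutoff:  QBs with fewer attempts are dropped; the rest are
             truncated to exactly `cutoff` attempts (an assumption).
    """
    odd_rates, even_rates = [], []
    for attempts in players:
        if len(attempts) < cutoff:
            continue  # does not meet the sample size cutoff
        sample = attempts[:cutoff]
        odds, evens = sample[0::2], sample[1::2]  # 1st, 3rd, ... vs 2nd, 4th, ...
        odd_rates.append(sum(odds) / len(odds))
        even_rates.append(sum(evens) / len(evens))
    return pearson_r(odd_rates, even_rates)

# Simulated league: every number here is invented for illustration.
rng = random.Random(2013)
players = []
for _ in range(400):
    true_rate = min(max(rng.gauss(0.60, 0.04), 0.0), 1.0)  # "talent"
    n_attempts = rng.randint(100, 1200)
    players.append([1 if rng.random() < true_rate else 0
                    for _ in range(n_attempts)])

r_small = split_half_r(players, 50)
r_large = split_half_r(players, 800)
print(f"r at  50 attempts: {r_small:.2f}")
print(f"r at 800 attempts: {r_large:.2f}")
```

With a fixed talent spread, the correlation between halves climbs as the cutoff grows, which is the pattern the table summarizes with a single number per stat.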

The following are the graphs of sample size versus correlation with some regression lines and equations.


The most interesting takeaway from the numbers is that you should probably just assume there is no interception skill. It's been said before but it bears repeating: if a QB's performance includes an interception rate far from the league average you should probably regress that before making any judgements about his talent.
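One common way to do that regressing, borrowed from baseball analysis, is to add a "ballast" of league-average attempts equal to the r = .5 sample size. The post does not say this is the method intended, so treat the sketch below as an assumption. The league rate, the observed rate, the attempt count, and the function name are all hypothetical.

```python
def regress_to_mean(observed_rate, attempts, league_rate, ballast):
    """Shrink an observed rate toward the league rate.

    The weight on the QB's own numbers is attempts / (attempts + ballast),
    so it is exactly one half when attempts equals the ballast.
    """
    weight = attempts / (attempts + ballast)
    return weight * observed_rate + (1 - weight) * league_rate

INT_BALLAST = 13078      # interception-rate figure from the table
league_int_rate = 0.027  # hypothetical league average
observed_rate = 0.015    # hypothetical QB with a very low INT rate
attempts = 400           # hypothetical single-season workload

estimate = regress_to_mean(observed_rate, attempts, league_int_rate, INT_BALLAST)
print(f"Observed: {observed_rate:.2%}, regressed estimate: {estimate:.2%}")
```

Under these made-up inputs the estimate lands almost on top of the league average, which is another way of saying that one season of interceptions tells us very little.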

And finally, what does this mean for you?

Well, it means that Russell Wilson has shown us a lot about his completion percentage, something about his yards per attempt, but very little about his interception rate and TD rate. There is no reason to believe that his record-setting touchdown rate will persist, but there is reason to believe that his stellar YPA will.
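To put rough numbers on "a lot", "something", and "very little", here is a sketch that weighs a QB's own sample against the league average for each stat. It assumes the weight is attempts / (attempts + the r = .5 size), which the post does not state, and it uses a hypothetical 400-attempt season rather than any real stat line.

```python
# r = .5 sample sizes, in attempts, from the table above
R_HALF_ATTEMPTS = {
    "Completion %": 187,
    "Yards per attempt": 435,
    "Touchdown rate": 1472,
    "Interception rate": 13078,
}

ATTEMPTS = 400  # hypothetical season, not an actual stat line

# Share of the estimate that comes from the QB's own numbers
weights = {stat: ATTEMPTS / (ATTEMPTS + size)
           for stat, size in R_HALF_ATTEMPTS.items()}

for stat, weight in weights.items():
    print(f"{stat:<18} {weight:.0%} own sample, {1 - weight:.0%} league average")
```

On those assumptions, completion percentage is mostly the QB's own doing, yards per attempt is close to an even split, and touchdown and interception rates are mostly league average.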

