Clustering the college QBs
Predicting QB success in the NFL is a very challenging problem: among other difficulties, it's hard to define what "success" is, and hard to account properly for the fact that earlier picks are given many more opportunities than later picks.
So I decided to look at the problem from a different angle, and instead ask the question:
Which current NFL QBs are Matt Stafford, Mark Sanchez, and Josh Freeman most similar to, based on their college stats?
One way to try to answer this question is to apply a clustering method to the data. Here's what I did:
1. Get the final-year college stats (Completion %, Yards, Yards/Att, Int, TD, Rating, Attempts/Game, Yards/Game) from QBs drafted in 2005-2007, as well as from Stafford, Sanchez, and Freeman
2. Apply the k-means clustering algorithm to the normalized statistics. Basically, the algorithm looks for the groupings that yield the smallest total within-group variance.
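The two steps above can be sketched roughly as follows. This is a minimal illustration, not the code used for the post: the per-QB stats aren't listed here, so the six group averages reported in the results stand in as input rows, and the helper names (`zscore_columns`, `kmeans`), the z-score normalization, the choice of k=2, and the seed are all assumptions made for the example.

```python
import math
import random

# Stand-in rows: the group averages reported in the post (per-QB stats are
# not listed). Columns: Pct, Yards, Yards/Att, Int, TD, Rating, Att/G, Yards/G
ROWS = {
    "Group 1": [65.2, 3555.2, 8.8, 9.0, 31.0, 159.2, 31.5, 281.5],
    "Group 2": [59.5, 2988.7, 7.4, 7.9, 28.2, 138.2, 38.0, 287.5],
    "Group 3": [60.5, 3511.1, 7.4, 17.0, 26.0, 133.5, 38.6, 292.0],
    "Group 4": [65.8, 2588.1, 9.3, 7.0, 24.0, 164.5, 24.8, 213.3],
    "Group 5": [58.6, 2356.4, 7.7, 7.0, 16.0, 136.5, 25.6, 204.1],
    "Group 6": [62.0, 2546.1, 7.3, 9.0, 19.3, 133.5, 31.7, 223.9],
}


def zscore_columns(rows):
    """Rescale each column to mean 0 and standard deviation 1 (assumed normalization)."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Fall back to 1.0 if a column is constant, to avoid dividing by zero.
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]


def sq_dist(a, b):
    """Squared Euclidean distance between two rows."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, seed=0, max_iter=100):
    """Lloyd's algorithm. Returns (labels, total within-cluster sum of squares)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]  # random starting centers
    labels = None
    for _ in range(max_iter):
        # Assignment step: each row goes to its nearest center.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each center moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster empties out
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
    wss = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, wss


names = list(ROWS)
scaled = zscore_columns(list(ROWS.values()))
labels, wss = kmeans(scaled, k=2, seed=1)
for name, lab in zip(names, labels):
    print(name, "-> cluster", lab)
print("within-cluster sum of squares:", round(wss, 2))
```

Normalizing first matters because raw Yards (in the thousands) would otherwise swamp Yards/Att (single digits) in the distance calculation.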
Here are the results. The algorithm requires that you pre-specify a number of clusters; I chose six. Reported for each group are its members as well as the average college statistics for that group:
"Group 1 : Brady Quinn, Jason White, Matt Leinart, Matthew Stafford, Mark Sanchez"
Pct.   Yards    Yards.Att   Int    TD     Rating   Att.G   Yards.G
65.2   3555.2   8.8         9.0    31.0   159.2    31.5    281.5
"Group 2 : Andrew Walter, Jay Cutler, Kyle Orton, Omar Jacobs"
Pct.   Yards    Yards.Att   Int    TD     Rating   Att.G   Yards.G
59.5   2988.7   7.4         7.9    28.2   138.2    38.0    287.5
"Group 3 : Dan Orlovsky, Derek Anderson, John Beck, Jordan Palmer, Kevin Kolb"
Pct.   Yards    Yards.Att   Int    TD     Rating   Att.G   Yards.G
60.5   3511.1   7.4         17.0   26.0   133.5    38.6    292.0
"Group 4 : Aaron Rodgers, Alex Smith, Jason Campbell, Troy Smith, Vince Young"
Pct.   Yards    Yards.Att   Int    TD     Rating   Att.G   Yards.G
65.8   2588.1   9.3         7.0    24.0   164.5    24.8    213.3
"Group 5 : Brodie Croyle, D.J. Shockley, David Greene, Isaiah Stanback, JaMarcus Russell, Reggie McNeal, Trent Edwards"
Pct.   Yards    Yards.Att   Int    TD     Rating   Att.G   Yards.G
58.6   2356.4   7.7         7.0    16.0   136.5    25.6    204.1
"Group 6 : Brad Smith, Bruce Gradkowski, Charlie Frye, Charlie Whitehurst, Jeff Rowe, Kellen Clemens, Josh Freeman"
Pct.   Yards    Yards.Att   Int    TD     Rating   Att.G   Yards.G
62.0   2546.1   7.3         9.0    19.3   133.5    31.7    223.9
Observations:
- Stafford and Sanchez profile similarly, and closely resemble (in terms of college stats) Quinn and Leinart. Good completion %, lots of yards, TDs, etc.
- David Greene = JaMarcus Russell? Uh, OK. But the rest of that grouping seems to make sense.
- Group 4 is interesting. High completion %, but relatively low yards and attempts per game. Could call these guys the "dinkers". Interesting to see Aaron Rodgers in with a couple of highly-regarded busts and potential busts-to-be.
- Gradkowski. Frye. Clemens. Freeman. Yikes.
Caveats:
- The groupings aren't totally stable, since the algorithm isn't guaranteed to find the optimal solution; if I ran things again, the groupings might change a bit, but not dramatically. The same goes if you change the number of groups; a few names might change groups, but the overall structure would be similar. For example, for all the settings I tried, Sanchez and Stafford ended up being grouped together.
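One way to put a number on that instability is to rerun k-means from many random starts and count how often each pair of rows ends up in the same cluster. The sketch below is illustrative only: it reuses the six reported group averages as stand-in rows (the per-QB stats aren't listed in the post), and the helper names (`co_cluster_rate`, `best_of`), k=3, and the 20 seeds are assumptions chosen for the example.

```python
import itertools
import math
import random

# Stand-in rows: the six group averages from the post (per-QB stats are not listed).
NAMES = ["Group 1", "Group 2", "Group 3", "Group 4", "Group 5", "Group 6"]
DATA = [
    [65.2, 3555.2, 8.8, 9.0, 31.0, 159.2, 31.5, 281.5],
    [59.5, 2988.7, 7.4, 7.9, 28.2, 138.2, 38.0, 287.5],
    [60.5, 3511.1, 7.4, 17.0, 26.0, 133.5, 38.6, 292.0],
    [65.8, 2588.1, 9.3, 7.0, 24.0, 164.5, 24.8, 213.3],
    [58.6, 2356.4, 7.7, 7.0, 16.0, 136.5, 25.6, 204.1],
    [62.0, 2546.1, 7.3, 9.0, 19.3, 133.5, 31.7, 223.9],
]


def zscore_columns(rows):
    """Rescale each column to mean 0 and standard deviation 1."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, seed, max_iter=100):
    """Lloyd's algorithm from a seeded random start. Returns (labels, within-cluster SS)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
    wss = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, wss


def co_cluster_rate(points, k, seeds):
    """Fraction of runs in which each pair of rows lands in the same cluster."""
    counts = {pair: 0 for pair in itertools.combinations(range(len(points)), 2)}
    for seed in seeds:
        labels, _ = kmeans(points, k, seed)
        for i, j in counts:
            if labels[i] == labels[j]:
                counts[(i, j)] += 1
    return {pair: n / len(seeds) for pair, n in counts.items()}


def best_of(points, k, seeds):
    """Keep the run with the smallest within-cluster sum of squares."""
    return min((kmeans(points, k, s) for s in seeds), key=lambda run: run[1])


scaled = zscore_columns(DATA)
seeds = range(20)
rates = co_cluster_rate(scaled, k=3, seeds=seeds)
for (i, j), rate in sorted(rates.items(), key=lambda kv: -kv[1]):
    print(f"{NAMES[i]} & {NAMES[j]}: together in {rate:.0%} of runs")
best_labels, best_wss = best_of(scaled, 3, seeds)
```

Pairs that stay together in nearly every run are the stable part of the structure; pairs with middling rates are the ones that hop between clusters. Keeping the lowest-variance run out of many restarts is a common way to reduce the chance of reporting a poor local solution.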
A place to bury strangers.
Comments
Interesting study.
There are a lot of groupings I didn’t expect. Of course a lot of it depends on the supporting cast and the type of play calling the coach does. If you have great RBs, that might be the reason for fewer attempts. Or if you’re a running QB.
My only wish to improve this study would be to include a few QBs from the late ’90s or early 2000s (I suspect the scarcity of data was the reason they were left out), because so far the only QBs here that one can consider a success in the NFL are Aaron Rodgers and Jay Cutler, and to a lesser extent Jason Campbell, Kyle Orton, and maybe Derek Anderson.
I’m interested in group 1. I don’t think Leinart will be anything more than a Chad Pennington, but I like most of the other QBs in that cluster. I find it interesting that all of the QBs were highly touted, yet Jason White was unanimously predicted to be a long shot for success, which is why scouting is so valuable.
by LantermanC on Mar 31, 2009 10:33 AM PDT
Oh, and is there anything more polarizing than trying to predict
the success of a college QB trying to be an NFL QB? I can’t think of anything more interesting personally. So many factors involved, so many things to debate, so many different stats to look into and question, etc.
by LantermanC on Mar 31, 2009 10:34 AM PDT
By measuring yards and attempt per game
you are measuring the system the player came from, not the player’s innate ability. That’s why Shockley and Greene are grouped together and Leinart and Sanchez are grouped together.
I don’t think clustering quarterbacks by their stats produces a meaningful measure or profile of the players. Your “dinkers” group
Aaron Rodgers, Alex Smith, Jason Campbell, Troy Smith, Vince Young
couldn’t be much more different from each other in ability or profile. Therefore, this
Gradkowski. Frye. Clemens. Freeman. Yikes.
is as sensible as this
Andrew Walter, Jay Cutler, Kyle Orton, Omar Jacobs
yikes.
by John Morgan on Mar 31, 2009 11:02 AM PDT
I think you're probably right
Actually, I had a comment about Cutler in the first draft of the post, but removed it. And I think the Rodgers group is a bit weird as well; actually, he seemed to “jump” around clusters when I repeated the analysis more than any other QB.
I would agree that this isn’t a direct measure of a QB’s ability; but I still think it’s interesting to see these similarities/differences, as long as you don’t make a conclusion like OMG STAFFORD=LEINART=BUST!!
by cyberwulf on Mar 31, 2009 11:44 AM PDT
I get it then
I guess I’m getting touchy about Rosetta Stone quarterback projection stats. I fear that statistics are becoming the new jargon.
by John Morgan on Mar 31, 2009 11:50 AM PDT
Indeed
I’ve tried to throw in as many caveats as possible – for me, this was really about finding out which QBs had similar college stats to the current crop. If anything, these groupings illustrate the difficulty in projecting NFL performance from college stats; there are good (or at least decent) and bad QBs in almost every group.
by cyberwulf on Mar 31, 2009 1:10 PM PDT
I think that was a shot at me.
…and yes you seem touchy about it.
During an otherwise boring time of year, why is it such a touchy subject? You don’t think that NFL clubs “attempt” to do the exact same thing?
I’m certain that clubs prospecting these guys have stat matrices that make Russell Crowe’s deciphering (A Beautiful Mind) seem sophomoric.
by iverson2169 on Mar 31, 2009 11:26 PM PDT
No, don't take it personally.
I recently posted two articles, one by Walter Football, and one by ESPN.com, and both were on QBs. I didn’t really read them in much depth, and posted them since we were just beginning to talk about the possibility of drafting Stafford. However, in retrospect, they were not really good articles. One tried to reconcile why QBs fail, whether it was because of arm strength, the system, or the intangibles they had. The other had some convoluted stats meshing system that placed Matt Leinart very highly and didn’t have any rhyme or reason to its madness. It just so happened that most of the good QBs ended up in the good group, and most of the bad ones were below the arbitrary threshold.
by LantermanC on Apr 1, 2009 12:21 AM PDT
The one thing a blog entry cannot convey....
…is TONE. I have very thick skin and in no way have meant to convey in ANY post that I am offended by any comments. I viewed the “rosetta stone” comment as a playful jab at my fanpost and said as much. Had anyone been able to see me say it in person, it would have been delivered with a half smirk and a wink… no problems here at all.
As the owner of a garment factory in Khon Kaen, Thailand, with 16,000 employees, I have been in absolute WARS with the biggest branded sports apparel companies in the world (the dreaded swoosh and their competition) and come out alive. My point? If I don’t take those encounters personally, I certainly don’t take anything personally on a Seahawks blog site (a damn good one at that). If it weren’t for differing opinions, we’d all be a pretty dull (and ignorant I might add) bunch.
I totally get John’s point. He thinks many of us are playing amateur draft scout and trying to create “paint by number” systems for drafting QBs that utilize stat samples that are too shallow. My response:
Of course we are. Nobody here is going to reinvent an NFL wheel, and none of us will ever touch an NFL GM chair. That is the whole joy of logging onto John’s site and taking time out of our days to make educated interactions with other rabid Hawks fans.
by iverson2169 on Apr 1, 2009 2:28 AM PDT
Entrepreneur huh?
Seems tough.
Yeah, I think what John means is that while these studies are usually interesting, have a healthy dose of skepticism when reading them (to everyone not just you or me) and don’t take every study to be a fact. Sometimes people just manipulate numbers or studies to say what they want to say.
by LantermanC on Apr 1, 2009 8:41 AM PDT
It is a challenge...
… but in these economic times, we are thriving. The reason is that the branded companies are all very interested in sustainability. Non-delivery due to a failed subcontractor would mean disaster in a rough economy. Because of this, smaller, less stable companies end up losing work, while the super-factories swell in size.
by iverson2169 on Apr 1, 2009 9:05 PM PDT
Don't be confused
I’m not at all mad at you, nor am I attempting to take swipes at you. I’m irked by the statistical mumbo jumbo created by people whose goal is to shock and amaze and forgo work on the way to expertise. I don’t think that’s you. I think it’s the people creating the suddenly dime-a-dozen quarterback projection systems.
by John Morgan on Apr 1, 2009 10:51 AM PDT
Why is everyone so down on Campbell?
It seems like every quarterback projection system puts Campbell in with the busts, but I don’t see it. I thought he was very good last year considering the odd-to-terrible playcalling.
by Nate Dogg on Mar 31, 2009 12:54 PM PDT
