I love projections.
They don't need to be mine. They don't need to be relevant to my life or interests. It's not the subject matter that makes me love them. It's just predicting the future.
Guys, we can predict the future in rigorous, falsifiable, and significant ways. That's amazing.
This is a report [PDF] on rabbit mortality. On page 4 there is a neat little table telling us the likelihood that a given rabbit survives from one age to another for certain qualities of year. The authors know something about rabbits that don't even exist yet. They can tell us how likely a rabbit whose grandparents haven't even gotten sexy yet will be to survive from birth to four weeks. And then they can make the prediction more accurate if you tell them what the weather is like.
I don't have any special connection to rabbits, but that's pretty cool.
So it's weird that I have a hard time getting excited for football projections. I like football, and the only rabbits I've ever known were dicks. So why would I read a paper on rabbit mortality that I'll never use again but purposefully avoid season projection articles?
I think it's because of the presentation of most football projections. I've seen two basic kinds: the most likely season, which just gives a win total for each team based on some method of simulating the coming season, rigorous or not; and the team quality, which gives a range or single win total based entirely on the perceived quality of a team, rigorous or not.
Neither really captures how the season happens. The first is closest - and I'll admit I've seen that method spit out win total ranges, which is so close to what I want it hurts - but it's still not quite there.
Here's the problem: for my money an adequate season projection must capture two kinds of uncertainty - the uncertainty of single game outcomes and the uncertainty of team quality. Showing projections that account for two uncertain variables is difficult. You can't just use a single win total, and if you just use a win range it gets uselessly large very fast.
More importantly, a projection system should explicitly show the reader the impact of those factors.
The way the good ones usually work is to account for the uncertainty of game outcomes by providing a range for win totals, while accounting for the uncertainty of team quality with black box (unknown to the reader) projections of each team's quality.
I hate black boxes and I dislike single-value projections of uncertain values. No one knows how good the Seahawks will be next year. We can project a most likely value, but I don't think that's good enough for projections - especially when season simulations mean doing that for every team in the league.
These are problems I can solve!
Essentially we have two uncertain things, team quality and opponent quality, that determine the distribution of a third thing of interest - season win totals. Joy of joys, I can project the distributions of the first two and use them to determine the third. All projections will be based on past score differential. I could get smaller ranges using DVOA - and some of you will probably think I should have when you see how wide the ranges are - but, again, I don't like black box predictions.
So how do you project team quality based on past score-diff? With this:
This is a chart showing a histogram of changes in point differential from season n to season n+1 for every team over the last ten seasons. Note that although it is not normal, I have put a normal curve over it. Frankly, I'm not sure why I did that, but I do think that curves over histograms look cool. My individual team projections are actually based on a triangular distribution which, taken in aggregate, approximates the distribution above very well and, in turn, provides an excellent approximation of the movement of point differential from season to season. The distribution skews toward the mean (0) from either side, so regression is alive and kicking in this model. Here are some example distributions of future score differential based on the previous season's:
Note: these are dramatized a bit for ease of reading - in particular the team with a 0 point diff actually has a much smaller range of possible values.
Want the equation for those? No you don't.
A lot of the junk in there is conditional treatments of x to get the skew of the distribution on the proper side. Overall it is the equation equivalent of my dancing. Weird, graceless, and with only an abstract relation to reality.
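If you'd rather read code than that equation, here's a rough Python sketch of the idea. The `spread` and `skew` knobs are numbers I made up for illustration, not the actual fitted values, and the real projection has more conditional junk than this.

```python
import numpy as np

rng = np.random.default_rng()

def project_point_diff(last_diff, spread=120.0, skew=1.5):
    """Sample next season's point differential from a triangular
    distribution centered on last season's value, with the side
    facing zero stretched by `skew` so the long tail always points
    back toward average. Both knobs are illustrative guesses."""
    toward_zero = spread * skew
    if last_diff >= 0:
        left, mode, right = last_diff - toward_zero, last_diff, last_diff + spread
    else:
        left, mode, right = last_diff - spread, last_diff, last_diff + toward_zero
    return rng.triangular(left, mode, right)
```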
At any rate, now I can randomly select team qualities based on probability. I use a similar method to simulate game outcomes - essentially I take the team quality as the mean of a game quality distribution and then randomly pick a quality based on the distribution - though this time it's a simple normal curve.
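In sketch form, again with a made-up spread, that game step looks something like this:

```python
def simulate_game(quality_a, quality_b, game_sd=160.0):
    """One game: each team's single-game quality is a normal draw
    centered on its season quality; the higher draw wins. `game_sd`
    is a placeholder in the same units as the quality numbers, not
    the value actually used here."""
    return rng.normal(quality_a, game_sd) > rng.normal(quality_b, game_sd)
```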
I run the full season for a given set of team qualities, rinse and then repeat 100,000 times.
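Stringing the two sketches above together, one pass of the whole thing looks roughly like this (the last-season differentials and the schedule are stand-ins for whatever inputs you have on hand):

```python
def simulate_seasons(last_diffs, schedule, n_sims=100_000):
    """Monte Carlo over full seasons, reusing the sketches above.

    last_diffs: {team: last season's point differential}
    schedule:   list of (team_a, team_b) matchups
    Returns {team: list of simulated season win totals}.
    """
    win_totals = {team: [] for team in last_diffs}
    for _ in range(n_sims):
        # Draw one season-long quality per team, then hold it fixed.
        quality = {t: project_point_diff(d) for t, d in last_diffs.items()}
        wins = dict.fromkeys(last_diffs, 0)
        # Play out every game on the schedule with those qualities.
        for a, b in schedule:
            winner = a if simulate_game(quality[a], quality[b]) else b
            wins[winner] += 1
        for t, w in wins.items():
            win_totals[t].append(w)
    return win_totals
```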
At the end, for each team I have a list of simulated seasons where each one has a win total, team quality, and opponent quality. I can then use that data to make an informative chart describing expectations for a given team's coming season.
Before I post all of the beautiful charts and figures I want to make a couple of notes and explanations.
In the table that comes first I have two confidence ranges, .5 and .98. Both are centered at the mean. They are significant because they both represent an indifference point. The .5 confidence interval is obviously so: if someone offers you a bet at 1:1 odds that a team will fall inside or outside of that interval, you don't care which side of the bet you're on. The .98 interval is the same thing except it's for the whole league. In other words, the entire league falls within that range of wins in half of the simulated seasons.
Note how huge that interval is. The Chiefs could get 13 wins in that interval. I'll freely admit that I erred on the side of wide prediction ranges (in particular when I picked the triangular distribution for season+1 score diff projections instead of a dual exponential or the like), but the impact isn't more than a game. This is why single-value win totals predicted for the entire league are garbage. They are just power rankings and should be presented as such.
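For the curious, pulling those ranges out of a pile of simulated win totals is about this much work. I'm using equal-tailed percentiles here as a stand-in for the mean-centered ranges in the table; they land in roughly the same place:

```python
import numpy as np

def win_ranges(simulated_wins):
    """Summarize one team's simulated win totals with a mean plus
    .5 and .98 ranges (equal-tailed percentiles as a stand-in for
    the mean-centered intervals in the table)."""
    w = np.asarray(simulated_wins)
    return {
        "mean": w.mean(),
        "range_50": tuple(np.percentile(w, [25, 75])),
        "range_98": tuple(np.percentile(w, [1, 99])),
    }
```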
The strength of schedule column is the average strength of opponents across all simulated seasons, rescaled to go from 0 (easiest) to 1 (hardest). It is an intentionally unit-less measurement.
That same scale is used in the charts that follow for the contour plots of team quality versus opponent quality.
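The rescaling itself is nothing fancy - a min-max squish across the league, roughly:

```python
def rescale_sos(avg_opp_quality):
    """Min-max rescale average opponent quality across the league to
    the unit-less 0 (easiest) to 1 (hardest) scale in the table.
    avg_opp_quality: {team: mean opponent quality across all sims}."""
    lo = min(avg_opp_quality.values())
    hi = max(avg_opp_quality.values())
    return {t: (q - lo) / (hi - lo) for t, q in avg_opp_quality.items()}
```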
Finally, the divisional charts show win totals as continuous normal curves - that is not how I figured the projected values in the table. Those were pulled from the raw data. The curves in the graphs are just there to remind you of boobs. Which they look almost exactly nothing like. Except maybe the Chargers'.
Here's the big table:
Looking at SOS, it's easy to see why NFC North fans have such a cross to bear. It would be easier for a team with a weak QB to pass a camel through the eye of a needle than to make it into the playoffs in that division.
Okay, that was fun. Now tons of charts!
NFC Best
NFC Saints Won't Rebound
NFC West Minor League Affiliates
NFC Something Else
AFC Patriots
AFC Broncos
AFC I still kinda don't like the Steelers, is that okay?
AFC, Um, South
Please don't make any bets based on this information. If you do anyways and lose, you can't hold me accountable. If you do anyways and win, you owe me half.