According to Football Outsiders' latest simulation, the Seattle Seahawks have a 99.7% chance to make the playoffs, a 78.5% chance of being the #1 seed in the NFC, and a 27.3% chance of winning the Super Bowl.
Those numbers are obscene. By which I mean, they generate arousal. They're also incredible. By which I mean, not credible.
THE COIN FLIP ANALOGY, WITH NORMAL COINS
Let's talk probability. For a simplified version of "successfully making the playoffs", imagine that you are going to flip a coin 6 times; success is achieved if the coin lands on "heads" at least four times (a probability you've surely already calculated in your head as equal to 22/64, or 34.375%).
So you flip that baby three times, and you're off to a decent start with 2 heads and only 1 tails.
Now we have a very stripped-down analogy for the situation at hand. Seattle has had some success, and so the probability of ultimate success (at least 4 heads out of six flips) is now sitting at 50%. This simplif--
Hang on, I see a hand up in the back of the classroom. Yes?
"Jason, the odds look like they're 50-50 now, but won't that change after the next flip?"
That question, sir/madam, is simultaneously naïve and brilliant. The possible subsequent outcomes are exactly what conditional probability measures.
If the next flip is heads (p = .50),
we only need one more head out of two tries (probability = 75%) to succeed.
If the next flip is tails (p = .50),
we're in some trouble, and need to get two heads in a row (probability = 25%) to succeed.
That means that (for now) there are two paths to success. The sum of their probabilities is:
success including next flip heads = .50 X .75 = .375
success including next flip tails = .50 X .25 = .125
Total chance of success = .375 + .125 = .500
Capisce? This is one reason-- we'll tackle the others in a bit-- why the playoff odds from Football Outsiders change each week. Even if a team that was "likely" to win comes out on top, they were not certain to win, and so the new data has to be accounted for. Thus, some of the week-to-week variance is perfectly acceptable and in no way invalidates the initial probabilities.
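If you'd rather have a machine check the old-school math, here's a minimal sketch using exact fractions. The helper name `p_at_least` is my own invention for illustration; it just sums the binomial terms for "at least k heads in n flips".

```python
from fractions import Fraction
from math import comb

def p_at_least(k, n, p=Fraction(1, 2)):
    """Chance of at least k heads in n independent flips, each heads with probability p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

half = Fraction(1, 2)

before_any_flips = p_at_least(4, 6)    # at least four heads in six flips
after_three_flips = p_at_least(2, 3)   # two heads banked, need two more from three flips

# Split the remaining three flips by what the next flip does.
via_heads = half * p_at_least(1, 2)    # next flip heads, then one more head from two tries
via_tails = half * p_at_least(2, 2)    # next flip tails, then two heads in a row

print(before_any_flips, after_three_flips, via_heads, via_tails)
```

The two paths add back up to the same 50%, which is the whole point: the number only moves once the next flip actually lands.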
Now, instead of doing the old-school math, you could also run a bunch of simulations.
This might help to visualize the possible outcomes, but it's a vastly inferior tool for calculating the probable number of total "heads". However, when dealing with 32 NFL teams, you can't know ahead of time exactly how many wins are needed to make the playoffs or earn a given seed. To complicate matters further, teams are always playing each other, so you cannot calculate wins independently for each team (that would allow the comical possibility of all NFL teams having a winning record).
This is why Football Outsiders runs a bunch of simulations. It is the only reason. The simulations do not provide additional information about the probability of any game's outcome; they merely take the assumed probability and apply it to playoff seeding.
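For the coin version, a simulation might look something like the sketch below. To be clear, this is my toy, not Football Outsiders' code; the function name, the trial count, and the fixed seed are all assumptions made for illustration. Notice that `p_heads` goes in as an input. The simulation never learns anything about the coin; it only reports the consequences of whatever probability you hand it.

```python
import random

def simulate_success_rate(heads_so_far=2, flips_left=3, target=4,
                          p_heads=0.5, trials=100_000, seed=0):
    """Estimate the chance of finishing with at least `target` heads."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same estimate
    successes = 0
    for _ in range(trials):
        # Flip the remaining coins, each landing heads with the ASSUMED probability.
        new_heads = sum(rng.random() < p_heads for _ in range(flips_left))
        if heads_so_far + new_heads >= target:
            successes += 1
    return successes / trials

estimate = simulate_success_rate()
print(estimate)  # should land close to the exact answer of 0.5
```

A hundred thousand trials gets you roughly what a few lines of arithmetic already gave you exactly, which is why the simulation only earns its keep once the bookkeeping (32 teams, tiebreakers, shared schedules) gets ugly.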
COIN FLIP ANALOGY, WITH LOADED COINS
Just gonna throw this out there without any data to back it up, but some teams are better than others. When a good team plays a weaker team, the good team has a better chance (in excess of 50%) of winning.
So let's say that our coin collection includes an equal mix of "Headsies", weighted to come up heads on 2/3 of all flips, "Tailsies", which come up tails on 2/3 of all flips, and "Fair" coins.
Pick a coin at random, flip it three times, and produce our earlier result: heads, tails, heads (HTH).
What kind of coin do we have? More to the point, can our guess as to the type of coin inform our prediction regarding the "at least four heads" goal?
To figure that out, you have to first calculate the probability of each coin type producing the above result:
p "Headsie" coming up HTH = 2/3 X 1/3 X 2/3 = 4/27
p "Fair" coming up HTH = 1/2 X 1/2 X 1/2 = 1/8
p "Tailsie" coming up HTH = 1/3 X 2/3 X 1/3 = 2/27
...then convert results to a common denominator:
p "Headsie" HTH = 32/216
p "Fair" HTH = 27/216
p "Tailsie" HTH = 16/216
... and use the numerators as a comparable probability population:
p "Headsie" = 32 / (32+27+16) = 32/75 = .42667
p "Fair" = 27 / (32+27+16) = 27/75 = .36
p "Tailsie" = 16 / (32+27+16) = 16/75 = .21333
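The same bookkeeping can be sketched in a few lines of Python. The names (`coins`, `likelihood`, `posterior`) are mine, and the equal 1/3 prior reflects the "equal mix" of coin types described above.

```python
from fractions import Fraction

# Chance of heads for each coin type in the collection.
coins = {
    "Headsie": Fraction(2, 3),
    "Fair": Fraction(1, 2),
    "Tailsie": Fraction(1, 3),
}
prior = Fraction(1, 3)  # equal mix, so each type is equally likely before flipping

def likelihood(p_heads, flips="HTH"):
    """Chance that a coin with the given heads probability produces this exact sequence."""
    result = Fraction(1)
    for flip in flips:
        result *= p_heads if flip == "H" else 1 - p_heads
    return result

# Weight each coin type by how well it explains HTH, then normalize.
joint = {name: prior * likelihood(p) for name, p in coins.items()}
total = sum(joint.values())
posterior = {name: weight / total for name, weight in joint.items()}

for name, prob in posterior.items():
    print(name, prob, round(float(prob), 5))
```

Because the prior is the same for all three types, it cancels out, which is why comparing the numerators 32, 27, and 16 works.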
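To put it simply, the most likely result is that we have a Headsie, but we can't count on it. However, we can now determine the chance of "at least four heads" (in six total flips) by multiplying each coin type's probability by its chance of getting at least two heads in the remaining three flips, then adding the three results.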
p "Headsie" = 32/75 = .42667
Headsie success = 20/27 = .74074
p Headsie success = .31605
p "Fair" = 27/75 = 0.36
Fair success = 1/2 = .5
p Fair success = 0.18
p "Tailsie" = 16/75 = .21333
Tailsie success = 7/27 = .25926
p Tailsie success = .05531
TOTAL p SUCCESS = .55136
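For anyone who wants to check that total, here is a self-contained sketch of the whole calculation in exact fractions. The helper names are hypothetical; the only inputs are the three coin types and the observed HTH.

```python
from fractions import Fraction
from math import comb

coins = {
    "Headsie": Fraction(2, 3),
    "Fair": Fraction(1, 2),
    "Tailsie": Fraction(1, 3),
}

def likelihood_hth(p):
    """Chance of heads, tails, heads from a coin with heads probability p."""
    return p * (1 - p) * p

def p_at_least(k, n, p):
    """Chance of at least k heads in n flips with heads probability p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Posterior for each coin type (the equal prior cancels out).
weights = {name: likelihood_hth(p) for name, p in coins.items()}
total_weight = sum(weights.values())

# Two heads are banked, so success means at least two heads in the last three flips.
contributions = {
    name: (weights[name] / total_weight) * p_at_least(2, 3, p)
    for name, p in coins.items()
}
total_success = sum(contributions.values())

print(total_success, round(float(total_success), 5))
```

The three entries in `contributions` are the same three products listed above, and their sum is the 55% figure.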
Aye, there's the rub. If we account for the chance that a Fair coin or even a Tailsie could've produced heads-tails-heads, the ultimate success probability is 55%. That's certainly better than the scenario using only fair coins (50%), but a far cry from the 74% we'd get if we took it on faith that we'd uncovered a Headsie.
And there's the problem with the Football Outsiders playoff simulation. Follow along week-to-week, and you'll see that they continually adjust their odds not just for wins and losses that have occurred, but to correct for previous errors in measuring the quality (Headsie/Tailsie/Fair) of each team.
You could point out that FO's play-by-play model is more informative than three coin flips. I'm sure it is. But it's still inconclusive. More importantly, nay, most importantly, when a team has a measured DAVE at historically high levels, it's a hell of a lot more likely to be an overestimate than an underestimate. This is well-documented and well-explained by the False Positive Paradox. Simply put, an "historically elite" team is more likely than a "really good" team to produce these kinds of ratings. But because the "really good" teams vastly outnumber the "historically elite" teams, any given test result that shows "historically elite" is more likely to be a false positive.
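To see how the paradox plays out with numbers, here's a sketch with figures I made up purely for illustration. None of them come from Football Outsiders or from any real data: suppose 1 team in 50 is truly "historically elite", an elite team produces an elite-looking rating 60% of the time, and a merely "really good" team does so 10% of the time.

```python
from fractions import Fraction

# ALL THREE NUMBERS BELOW ARE INVENTED FOR ILLUSTRATION, not measured from anything.
base_rate_elite = Fraction(1, 50)          # share of teams that are truly elite
p_rating_given_elite = Fraction(3, 5)      # elite team shows an elite rating
p_rating_given_good = Fraction(1, 10)      # really good team shows an elite rating

true_positives = base_rate_elite * p_rating_given_elite
false_positives = (1 - base_rate_elite) * p_rating_given_good

# Chance that a team showing an elite rating is actually elite.
p_elite_given_rating = true_positives / (true_positives + false_positives)

print(p_elite_given_rating, round(float(p_elite_given_rating), 3))
```

Under those assumed inputs, an elite-looking rating is six times likelier to come from an elite team than from a really good one, and it's still a false positive most of the time. Change the made-up numbers and the answer moves, but the base rate keeps dragging it down.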
FUNKY MAGIC COINS WITH MOVING INTERNAL PARTS
Finally, it must be noted that NFL teams differ from the coins in our analogy in that they are dynamic. You might be holding and flipping a Super-Headsie that comes up heads on 90% of its flips, but the internal weighting mechanism is going to shift around during the course of the season. Every team has players gaining experience and improving, and these rates are not equal throughout the league. Every coaching staff is accumulating film and experimenting with schemes (especially the losing ones). Every team has injuries.
If you're an average team looking at a 37.5% chance of making the playoffs, we can imagine that these factors will balance out. You're as likely to be underrated as overrated; your chances of getting better are as good as your chances of getting worse; and so that 37.5% playoff chance is a pretty decent guess.
But staring at a 99.7% chance of making the playoffs, you can pretty much throw out the scenarios where things get better. Not because it can't happen, but because it can't improve your playoff odds. Leftover from this cold shower of reality is the possibility that a team is overrated, and the probability that the team quality (and opponent quality) will change over the course of the season.
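The lopsidedness is easy to see back in coin-land. The sketch below is a toy under assumptions of my own: the target is still "at least four heads in six flips", and the coin's true weighting is equally likely to be 0.1 below, equal to, or 0.1 above what we measured. It compares a measured Super-Headsie (90%) with a measured fair coin (50%).

```python
from math import comb

def p_at_least(k, n, p):
    """Chance of at least k heads in n flips with heads probability p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def success_with_uncertainty(measured_p, wobble=0.1, k=4, n=6):
    """Average the success chance over an ASSUMED symmetric error in the measured weighting."""
    candidates = [measured_p - wobble, measured_p, measured_p + wobble]
    return sum(p_at_least(k, n, p) for p in candidates) / len(candidates)

point_super = p_at_least(4, 6, 0.9)            # taking the 90% measurement on faith
blurred_super = success_with_uncertainty(0.9)  # allowing for 80%, 90%, or 100%

point_fair = p_at_least(4, 6, 0.5)             # taking the 50% measurement on faith
blurred_fair = success_with_uncertainty(0.5)   # allowing for 40%, 50%, or 60%

print(round(point_super, 4), round(blurred_super, 4))
print(round(point_fair, 4), round(blurred_fair, 4))
```

For the fair coin, the upside and downside roughly cancel. For the Super-Headsie, the upside has almost nowhere to go, so the same symmetric error in the measurement can only pull the odds down.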
* * * * * * * * * * * * * * * *
All of this is not meant as a criticism of Football Outsiders, who freely acknowledge the ephemeral nature of early-season ratings. It's a critique of the Playoff Odds Report. Specifically, a season "simulation" merely establishes playoff seeding and results based on the DAVE model. It does not account for the margin of error in the initial measurement (DAVE), nor does it simulate the changes in team quality throughout the season, which have occurred historically and will occur again.