Before the 2009 season began, I developed a very simple predictive¹ model based solely on each NFL team’s historical record. Here is what I wrote about the model at the time:
I used data from 2002 through 2008. In 2002, the NFL changed the league structure (it increased the number of divisions), so this felt like a natural point of demarcation. All data are weighted equally. I have taken no account of the fact that teams are constrained in how many games they can win collectively. For example, if there were only two teams in the entire league, it would be impossible for both to win (or both to lose) all their games. All ties (there was only one) have been counted as wins.
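The post does not give the formulas behind its predictive probabilities, so the following is only a minimal sketch of one standard way to turn a historical win-loss record into a predictive distribution over 0 to 16 wins: the beta-binomial. It assumes a constant, per-team win probability with a uniform Beta(1, 1) prior, treats every past game as equally informative, and counts ties as wins, as the post describes. The function names and the example record are hypothetical, and this is not claimed to be the calculation actually used for the table.

```python
from math import comb, exp, lgamma

GAMES_PER_SEASON = 16


def log_beta(a: float, b: float) -> float:
    """Logarithm of the Beta function B(a, b), computed via log-gamma."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def predictive_wins(hist_wins: int, hist_games: int,
                    n: int = GAMES_PER_SEASON) -> list[float]:
    """Beta-binomial predictive probabilities of winning 0, 1, ..., n games.

    Assumption (not from the post): uniform Beta(1, 1) prior on a fixed
    per-team win probability, updated with all historical games weighted
    equally, then integrated out to predict the next n games.
    """
    a = hist_wins + 1               # posterior Beta parameters
    b = hist_games - hist_wins + 1
    return [
        comb(n, k) * exp(log_beta(k + a, n - k + b) - log_beta(a, b))
        for k in range(n + 1)
    ]


# Hypothetical team: 80 wins in 112 games (seven 16-game seasons).
probs = predictive_wins(80, 112)
most_likely = max(range(len(probs)), key=probs.__getitem__)
```

Working on the observable (number of wins next season) rather than on the unobservable win probability is what makes this a predictive rather than a parametric statement; the parameter is integrated out instead of being plugged in.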
Below are the results of those predictions.
This table shows, for each team, the probability of winning 0 games, 1 game, …, 16 games. It is sorted so that the team with the highest probability of winning all 16 games (the Patriots) is first and the team with the lowest probability (the Lions) is last. All probabilities are rounded, and probabilities less than 1% are shown as 0. Each team’s most likely number of wins is in bold.
The number of games each team actually won is shown with an orange background.
As expected, the model was crudely useful, at best. It did capture a certain rough order in the standings, and it did better than the naive model which simply said “All teams will win 8 games.” But it would not have been of much, if any, use to a gambler, except as a starting point.
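The post does not say which score it used to judge the model against the naive “8 wins for everybody” rule. A minimal sketch, assuming mean absolute error between a point prediction (for instance each team’s most likely win total) and the actual wins, looks like this; the four-team numbers are hypothetical and are not the 2009 results.

```python
def mean_absolute_error(predicted: list[float], actual: list[int]) -> float:
    """Average absolute gap between predicted and actual win totals."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical four-team example; illustrative numbers only.
model_modes = [12, 9, 5, 7]   # most likely win totals from a predictive model
naive = [8, 8, 8, 8]          # "All teams will win 8 games."
actual_wins = [10, 11, 2, 7]

model_mae = mean_absolute_error(model_modes, actual_wins)  # 1.75 here
naive_mae = mean_absolute_error(naive, actual_wins)        # 3.0 here
```

A point score like this throws away most of what a predictive model offers; a score that uses the whole distribution (such as the probability the model assigned to each team’s actual total) would be a stricter comparison.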
Do not forget that the model’s predictions were conditional on the belief that all that mattered was the last seven years’ performance, a belief which is clearly false. No account was taken, for example, of how the players have changed over that time period, nor did I use any information about the schedule. This model is meant to be a baseline with which to compare the performance of other, more sophisticated models.
If you are a gambler, or somebody who actively tracked the season, we would love to hear how your pre-season guesses compared with our model.
The next level of complexity is to account for the conferences and divisions, which do much to constrain the possible number of games won and lost. That is, each team’s schedule should be part of a better model. My guess, looking at the results of this simple model, is that incorporating this step will give a fairly decent boost to our predictive skill.
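One way to see the constraint that the simple model ignores: in a 32-team league where every team plays 16 games, each game that does not end in a tie produces exactly one win. Writing $W_i$ (notation introduced here) for team $i$’s wins, a tie-free season must therefore satisfy the identity below. A model that predicts each team separately, as the simple model does, lets this total wander, whereas a schedule-aware model would enforce it.

```latex
\sum_{i=1}^{32} W_i \;=\; \frac{32 \times 16}{2} \;=\; 256
```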
So, as the Lions always say (before each season begins), just wait ’til next year!
I also made this prediction: “the probability that at least one team wins 0 games is about 2%. The probability that at least one team wins 16 games is about 3%.” Both rare events were given low probability, and neither happened: no team won or lost all its games.
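Because the simple model treats teams as independent, a natural way to get a league-wide figure like “at least one team wins 0 games” is the complement rule: one minus the probability that no team does. The sketch below assumes that independence; the per-team probability of 0.06% is hypothetical, chosen only to show how small per-team chances add up across 32 teams, and is not taken from the post’s table.

```python
from math import prod


def prob_at_least_one(per_team_probs: list[float]) -> float:
    """P(at least one team experiences the event), assuming independent teams.

    Complement rule: 1 - P(no team experiences it)
                   = 1 - product over teams of (1 - p_i).
    """
    return 1.0 - prod(1.0 - p for p in per_team_probs)


# Hypothetical: 32 teams, each with a 0.06% chance of a winless season.
p_any_winless = prob_at_least_one([0.0006] * 32)  # roughly 0.019
```

In practice each p_i would be a team’s own predictive probability of 0 (or 16) wins, which differs from team to team.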
¹Note: these results come from a predictive model, not from the more usual parametric model. You won’t find how to calculate this model in ordinary statistics textbooks.
Using a predictive model gives us at least three advantages: (1) a full account of the variability of results (parametric models always under-predict the variability); (2) statements with respect to the observables (actual numbers of games won and lost); (3) probabilities for every possibility (each team can win from 0 to 16 games, and our model can quantify each possibility).
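To illustrate advantage (1) in one concrete case, the sketch below compares the variance of next season’s wins under a plug-in binomial model (win probability fixed at a single estimate) with the variance under a beta-binomial predictive distribution. It assumes the same hypothetical setup as a uniform prior and a record of 80 wins in 112 games, giving Beta(81, 33); this is an illustrative choice, not necessarily the model behind the table. With the plug-in probability set equal to the posterior mean, the ratio of the two variances is exactly (a + b + n) / (a + b + 1), which exceeds 1 whenever n > 1.

```python
def binomial_var(n: int, p: float) -> float:
    """Variance of wins when the win probability is fixed at p (plug-in)."""
    return n * p * (1 - p)


def beta_binomial_var(n: int, a: float, b: float) -> float:
    """Variance of wins under a Beta(a, b)-binomial predictive distribution."""
    return n * a * b * (a + b + n) / ((a + b) ** 2 * (a + b + 1))


# Hypothetical record: 80 wins in 112 games, uniform prior -> Beta(81, 33).
a, b, n = 81, 33, 16
plug_in = binomial_var(n, a / (a + b))   # same mean as the predictive model
predictive = beta_binomial_var(n, a, b)  # wider: parameter uncertainty kept
```

The extra spread comes from carrying the uncertainty in the win probability through to the prediction instead of treating the estimate as if it were known exactly.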