This Jupyter notebook describes the nflmodel Python package, which can be used to predict the full probability distribution of NFL point-spread and point-total outcomes.
I describe the theory behind the algorithm at length in an arXiv pre-print, and you can also read about it on the MELO sphinx documentation page. The purpose of this blog post is not to rehash the theory behind the model, but rather to demonstrate how to use it effectively in practice.
The nflmodel package requires Python 3 and is intended to be used through a command line interface. I've tested the package on Arch Linux and OSX, but not on Windows.
The model also requires an active internet connection to pull in the latest game data and schedule information. I'm working to enable offline capabilities, but this does not exist at the moment.
Navigate to the parent directory where you want to save the nflmodel package source, then run the following from the command line to install.
!git clone https://github.com/morelandjs/nfl-model.git nflmodel
!pip3 install --user nflmodel/.
Cloning into 'nflmodel'...
remote: Enumerating objects: 199, done.
remote: Counting objects: 100% (199/199), done.
remote: Compressing objects: 100% (125/125), done.
remote: Total 316 (delta 101), reused 146 (delta 59), pack-reused 117
...
Successfully built nflmodel
Installing collected packages: nflmodel
Successfully installed nflmodel-0.1
After installing the nflmodel package, you'll need to populate the database of NFL game data using the update command. Since this is presumably your first time running the package, it will download all available games dating back to the 2009 season.

!nflmodel update
[INFO][data] updating season 2009 week 1
[INFO][data] updating season 2009 week 2
...
[INFO][data] updating season 2019 week 17
Subsequent calls to nflmodel update will incrementally refresh the database, pulling in all new game data since the last update. If at any point your database becomes corrupted, you can rebuild it from scratch using the optional rebuild flag.
Now that we've populated the database, let's inspect some of the game data contained within.

!nflmodel games
                datetime  season  week  ... team_away       qb_away score_away
0    2009-09-10 20:30:00    2009     1  ...       TEN     K.Collins         10
1    2009-09-13 13:00:00    2009     1  ...       MIA  C.Pennington          7
2    2009-09-13 13:00:00    2009     1  ...        KC      B.Croyle         24
3    2009-09-13 13:00:00    2009     1  ...       PHI      D.McNabb         38
4    2009-09-13 13:00:00    2009     1  ...       DEN       K.Orton         12
...                  ...     ...   ...  ...       ...           ...        ...
2811 2019-12-29 13:00:00    2019    17  ...       PHI       C.Wentz         34
2812 2019-12-29 13:00:00    2019    17  ...       ATL        M.Ryan         28
2813 2019-12-29 16:25:00    2019    17  ...       OAK        D.Carr         15
2814 2019-12-29 16:25:00    2019    17  ...       ARI      K.Murray         24
2815 2019-12-29 16:25:00    2019    17  ...        SF   J.Garoppolo         26

[2816 rows x 9 columns]
Each row corresponds to a single game, and each column is a certain attribute of that game, such as the date, season, week, home and away teams, starting quarterbacks, and final scores.
The nflmodel games runner prints a pandas dataframe, so it may hide columns in order to fit the width of your terminal window. If you want to see more column output, try enlarging your terminal window.
If you want to see more row output, use the --head and --tail options. For example,
!nflmodel games --head 3
             datetime  season  week  ... team_away       qb_away score_away
0 2009-09-10 20:30:00    2009     1  ...       TEN     K.Collins         10
1 2009-09-13 13:00:00    2009     1  ...       MIA  C.Pennington          7
2 2009-09-13 13:00:00    2009     1  ...        KC      B.Croyle         24

[3 rows x 9 columns]
The games dataframe is sorted from oldest to most recent, so --head N returns the N oldest games in the database and --tail N returns the N most recent games. If you want to print the entire dataset to standard out, just set --tail to a very large number.
The model predictions depend on a handful of hyperparameter values that are unknown a priori. These hyperparameters are:
Home field advantage is accounted for naturally by the MELO model constructor, so it is not included in the above list.
The hyperparameter values are calibrated by the following command (this will take several minutes, so grab a drink).
!nflmodel calibrate --steps 200
[INFO][model] calibrating spread hyperparameters
[INFO][data] updating season 2019 week 17
[INFO][tpe] tpe_transform took 0.001760 seconds
[INFO][tpe] TPE using 0 trials
...
[INFO][tpe] tpe_transform took 0.001731 seconds
[INFO][tpe] TPE using 199/199 trials with best loss 10.373855
[INFO][model] caching spread model to /home/morelandjs/.local/share/nflmodel/spread.pkl
[INFO][model] calibrating total hyperparameters
[INFO][data] updating season 2019 week 17
[INFO][tpe] tpe_transform took 0.001551 seconds
[INFO][tpe] TPE using 0 trials
...
[INFO][tpe] tpe_transform took 0.001817 seconds
[INFO][tpe] TPE using 199/199 trials with best loss 10.694746
[INFO][model] caching total model to /home/morelandjs/.local/share/nflmodel/total.pkl
Here the argument --steps (default value 100) specifies the number of calibration steps used by hyperopt to optimize the hyperparameter values. I've generally found that around 200 calibration steps is sufficient, but the convergence is already very good after only 100 steps.
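For intuition: each calibration step proposes a candidate set of hyperparameter values and scores it against the historical games. hyperopt's TPE algorithm makes the proposals adaptively, but the outer loop can be sketched with a plain random search. Everything below is illustrative: the loss function, parameter names, and ranges are made up, standing in for the model's real objective (its prediction error over historical games).

```python
import random

def loss(k, regress):
    """Stand-in objective with a known optimum at k=0.25, regress=0.6.
    The real objective scores hyperparameters by the model's mean
    absolute prediction error over historical games."""
    return (k - 0.25) ** 2 + (regress - 0.6) ** 2

def calibrate(steps, seed=0):
    """Plain random search over the hyperparameter box [0, 1]^2.
    hyperopt's TPE replaces the uniform sampling with an adaptive
    proposal distribution, but the outer loop looks the same."""
    rng = random.Random(seed)
    best_params, best_loss = None, float("inf")
    for _ in range(steps):
        params = {"k": rng.uniform(0, 1), "regress": rng.uniform(0, 1)}
        score = loss(**params)
        if score < best_loss:
            best_params, best_loss = params, score
    return best_params, best_loss

params, best = calibrate(steps=200)
print(params, best)
```

More steps can only improve the best loss found, which is why the marginal benefit beyond 100–200 steps is small.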
The MELO model is actually very fast to train once the hyperparameter values are known, so the default behavior is to retrain the model every time it is queried from the CLI. This ensures that the model predictions are always up to date.
Once calibrated, the nflmodel package can forecast various metrics for a given NFL season and week. For instance, the command
!nflmodel forecast 2019 10
[INFO][nflmodel] Forecast for season 2019 week 10

            favorite  underdog  win prob  spread  total
date
2019-11-10       @NO       ATL      0.91    -9.9   47.6
2019-11-10       BAL      @CIN      0.91    -9.2   45.7
2019-11-10      @IND       MIA      0.71    -5.2   47.7
2019-11-11       @SF       SEA      0.67    -4.4   48.9
2019-11-10      @DAL       MIN      0.54    -3.0   45.0
2019-11-10       BUF      @CLE      0.57    -2.8   42.0
2019-11-10        KC      @TEN      0.58    -2.7   48.6
2019-11-10      @CHI       DET      0.61    -1.8   44.7
2019-11-07      @OAK       LAC      0.54    -1.7   43.2
2019-11-10       @GB       CAR      0.53    -1.4   53.6
2019-11-10      @NYJ       NYG      0.53    -1.3   46.9
2019-11-10       ARI       @TB      0.53    -0.6   50.9
2019-11-10        LA      @PIT      0.51    -0.3   44.9

*win probability and spread are for the favored team
will generate predictions for the 10th week of the 2019 NFL season, using all available game data prior to the start of that week. If you do not provide the year and week, the code will try to infer the upcoming week based on the current date and known NFL season schedule.
The output is structured relative to the favorite in each matchup, with home teams indicated by an '@' symbol. For example, the forecast output above says that NO is playing at home versus ATL, where they have a 91% win probability, are favored by 9.9 points, and the two teams are predicted to combine for 47.6 total points. The games are also sorted so that the most lopsided matchups appear first and the most even matchups appear last.
The model will also provide team rankings at a given moment in time. For example, suppose I want to rank every team at precisely datetime = 2019-11-11T20:23:06. I can issue the command
!nflmodel rank --datetime "2019-11-11T20:23:06"
[INFO][nflmodel] Rankings as of 2019-11-11T20:23:06

rank   win prob        spread          total
   1   BAL 0.78  │  NE  -10.5  │  TB   50.1
   2   NE  0.77  │  BAL  -7.7  │  KC   50.1
   3   SEA 0.75  │  SF   -6.3  │  NYG  48.3
   4   SF  0.72  │  MIN  -5.8  │  BAL  48.0
   5   NO  0.71  │  HOU  -4.5  │  SEA  47.9
   6   HOU 0.69  │  LA   -4.5  │  CAR  47.6
   7   MIN 0.68  │  DAL  -4.2  │  ARI  46.8
   8   GB  0.67  │  NO   -4.1  │  ATL  46.7
   9   PIT 0.64  │  SEA  -3.2  │  OAK  46.6
  10   CAR 0.62  │  PIT  -3.0  │  PHI  46.5
  11   PHI 0.61  │  KC   -2.8  │  DET  46.4
  12   LA  0.60  │  PHI  -2.8  │  GB   46.0
  13   KC  0.57  │  GB   -2.6  │  HOU  45.9
  14   DAL 0.55  │  CAR  -2.5  │  SF   45.8
  15   OAK 0.54  │  LAC  -2.0  │  CIN  45.6
  16   CHI 0.54  │  CHI  -2.0  │  NO   45.5
  17   TEN 0.53  │  BUF  -0.7  │  LA   45.4
  18   JAX 0.51  │  JAX  -0.2  │  DAL  45.3
  19   BUF 0.50  │  DEN  -0.2  │  CLE  45.2
  20   DEN 0.48  │  CLE   1.1  │  NYJ  45.0
  21   LAC 0.48  │  ATL   1.2  │  IND  45.0
  22   CLE 0.47  │  IND   1.4  │  PIT  44.8
  23   ATL 0.43  │  TEN   1.4  │  MIA  44.8
  24   DET 0.41  │  DET   1.4  │  MIN  44.2
  25   TB  0.39  │  TB    1.7  │  NE   44.0
  26   ARI 0.38  │  OAK   3.0  │  LAC  43.4
  27   IND 0.38  │  ARI   3.1  │  JAX  43.0
  28   MIA 0.38  │  NYG   4.5  │  TEN  43.0
  29   NYJ 0.33  │  WAS   4.7  │  WAS  42.3
  30   WAS 0.33  │  CIN   4.8  │  BUF  42.2
  31   CIN 0.31  │  NYJ   5.4  │  CHI  41.7
  32   NYG 0.28  │  MIA   5.9  │  DEN  41.6

*expected performance against league average opponent on a neutral field
and see how the model thinks the teams should be ranked at that moment in time, according to their predicted performance against a league average opponent on a neutral field. The far left column is each team's predicted win probability, the middle column is its predicted Vegas point spread, and the far right column is its predicted point total.
The table above, for instance, tells me that BAL is most likely to win a generic matchup, while NE is most likely to blow out their opponent, and TB is the most likely to find itself in a shootout.
One attractive feature of the MELO base estimator is that it predicts full probability distributions. This enables the model to do all sorts of cool things, like predict interquartile ranges for the point spread or draw samples from a matchup's point total distribution. Most notably, it means the model can estimate the probability that the point spread (or point total) lands above or below a given line, which in turn lets it evaluate the profitability of bets against various point spread and point total lines.
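To make that concrete, here is a minimal sketch of how a cover probability falls out of a predictive distribution. The normal shape, the mean, the scale, and the line below are all invented for illustration; they are not the model's actual parameterization.

```python
from statistics import NormalDist

# Hypothetical predictive distribution over the home-minus-away margin.
margin = NormalDist(mu=-7.0, sigma=13.0)  # away team favored by 7

line = -9.5  # point spread quoted for the away favorite

# P(away team covers) = P(margin < line), i.e. they win by more than 9.5.
cover_prob = margin.cdf(line)

# P(away team wins outright) = P(margin < 0).
win_prob = margin.cdf(0.0)

print(round(cover_prob, 3), round(win_prob, 3))
```

Win probability, cover probability, and any other line-based quantity are all just integrals of the same predicted distribution.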
This capability is accessed using the nflmodel predict entry point. For example, suppose you want to analyze the game 2019-12-01 CLE at MIA with the following betting profile:
| team | spread line  | total line     |
|------|--------------|----------------|
| CLE  | -9.5 (-110)  | O 46.5 (-105)  |
| MIA  | +9.5 (-110)  | U 46.5 (-115)  |
This is accomplished by calling the nflmodel predict runner with the following arguments.
!nflmodel predict "2019-12-01" CLE MIA --spread -110 -110 9.5 --total -105 -115 46.5
[INFO][nflmodel] 2019-12-01T00:00:00 CLE at MIA

               away    home
team            CLE     MIA
win prob        68%     32%
spread        -12.6    12.6
total          44.2    44.2
score            28      16
spread cover    59%     41%
spread return   14%    -24%

               over   under
total cover     45%     55%
total return   -13%      4%

*actual return rate lower than predicted
There's a lot of information here, so let's unpack what it means. First, the model believes CLE is a heavy favorite on the road: their predicted win probability is 68%, their predicted point spread is -12.6 points, and the predicted point total is 44.2 points. This corresponds to a characteristic final score of 28-16, CLE over MIA, with fairly large uncertainties.
The model also provides input on the Vegas point spread and point total lines. It expects CLE to cover their point spread line 59% of the time, which is good for a 14% return on investment after accounting for the house cut. Conversely, betting on MIA is expected to net a 24% loss on average.
The over/under metrics are reported in a similar fashion. The model thinks there's a 55% chance the total score goes under the published Vegas line, which is good for a 4% return on average. Similarly, taking the over is expected to net a 13% loss.
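These return figures follow directly from the cover probabilities and the quoted odds. As a quick sanity check, assuming the standard American-odds convention (risk 110 to win 100 at -110):

```python
def expected_return(cover_prob, american_odds):
    """Expected profit per unit staked at negative American odds,
    e.g. -110 means risking 110 to win 100."""
    profit = 100 / abs(american_odds)  # profit per unit risked
    return cover_prob * profit - (1 - cover_prob)

# CLE covering 59% of the time at -110, per the table above.
print(round(expected_return(0.59, -110), 3))
```

This gives roughly 13% for CLE, close to the quoted 14%; the small gap is consistent with the displayed cover probability being rounded to the nearest percent. The break-even cover rate at -110 is 11/21, or about 52.4%, which is why a 59% edge is so valuable.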
While it has not been readily apparent up to this point, the model accounts for changes at the quarterback position. It does this by tracking ratings at both the team and quarterback level.
For example, suppose Tom Brady were the only quarterback that ever played for the Patriots. Then there would be two associated ratings, one for T.Brady and one for NE, which are effectively identical. If however, Tom Brady left NE and went to play for SF for the last two years of his career, his rating would diverge from NE's rating and begin to track the performance of SF.
I take a weighted average of the QB-level and team-level ratings when generating the effective rating for each upcoming game. In this way, I mix together the historical performance of the team with that of its QB. The weighting is controlled by the qb_weight hyperparameter, which is fixed when calibrating the model.
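In code, that blending is just a convex combination. The rating values below are hypothetical numbers, with qb_weight standing in for the calibrated hyperparameter:

```python
def effective_rating(team_rating, qb_rating, qb_weight):
    """Convex combination of team-level and QB-level ratings, with
    qb_weight controlling how much the quarterback matters."""
    return qb_weight * qb_rating + (1 - qb_weight) * team_rating

# Hypothetical numbers: a backup QB rated well below his team.
print(effective_rating(team_rating=5.0, qb_rating=1.0, qb_weight=0.5))  # 3.0
```

At qb_weight = 0 the quarterback is ignored entirely, and at qb_weight = 1 the team's history is ignored; calibration picks the mix that predicts best.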
The model has no direct knowledge of QB injuries, so you'll have to explicitly tell the model to generate predictions with a different quarterback if that's what you intend to do. At the moment, the only runner that can accommodate QB injuries is the nflmodel predict runner.
Suppose, for example, you want to see how CLE would perform on the road against PHI on 2019-12-01 if Carson Wentz went down with an injury in practice before the game. First, let's see how the two teams would match up if Wentz never got hurt.
!nflmodel predict "2019-12-01" CLE-B.Mayfield PHI-C.Wentz
[INFO][nflmodel] 2019-12-01T00:00:00 CLE-B.Mayfield at PHI-C.Wentz

                     away          home
team       CLE-B.Mayfield   PHI-C.Wentz
win prob              45%           55%
spread                4.4          -4.4
total                44.3          44.3
score                  20            24

*actual return rate lower than predicted
Notice that the predictions are exactly the same if I omit the quarterback suffixes.
!nflmodel predict "2019-12-01" CLE PHI
[INFO][nflmodel] 2019-12-01T00:00:00 CLE at PHI

           away   home
team        CLE    PHI
win prob    45%    55%
spread      4.4   -4.4
total      44.3   44.3
score        20     24

*actual return rate lower than predicted
This is because both Baker Mayfield and Carson Wentz played in the game preceding the specified date, so the model assumes they are the starting QBs for upcoming games by default.
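That default-starter inference amounts to "whoever started the team's most recent game before the prediction date." A toy sketch of the idea over a few hypothetical rows (the real package pulls this from its game database):

```python
def default_starters(games, as_of):
    """Infer each team's default QB as the starter of its most recent
    game before the given date. `games` holds (date, team, qb) tuples."""
    starters = {}
    for date, team, qb in sorted(games):  # ISO dates sort chronologically
        if date < as_of:
            starters[team] = qb  # later games overwrite earlier ones
    return starters

games = [
    ("2019-11-17", "PHI", "C.Wentz"),
    ("2019-11-24", "PHI", "C.Wentz"),
    ("2019-11-24", "CLE", "B.Mayfield"),
]
print(default_starters(games, as_of="2019-12-01"))
```

Passing an explicit TEAM-QB suffix on the command line simply overrides this lookup.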
Suppose now that Wentz got hurt. We can specify his backup QB, Josh McCown, to see how that would affect the model predictions.
!nflmodel predict "2019-12-01" CLE PHI-J.McCown
[INFO][nflmodel] 2019-12-01T00:00:00 CLE at PHI-J.McCown

             away           home
team          CLE   PHI-J.McCown
win prob      53%            47%
spread       -2.8            2.8
total        44.3           44.3
score          24             21

*actual return rate lower than predicted
This creates roughly a 7-point swing in the point spread, and CLE is now the favorite. In fairness to Josh McCown, the magnitude of this shift is not purely a statement about which QB is better. It also accounts for the fact that PHI has not been game planning for McCown, and the offense is not built around him.
The nflmodel package includes a command line runner to validate the predictions of the model. If the model predictions are statistically robust, then the distribution of standardized residuals will be unit normal and the distribution of residual quantiles will be uniform.
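On synthetic data, those two checks look like the sketch below. The sigma value and the simulated games are invented, but any model that reports its uncertainties correctly should behave the same way:

```python
import random
from statistics import NormalDist, mean, stdev

rng = random.Random(42)
sigma = 13.0  # hypothetical reported uncertainty on each prediction

# Simulate outcomes from a model whose uncertainty estimate is correct.
predicted = [rng.uniform(-10.0, 10.0) for _ in range(5000)]
observed = [p + rng.gauss(0.0, sigma) for p in predicted]

# Standardized residuals should look like draws from a unit normal...
z = [(o - p) / sigma for p, o in zip(predicted, observed)]

# ...and residual quantiles should look uniform on [0, 1].
q = [NormalDist(p, sigma).cdf(o) for p, o in zip(predicted, observed)]

print(round(mean(z), 2), round(stdev(z), 2), round(mean(q), 2))
```

The validation runner applies the same idea to the model's real historical predictions.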
This is a very powerful test of the model's veracity, but it does not necessarily mean the model is accurate. Rather, it tests whether the model is correctly reporting its own uncertainties. To quantify the model's accuracy, nflmodel validate also reports the model's mean absolute prediction error for seasons 2011 through present.

!nflmodel validate
[INFO][validate] spread residual mean: 0.08
[INFO][validate] spread residual mean absolute error: 10.37
[INFO][validate] total residual mean: 0.26
[INFO][validate] total residual mean absolute error: 10.69
This will produce two figures in the current working directory, one of which is validate_total.png. For example, the point spread validation figure is shown below.
The model's point spreads have a mean absolute error of 10.37 points and its point totals have a mean absolute error of 10.69 points. The table below compares the model's mean absolute errors to Vegas for the specific season range 2011–2018.
|       | spread MAE | total MAE |
|-------|------------|-----------|
| MODEL | 10.35 pts  | 10.67 pts |
| VEGAS | 10.22 pts  | 10.49 pts |
The model is less accurate than Vegas, but these numbers are still promising considering that many factors, such as weather and personnel changes, remain missing from the model.