This Jupyter notebook describes the
nflmodel Python package which can be used to predict the full probability distribution of NFL point-spread and point-total outcomes.
I describe the theory behind the algorithm at length in an arXiv pre-print, and you can also read about it on the MELO Sphinx documentation page. The purpose of this blog post is not to rehash the theory behind the model, but rather to demonstrate how it can be used effectively in practice.
The nflmodel package requires Python 3 and is intended to be used through a command line interface. I've tested the package on Arch Linux and OSX, but not on Windows.
The model also requires an active internet connection to pull in the latest game data and schedule information. I'm working to enable offline capabilities, but they do not exist at the moment.
Navigate to the parent directory where you want to save the
nflmodel package source, then run the following from the command line to install.
!git clone https://github.com/morelandjs/nfl-model.git nflmodel
!pip3 install --user nflmodel/.
Cloning into 'nflmodel'...
remote: Enumerating objects: 254, done.
remote: Counting objects: 100% (254/254), done.
remote: Compressing objects: 100% (156/156), done.
remote: Total 371 (delta 126), reused 194 (delta 76), pack-reused 117
Receiving objects: 100% (371/371), 569.42 KiB | 887.00 KiB/s, done.
Resolving deltas: 100% (173/173), done.
Processing ./nflmodel
...
Building wheels for collected packages: nflmodel
  Building wheel for nflmodel (setup.py) ... done
  Created wheel for nflmodel: filename=nflmodel-0.1-py3-none-any.whl size=12063 sha256=1c2b182dbb429044bb3828816c6fdcbc5ce3ec26079987e65acd1c7c638213f4
  Stored in directory: /tmp/pip-ephem-wheel-cache-hn01xv8m/wheels/95/4a/2a/7caa67ad61638e063d0ace52ffbafff26848f75a09d18008ce
Successfully built nflmodel
Installing collected packages: nflmodel
  Attempting uninstall: nflmodel
    Found existing installation: nflmodel 0.1
    Uninstalling nflmodel-0.1:
      Successfully uninstalled nflmodel-0.1
Successfully installed nflmodel-0.1
After installing the
nflmodel package, you'll need to populate the database of NFL game data. Since this is presumably your first time running the package, it will download all available games dating back to the 2009 season.
!nflmodel update
[INFO][data] updating season 2009 week 1
[INFO][data] updating season 2009 week 2
[INFO][data] updating season 2009 week 3
[INFO][data] updating season 2009 week 4
[INFO][data] updating season 2009 week 5
...
[INFO][data] updating season 2019 week 13
[INFO][data] updating season 2019 week 14
[INFO][data] updating season 2019 week 15
[INFO][data] updating season 2019 week 16
[INFO][data] updating season 2019 week 17
Subsequent calls to
nflmodel update will incrementally refresh the database and pull in all new game data since the last update. If at any point your database becomes corrupted, you can rebuild it from scratch using the optional
Now that we've populated the database, let's inspect some of the game data contained within.
!nflmodel games
                 datetime  season  week  ...  team_away       qb_away  score_away
0     2009-09-10 20:30:00    2009     1  ...        TEN     K.Collins          10
1     2009-09-13 13:00:00    2009     1  ...        MIA  C.Pennington           7
2     2009-09-13 13:00:00    2009     1  ...         KC      B.Croyle          24
3     2009-09-13 13:00:00    2009     1  ...        PHI      D.McNabb          38
4     2009-09-13 13:00:00    2009     1  ...        DEN       K.Orton          12
...                   ...     ...   ...  ...        ...           ...         ...
2810  2019-12-29 13:00:00    2019    17  ...        PHI       C.Wentz          34
2811  2019-12-29 13:00:00    2019    17  ...        ATL        M.Ryan          28
2812  2019-12-29 16:25:00    2019    17  ...        OAK        D.Carr          15
2813  2019-12-29 16:25:00    2019    17  ...        ARI      K.Murray          24
2814  2019-12-29 16:25:00    2019    17  ...         SF   J.Garoppolo          26

[2815 rows x 9 columns]
Each row corresponds to a single game, and each column is a certain attribute of that game. At the moment I populate the following attributes:
The nflmodel games runner prints a pandas dataframe, so it may hide columns in order to fit the width of your terminal window. If you want to see more columns, try enlarging your terminal window.
If you want to see more row output, use the --head and --tail options. For example,
!nflmodel games --head 3
                 datetime  season  week  ...  team_away       qb_away  score_away
0     2009-09-10 20:30:00    2009     1  ...        TEN     K.Collins          10
1     2009-09-13 13:00:00    2009     1  ...        MIA  C.Pennington           7
2     2009-09-13 13:00:00    2009     1  ...         KC      B.Croyle          24

[3 rows x 9 columns]
The games dataframe is sorted from oldest to most recent, so
--head N returns the N oldest games in the database and
--tail N returns the N most recent games. If you want to print the entire dataset to standard out, just set
--tail to a very large number.
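If you load similar data into your own Python session, pandas applies the same column truncation there. The sketch below is only an illustration: it builds a toy dataframe from three rows of the output above (the `games` variable is hypothetical and this is not the package's API) and shows how `head`, `tail`, and the pandas display options behave.

```python
import pandas as pd

# Toy stand-in for the games table (hypothetical; values copied from the output above)
games = pd.DataFrame({
    "datetime": pd.to_datetime([
        "2009-09-10 20:30:00",
        "2009-09-13 13:00:00",
        "2009-09-13 13:00:00",
    ]),
    "season": [2009, 2009, 2009],
    "week": [1, 1, 1],
    "team_away": ["TEN", "MIA", "KC"],
    "qb_away": ["K.Collins", "C.Pennington", "B.Croyle"],
    "score_away": [10, 7, 24],
})

oldest = games.head(2)  # first rows, i.e. the oldest games
newest = games.tail(1)  # last rows, i.e. the most recent games

# Temporarily lift the column and width limits so that no columns are hidden
with pd.option_context("display.max_columns", None, "display.width", 200):
    print(oldest)
    print(newest)
```

The `option_context` block only changes how the dataframe is printed; the underlying data are the same either way.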
The model predictions depend on a handful of hyperparameter values that are unknown a priori. These hyperparameters are:
Home field advantage is accounted for naturally by the
MELO model constructor, so it is not included in the above list.
The hyperparameter values are calibrated by the following command (this will take several minutes, so grab a drink).
!nflmodel calibrate --steps 200
[INFO][model] calibrating spread hyperparameters
[INFO][data] updating season 2019 week 17
[INFO][tpe] build_posterior_wrapper took 0.001400 seconds
[INFO][tpe] TPE using 0 trials
[INFO][tpe] build_posterior_wrapper took 0.001555 seconds
[INFO][tpe] TPE using 1/1 trials with best loss 10.549234
[INFO][tpe] build_posterior_wrapper took 0.001593 seconds
[INFO][tpe] TPE using 2/2 trials with best loss 10.409128
[INFO][tpe] build_posterior_wrapper took 0.001607 seconds
[INFO][tpe] TPE using 3/3 trials with best loss 10.409128
[INFO][tpe] build_posterior_wrapper took 0.001581 seconds
[INFO][tpe] TPE using 4/4 trials with best loss 10.379357
...
[INFO][tpe] build_posterior_wrapper took 0.001602 seconds
[INFO][tpe] TPE using 195/195 trials with best loss 10.699190
[INFO][tpe] build_posterior_wrapper took 0.001602 seconds
[INFO][tpe] TPE using 196/196 trials with best loss 10.699190
[INFO][tpe] build_posterior_wrapper took 0.001628 seconds
[INFO][tpe] TPE using 197/197 trials with best loss 10.699190
[INFO][tpe] build_posterior_wrapper took 0.001667 seconds
[INFO][tpe] TPE using 198/198 trials with best loss 10.699190
[INFO][tpe] build_posterior_wrapper took 0.001667 seconds
[INFO][tpe] TPE using 199/199 trials with best loss 10.699190
[INFO][model] caching total model to /home/morelandjs/.local/share/nflmodel/total.pkl
Here the argument
--steps (default value 100) specifies the number of calibration steps used by hyperopt to optimize the hyperparameter values. I've generally found that around 200 calibration steps is sufficient, but the convergence is already very good after only 100 steps.
The MELO model is actually very fast to condition once the hyperparameter values are known, so the default behavior is to condition the model every time it is queried from the CLI. This ensures that the model predictions are always up to date.
Once calibrated, the
nflmodel package can forecast various metrics for a given NFL season and week. For instance, the command
!nflmodel forecast 2019 10
[INFO][nflmodel] Forecast for season 2019 week 10

           favorite underdog  win prob  spread  total
date
2019-11-10      @NO      ATL      0.91   -10.0   47.9
2019-11-10      BAL     @CIN      0.90    -8.9   45.9
2019-11-10     @IND      MIA      0.74    -5.9   48.4
2019-11-11      @SF      SEA      0.65    -4.2   49.7
2019-11-10     @DAL      MIN      0.54    -2.9   45.1
2019-11-10       KC     @TEN      0.59    -2.9   49.0
2019-11-10      BUF     @CLE      0.57    -2.8   42.7
2019-11-10      @GB      CAR      0.54    -1.9   54.1
2019-11-10     @CHI      DET      0.61    -1.8   44.8
2019-11-07     @OAK      LAC      0.53    -1.7   43.6
2019-11-10     @NYJ      NYG      0.53    -1.3   47.5
2019-11-10      ARI      @TB      0.52    -0.5   51.1
2019-11-10       LA     @PIT      0.50    -0.1   46.1

*win probability and spread are for the favored team
will generate predictions for the 10th week of the 2019 NFL season, using all available game data prior to the start of that week. If you do not provide the year and week, the code will try to infer the upcoming week based on the current date and known NFL season schedule.
The output is structured relative to the favorite team in each matchup, with home teams indicated by an '@' symbol. For example, the forecast output above says that NO is playing at home versus ATL, where they have a 91% win probability, are favored by 10 points, and are predicted to combine for 47.9 total points. The games are also sorted so that the most lopsided matchups appear first and most even matchups appear last.
The model will also provide team rankings at a given moment in time. For example, suppose I want to rank every team at precisely
datetime = 2019-11-11T20:23:06. I can issue the command
!nflmodel rank --datetime "2019-11-11T20:23:06"
[INFO][nflmodel] Rankings as of 2019-11-11T20:23:06

       win prob          spread         total
rank
1     BAL  0.76  │  NE    -10.1  │  KC   50.4
2     NE   0.76  │  BAL    -7.4  │  TB   50.3
3     SEA  0.74  │  SF     -5.8  │  NYG  48.4
4     NO   0.71  │  MIN    -5.7  │  SEA  48.2
5     SF   0.71  │  LA     -4.3  │  BAL  48.0
6     MIN  0.68  │  HOU    -4.2  │  CAR  47.8
7     HOU  0.68  │  DAL    -4.0  │  ATL  46.9
8     GB   0.66  │  NO     -4.0  │  ARI  46.9
9     PIT  0.64  │  SEA    -3.1  │  OAK  46.8
10    PHI  0.60  │  PIT    -2.9  │  PHI  46.8
11    CAR  0.60  │  KC     -2.8  │  GB   46.4
12    LA   0.60  │  PHI    -2.7  │  DET  46.4
13    KC   0.57  │  GB     -2.5  │  LA   46.1
14    DAL  0.55  │  CAR    -2.3  │  NO   46.1
15    CHI  0.53  │  LAC    -1.9  │  HOU  46.1
16    OAK  0.53  │  CHI    -1.9  │  SF   46.0
17    TEN  0.52  │  BUF    -0.5  │  CIN  45.8
18    BUF  0.50  │  JAX     0.0  │  CLE  45.4
19    JAX  0.50  │  DEN     0.1  │  DAL  45.4
20    LAC  0.48  │  IND     1.0  │  NYJ  45.3
21    DEN  0.47  │  TEN     1.2  │  IND  45.1
22    CLE  0.46  │  CLE     1.3  │  MIA  45.1
23    ATL  0.43  │  ATL     1.3  │  PIT  45.0
24    IND  0.41  │  DET     1.5  │  NE   44.3
25    DET  0.41  │  TB      1.8  │  MIN  44.3
26    TB   0.39  │  OAK     3.1  │  LAC  43.7
27    ARI  0.38  │  ARI     3.4  │  TEN  43.6
28    MIA  0.37  │  NYG     4.7  │  JAX  43.4
29    NYJ  0.33  │  WAS     4.9  │  WAS  42.8
30    WAS  0.33  │  CIN     5.1  │  BUF  42.7
31    CIN  0.30  │  NYJ     5.6  │  DEN  42.2
32    NYG  0.28  │  MIA     6.1  │  CHI  42.0

*expected performance against league average opponent on a neutral field
and see how the model thinks the teams should be ranked at that moment in time, according to their predicted performance against a league average opponent on a neutral field. The far left column is each team's predicted win probability, the middle column is its predicted Vegas point spread, and the far right column is its predicted point total.
The table above, for instance, tells me that BAL is most likely to win a generic matchup, while NE is most likely to blow out its opponent, and KC is most likely to find itself in a shootout.
One attractive feature of the
MELO base estimator is that it predicts full probability distributions. This enables the model to do all sorts of cool things like predict interquartile ranges for the point spread or draw samples of a matchup's point total distribution.
Most notably, it means that the model can estimate the probability that the point spread (or point total) falls above or below a given line. This in turn lets it evaluate the profitability of various bets on point spread and point total lines.
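To make the idea concrete, here is a minimal sketch of what a full predictive distribution buys you. It stands in a normal distribution for the model's output, which is an assumption made purely for illustration (the real distribution need not be normal), and the mean, standard deviation, and line are hypothetical numbers.

```python
from statistics import NormalDist

# Hypothetical prediction: the favorite's spread is -4.0 points (favored by 4),
# with an assumed standard deviation of 13 points.
predicted_spread = NormalDist(mu=-4.0, sigma=13.0)

vegas_line = -2.5  # the favorite must win by more than 2.5 points to cover

# Probability that the outcome lands below the line, i.e. the favorite covers
p_cover = predicted_spread.cdf(vegas_line)

# Interquartile range of the predicted spread
q25 = predicted_spread.inv_cdf(0.25)
q75 = predicted_spread.inv_cdf(0.75)

# A few random draws from the predicted distribution
samples = predicted_spread.samples(5, seed=0)

print(f"P(favorite covers) = {p_cover:.2f}")
print(f"interquartile range = [{q25:.1f}, {q75:.1f}]")
print("samples:", [round(s, 1) for s in samples])
```

The same three operations (cumulative probability, quantiles, sampling) apply to point totals as well.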
This capability is accessed using the
nflmodel predict entry point. For example, suppose you want to analyze the game
2019-12-01 CLE at MIA with the following betting profile
| team | spread | total |
|------|--------|-------|
| CLE | -9.5 (-110) | O 46.5 (-105) |
| MIA | +9.5 (-110) | U 46.5 (-115) |
This is accomplished by calling the
nflmodel predict runner with the following arguments.
!nflmodel predict "2019-12-01" CLE MIA --spread -110 -110 9.5 --total -105 -115 46.5
[INFO][nflmodel] 2019-12-01 CLE at MIA

                away   home
team             CLE    MIA
win prob         67%    33%
spread         -12.4   12.4
total           44.9   44.9
score             29     16
spread cover     58%    42%
spread return    12%   -22%

                over  under
total cover      47%    53%
total return     -8%    -1%

*actual return rate lower than predicted
There's a lot of information here, so let's unpack what it means. First, the model believes CLE is a heavy favorite on the road. CLE's predicted win probability is 67%, their predicted point spread is -12.4 points, and their predicted point total is 44.9 points (according to the model). This would correspond to a characteristic final score of 29-16 CLE over MIA with fairly large uncertainties.
The model is also providing input on the Vegas point spread and point total lines. It expects CLE to cover their point spread line 58% of the time which is good for a 12% ROI, accounting for the house cut. Conversely, betting on MIA is expected to net a loss of 22% on average.
The over/under metrics are reported in a similar fashion. The model thinks there's a 53% chance the total score goes under the published Vegas line, which results in a 1% loss on average. Similarly, taking the over is expected to net an 8% loss.
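The link between a cover probability and an expected return can be sketched with the standard expected-value formula for American odds. This is not necessarily the exact calculation the package performs, and the function names are hypothetical. With the rounded over/under probabilities above it gives roughly -8% and -1%; for the spread bets it gives values within a point or two of the reported 12% and -22%, a difference that is presumably down to rounding of the displayed probabilities.

```python
def payout_per_unit(american_odds: int) -> float:
    """Profit on a winning 1-unit stake at the given American odds."""
    if american_odds < 0:
        return 100.0 / -american_odds  # e.g. -110: risk 110 to win 100
    return american_odds / 100.0       # e.g. +120: risk 100 to win 120


def expected_roi(p_win: float, american_odds: int) -> float:
    """Expected profit per unit staked, ignoring pushes."""
    return p_win * payout_per_unit(american_odds) - (1.0 - p_win)


# Rounded probabilities and odds taken from the CLE at MIA example
bets = [
    ("CLE -9.5", 0.58, -110),
    ("MIA +9.5", 0.42, -110),
    ("over 46.5", 0.47, -105),
    ("under 46.5", 0.53, -115),
]
for label, p_win, odds in bets:
    print(f"{label:>10}: {expected_roi(p_win, odds):+.1%}")
```

Note that the under loses money on average even though the model gives it a 53% chance, because at -115 the payout on a win is smaller than the stake.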
While it has not been readily apparent up to this point, the model accounts for changes at the quarterback position. It does this by tracking ratings at both the team and quarterback level.
For example, suppose Tom Brady were the only quarterback that ever played for the Patriots. Then there would be two associated ratings, one for T.Brady and one for NE, which are effectively identical. If, however, Tom Brady left NE and went to play for SF for the last two years of his career, his rating would diverge from NE's rating and begin to track the performance of SF.
I take the weighted average of QB level and team level ratings when generating the effective rating for each upcoming game. In this way, I mix together the historical performance of the team with its QB. The weighted average is controlled by the
qb_weight hyperparameter which is fixed when calibrating the model.
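The mixing step can be sketched as a convex combination of the two ratings. This is only an illustration: it assumes qb_weight lies between 0 and 1 and multiplies the QB-level rating, the package's exact formula may differ, and the function name and rating values are hypothetical.

```python
def effective_rating(team_rating: float, qb_rating: float, qb_weight: float) -> float:
    """Blend team-level and QB-level ratings (assumed form, for illustration)."""
    if not 0.0 <= qb_weight <= 1.0:
        raise ValueError("qb_weight is assumed to lie in [0, 1]")
    return qb_weight * qb_rating + (1.0 - qb_weight) * team_rating


# Hypothetical ratings: the team has historically been weaker than its current QB
blended = effective_rating(team_rating=2.0, qb_rating=6.0, qb_weight=0.25)
print(blended)
```

With qb_weight = 0 the prediction depends only on the team's history, and with qb_weight = 1 it depends only on the quarterback's.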
The model has no direct knowledge of QB injuries, so you'll have to explicitly tell the model to generate predictions with a different quarterback if that's what you intend to do. At the moment, the only runner that can accommodate QB injuries is the
nflmodel predict runner.
Suppose, for example, you want to see how CLE would perform on the road against PHI on 2019-12-01 if Carson Wentz were injured in practice before the game. First let's see how the two teams would match up if Wentz never got hurt.
!nflmodel predict "2019-12-01" CLE-B.Mayfield PHI-C.Wentz
[INFO][nflmodel] 2019-12-01 CLE-B.Mayfield at PHI-C.Wentz

                    away         home
team      CLE-B.Mayfield  PHI-C.Wentz
win prob             44%          56%
spread               4.5         -4.5
total               44.9         44.9
score                 20           25

*actual return rate lower than predicted
Notice that the predictions are exactly the same if I omit the quarterback suffixes.
!nflmodel predict "2019-12-01" CLE PHI
[INFO][nflmodel] 2019-12-01 CLE at PHI

          away  home
team       CLE   PHI
win prob   44%   56%
spread     4.5  -4.5
total     44.9  44.9
score       20    25

*actual return rate lower than predicted
This is because both Baker Mayfield and Carson Wentz played in the game preceding the specified date, so the model assumes they are the starting QBs for upcoming games by default.
Suppose now that Wentz got hurt. We can specify his backup QB Josh McCown to see how that would affect the model predictions.
!nflmodel predict "2019-12-01" CLE PHI-J.McCown
[INFO][nflmodel] 2019-12-01 CLE at PHI-J.McCown

          away          home
team       CLE  PHI-J.McCown
win prob   51%           49%
spread    -1.8           1.8
total     44.8          44.8
score       23            22

*actual return rate lower than predicted
This creates roughly a 6 point swing in the point spread (from CLE +4.5 to CLE -1.8), and CLE is now the favorite. In fairness to Josh McCown, the magnitude of this shift is not purely a statement about which QB is better. It also accounts for the fact that PHI has not been game planning for McCown, and the offense is not built around him.
The nflmodel package includes a command line runner to validate the predictions of the model. If the model predictions are statistically robust, then the distribution of standardized residuals will be unit normal and the distribution of residual quantiles will be uniform.
This is a very powerful test of the model's veracity, but it does not necessarily mean the model is accurate. Rather, it tests whether the model is correctly reporting its own uncertainties. To quantify the model's accuracy,
nflmodel validate also reports the model's mean absolute prediction error for seasons 2011 through present.
!nflmodel validate
[INFO][validate] spread residual mean: 0.10
[INFO][validate] spread residual mean absolute error: 10.37
[INFO][validate] total residual mean: 0.46
[INFO][validate] total residual mean absolute error: 10.70
This will produce two figures,
validate_total.png in the current working directory. For example, the point spread validation figure is shown below.
The model's point spreads have a mean absolute error of 10.37 points and its point totals have a mean absolute error of 10.70 points. The table below compares the model's mean absolute errors to those of Vegas for the specific season range 2011–2019.

| | spread | total |
|---|---|---|
| MODEL | 10.37 pts | 10.70 pts |
| VEGAS | 10.23 pts | 10.49 pts |
The model is less accurate than Vegas, but these numbers are still promising considering that many factors remain missing from the model such as weather and personnel changes.
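The self-consistency test described above (unit normal standardized residuals, uniform residual quantiles) can be illustrated on synthetic data. The sketch below does not use the package or real NFL games: it assumes normal predictive distributions with a fixed, made-up standard deviation and draws each "observed" outcome from the prediction itself, so the test passes by construction.

```python
import random
from statistics import NormalDist, fmean, pstdev

rng = random.Random(0)
unit_normal = NormalDist()

sigma = 13.0  # assumed predictive standard deviation, in points
z_scores, quantiles, abs_errors = [], [], []

for _ in range(5000):
    predicted_mean = rng.uniform(-10.0, 10.0)    # synthetic predicted spread
    observed = rng.gauss(predicted_mean, sigma)  # outcome drawn from the prediction
    z = (observed - predicted_mean) / sigma      # standardized residual
    z_scores.append(z)
    quantiles.append(unit_normal.cdf(z))         # residual quantile
    abs_errors.append(abs(observed - predicted_mean))

# A model that reports its uncertainty correctly gives mean ~0 and std ~1 ...
z_mean, z_std = fmean(z_scores), pstdev(z_scores)

# ... and roughly equal counts in every quantile bin
bin_counts = [0] * 10
for q in quantiles:
    bin_counts[min(int(q * 10), 9)] += 1

mean_abs_error = fmean(abs_errors)

print(f"standardized residuals: mean {z_mean:.2f}, std {z_std:.2f}")
print("quantile bin counts:", bin_counts)
print(f"mean absolute error: {mean_abs_error:.2f} pts")
```

A model that understated its uncertainty would show a residual standard deviation above 1 and quantile counts piled up in the first and last bins, even if its mean absolute error looked respectable.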
I've also bundled a small script
nflmodel/tutorial/simulate-bets.py inside the tutorial directory which can simulate the performance of the model as a betting tool on historical games. Hereafter, I restrict my attention to point spreads since I'm currently neglecting many factors which are specifically important to point total values like stadium type and weather.
The general idea is as follows. I backtest the model and estimate the probability that the home team and away team each cover their Vegas spread (scraped from covers.com) at the moment just before kickoff. If the model believes that either team will cover their spread with a likelihood greater than X, where X is a fixed decision threshold, then I place a simulated bet on that team.
When the threshold X=50%, the simulation places bets on every game, and when the threshold X ≫ 50%, the simulation is far more selective with the games it chooses to bet on. If I set the decision threshold X > 90% then the model cannot find any games with sufficiently high confidence to bet on.
In principle, one expects the model to get more bets correct when X is large because it is more confident in those bets. The goal here is to see if there is a threshold X which is sufficiently large to yield a positive ROI.
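The decision rule can be sketched in a few lines. This is not the code from simulate-bets.py; the function name and the five toy games are hypothetical, and pushes are ignored so that the away team's cover probability is simply one minus the home team's.

```python
def simulate_bets(games, threshold):
    """Count wins and losses for bets placed when a cover probability exceeds threshold.

    Each game is a pair (p_home_cover, home_covered).
    """
    wins = losses = 0
    for p_home_cover, home_covered in games:
        p_away_cover = 1.0 - p_home_cover
        if p_home_cover > threshold:
            won = home_covered          # bet on the home team
        elif p_away_cover > threshold:
            won = not home_covered      # bet on the away team
        else:
            continue                    # not confident enough in either side
        if won:
            wins += 1
        else:
            losses += 1
    return wins, losses


# Hypothetical backtest data: (model's home cover probability, did home cover?)
toy_games = [(0.85, True), (0.55, False), (0.12, False), (0.40, True), (0.91, True)]

for threshold in (0.5, 0.8, 0.95):
    print(threshold, simulate_bets(toy_games, threshold))
```

Raising the threshold shrinks the pool of qualifying games, which is exactly the trade-off between selectivity and sample size discussed in this section.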
Technically, I need more information to compute the model's ROI than just the historical spread. I also need the vigorish or "juice" on each spread, which is the cut the house takes on each bet. Typically this ranges anywhere from -100 to -120 for spread bets, which means you might need to risk up to 120 dollars in order to win 100 dollars. More extreme vigs exist but they are uncommon.
Unfortunately, I do not have the historical vigs for these lines, precluding a true ROI simulation. However, you can rest assured that the ROI will be strongly negative if the model is not getting more games right than wrong. So for now, let's just see if I can demonstrate that the model is better than 50-50 with statistical significance.
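To see why 50-50 is a necessary but not sufficient bar, it helps to convert the vig into a break-even win rate. The sketch below uses the standard conversion for American odds; the function name is hypothetical.

```python
def breakeven_probability(american_odds: int) -> float:
    """Win probability at which a bet at the given American odds has zero expected profit."""
    if american_odds < 0:
        risk, win = -american_odds, 100.0   # e.g. -110: risk 110 to win 100
    else:
        risk, win = 100.0, american_odds    # e.g. +120: risk 100 to win 120
    return risk / (risk + win)


for odds in (-100, -110, -120):
    print(f"{odds}: {breakeven_probability(odds):.1%}")
```

At the common -110 line a bettor has to win a little over 52% of the time just to break even, and at -120 the bar rises to about 54.5%.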
Statistical significance here is key. As I increase the betting threshold, I reduce my validation statistics because the number of qualifying games drops. For large values of the threshold X, the model may only find a dozen or so qualifying games to bet on. This means the results of the simulation will be noisy and it will be possible to be duped by statistical fluctuations (we've all seen someone flip four heads in a row).
To calculate statistical significance, I compute the 90% interquantile range for the null model, i.e. the range of reasonable outcomes for independent random wins sampled with 50% probability. In other words, I compare my model's success rate to what one would expect from random chance.
The results of this calculation are shown below for a decision threshold X=0.8.
!python3 simulate-bets.py 0.8
[INFO][simulate-bets] 15 won, 7 lost
[INFO][simulate-bets] 0.68% correct model; 0.36-0.64% random chance
[INFO][simulate-bets] Vegas mean abs error: 11.73 pts
[INFO][simulate-bets] Model mean abs error: 11.14 pts
Here we see that the model is performing at the ~90% quantile level, i.e. not something you'd use to bet a ton of money, but I think impressive nonetheless.
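The null-model comparison can be checked by hand with an exact binomial calculation. The sketch below is not the code from simulate-bets.py. It assumes independent 50-50 bets and, as a guess at how the reported range was obtained, takes the 10% and 90% quantiles of the win count. For 22 bets those quantiles are 8 and 14 wins, which matches the 0.36 to 0.64 range in the log, and the chance of winning 15 or more bets by luck alone comes out at about 7%.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float = 0.5) -> float:
    """Probability of exactly k wins in n independent bets with win probability p."""
    return comb(n, k) * p**k * (1.0 - p) ** (n - k)


def binom_quantile(q: float, n: int, p: float = 0.5) -> int:
    """Smallest win count whose cumulative probability reaches q."""
    cumulative = 0.0
    for k in range(n + 1):
        cumulative += binom_pmf(k, n, p)
        if cumulative >= q:
            return k
    return n


wins, losses = 15, 7
n_bets = wins + losses

# Range of win rates expected from coin-flip betting (assumed 10%-90% quantiles)
low = binom_quantile(0.10, n_bets) / n_bets
high = binom_quantile(0.90, n_bets) / n_bets

# Probability of doing at least this well by chance alone
p_value = sum(binom_pmf(k, n_bets) for k in range(wins, n_bets + 1))

print(f"model: {wins / n_bets:.2f} correct")
print(f"random chance: {low:.2f}-{high:.2f}")
print(f"P(at least {wins} wins by chance) = {p_value:.3f}")
```

With only 22 qualifying games the interval is wide, which is why a 68% success rate is suggestive rather than conclusive.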
Unfortunately, I have not prepped the model to work on the post-season yet. There's nothing that precludes applying the model to the post-season, it just involves some more work, and I haven't had time to do it yet.
In any event, I think this tutorial shows that the Vegas lines are quite accurate, and it is non-trivial to build a model that beats them. The small number of available games makes NFL modeling a uniquely interesting problem, and I've learned quite a bit on my own quest to build a better model.
Please feel free to contact me at email@example.com with questions and comments.