Forecasting NFL point spreads and point totals using the margin-dependent Elo model

J. Scott Moreland | Philadelphia, PA | January 11th, 2020

A brief tutorial of an NFL implementation of the margin-dependent Elo (MELO) model.

Intro

This Jupyter notebook describes the nflmodel Python package, which can be used to predict the full probability distribution of NFL point-spread and point-total outcomes.

The model is inspired by the Elo-based sports analytics work at fivethirtyeight.com and makes use of a machine learning algorithm that I developed called the margin-dependent Elo (MELO) model.

I describe the theory behind the algorithm at length in an arXiv pre-print, and you can also read about it on the MELO sphinx documentation page. The purpose of this blog post is not to rehash the theory behind the model, but rather to demonstrate how it can be used effectively in practice.

Requirements

The nflmodel package requires Python 3 and is intended to be used through a command line interface. I've tested the package on Arch Linux and OSX, but not on Windows.

The model also requires an active internet connection to pull in the latest game data and schedule information. I'm working to enable offline capabilities, but they do not exist at the moment.

Installation

Navigate to the parent directory where you want to save the nflmodel package source, then run the following from the command line to install.

In [27]:
!git clone https://github.com/morelandjs/nfl-model.git nflmodel
!pip3 install --user nflmodel/.
Cloning into 'nflmodel'...
remote: Enumerating objects: 199, done.
remote: Counting objects: 100% (199/199), done.
remote: Compressing objects: 100% (125/125), done.
remote: Total 316 (delta 101), reused 146 (delta 59), pack-reused 117
...
Successfully built nflmodel
Installing collected packages: nflmodel
Successfully installed nflmodel-0.1

After installing the nflmodel package, you'll need to populate the database of NFL game data. Since this is presumably your first time running the package, it will download all available games dating back to the 2009 season.

In [28]:
!nflmodel update
[INFO][data] updating season 2009 week 1
[INFO][data] updating season 2009 week 2
...
[INFO][data] updating season 2019 week 17

Subsequent calls to nflmodel update will incrementally refresh the database and pull in all new game data since the last update. If at any point your database becomes corrupted, you can rebuild it from scratch using the optional --rebuild flag.

Now that we've populated the database, let's inspect some of the game data contained within.

In [29]:
!nflmodel games
                datetime  season  week  ... team_away       qb_away  score_away
0    2009-09-10 20:30:00    2009     1  ...       TEN     K.Collins          10
1    2009-09-13 13:00:00    2009     1  ...       MIA  C.Pennington           7
2    2009-09-13 13:00:00    2009     1  ...        KC      B.Croyle          24
3    2009-09-13 13:00:00    2009     1  ...       PHI      D.McNabb          38
4    2009-09-13 13:00:00    2009     1  ...       DEN       K.Orton          12
...                  ...     ...   ...  ...       ...           ...         ...
2811 2019-12-29 13:00:00    2019    17  ...       PHI       C.Wentz          34
2812 2019-12-29 13:00:00    2019    17  ...       ATL        M.Ryan          28
2813 2019-12-29 16:25:00    2019    17  ...       OAK        D.Carr          15
2814 2019-12-29 16:25:00    2019    17  ...       ARI      K.Murray          24
2815 2019-12-29 16:25:00    2019    17  ...        SF   J.Garoppolo          26

[2816 rows x 9 columns]

Each row corresponds to a single game, and each column is a certain attribute of that game. At the moment I populate the following attributes:

  • datetime (datetime64) - game start time
  • season (int) - year in which the season started
  • week (int) - NFL season week, 1–17
  • team_home (string) - home team city abbreviation
  • qb_home (string) - home team quarterback name, first initial dot surname
  • score_home (int) - home team points scored
  • team_away (string) - away team city abbreviation
  • qb_away (string) - away team quarterback name, first initial dot surname
  • score_away (int) - away team points scored
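
As a rough illustration of how these attributes combine into the quantities the model forecasts, the sketch below builds a small pandas dataframe with the same column names and derives a point spread and point total for each game. The rows are made-up placeholder games, and the sign convention (away score minus home score, so a negative spread means the home team won) is an assumption chosen for this example rather than something taken from the package.

```python
import pandas as pd

# Placeholder games (not real results) using the column names listed above.
games = pd.DataFrame({
    "datetime": pd.to_datetime(["2019-09-08 13:00:00", "2019-09-08 16:25:00"]),
    "season": [2019, 2019],
    "week": [1, 1],
    "team_home": ["AAA", "CCC"],
    "qb_home": ["A.Alpha", "C.Gamma"],
    "score_home": [27, 17],
    "team_away": ["BBB", "DDD"],
    "qb_away": ["B.Beta", "D.Delta"],
    "score_away": [20, 24],
})

# Assumed convention: the spread is quoted from the home team's perspective,
# Vegas style, so a home win by 7 points is a spread of -7.
games["spread"] = games["score_away"] - games["score_home"]
games["total"] = games["score_home"] + games["score_away"]

print(games[["team_home", "team_away", "spread", "total"]])
```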

Note, the nflmodel games runner is printing a pandas dataframe, so it may hide columns in order to fit the width of your terminal window. If you want to see more column output, try enlarging your terminal window.

If you want to see more row output, use the --head and --tail options. For example,

In [30]:
!nflmodel games --head 3
             datetime  season  week  ... team_away       qb_away  score_away
0 2009-09-10 20:30:00    2009     1  ...       TEN     K.Collins          10
1 2009-09-13 13:00:00    2009     1  ...       MIA  C.Pennington           7
2 2009-09-13 13:00:00    2009     1  ...        KC      B.Croyle          24

[3 rows x 9 columns]

The games dataframe is sorted from oldest to most recent, so --head N returns the N oldest games in the database and --tail N returns the N most recent games. If you want to print the entire dataset to standard out, just set --head or --tail to a very large number.

Calibration

The model predictions depend on a handful of hyperparameter values that are unknown a priori. These hyperparameters are:

  • kfactor (float) - prefactor multiplying the Elo rating update; a larger kfactor makes the ratings more responsive to game outcomes
  • regress_coeff (float) - fraction to regress each rating to the mean after 3 months of inactivity
  • rest_bonus (float) - prefactor multiplying each matchup's rest difference (or sum)
  • exp_bonus (float) - prefactor multiplying each matchup's quarterback experience difference (or sum)
  • weight_qb (float) - coefficient used to blend team ratings and quarterback ratings

Home field advantage is accounted for naturally by the MELO model constructor, so it is not included in the above list.
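
To give a feel for what kfactor and regress_coeff do, here is a minimal sketch of a classic Elo update with regression to the mean. This is the textbook Elo rule, not the margin-dependent update that MELO actually performs, and the 400-point scale, the 1500 baseline, and the function names are conventional or hypothetical choices made only for this example.

```python
def expected_score(rating_a, rating_b, scale=400.0):
    """Classic Elo win probability for A against B."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


def elo_update(rating_a, rating_b, a_won, kfactor):
    """Return both ratings after one game; a larger kfactor moves them further."""
    surprise = (1.0 if a_won else 0.0) - expected_score(rating_a, rating_b)
    return rating_a + kfactor * surprise, rating_b - kfactor * surprise


def regress_to_mean(rating, mean, regress_coeff):
    """Pull a rating back toward the mean by the fraction regress_coeff."""
    return rating - regress_coeff * (rating - mean)


# Two evenly matched teams: the winner gains kfactor / 2 points.
a, b = elo_update(1500.0, 1500.0, a_won=True, kfactor=20.0)
print(a, b)  # 1510.0 1490.0

# After a long layoff, a 1600 rating regressed by 0.5 toward 1500 becomes 1550.
print(regress_to_mean(1600.0, 1500.0, 0.5))  # 1550.0
```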

The hyperparameter values are calibrated by the following command (this takes several minutes, so grab a drink).

In [31]:
!nflmodel calibrate --steps 200
[INFO][model] calibrating spread hyperparameters
[INFO][data] updating season 2019 week 17
[INFO][tpe] tpe_transform took 0.001760 seconds
[INFO][tpe] TPE using 0 trials
...
[INFO][tpe] tpe_transform took 0.001731 seconds
[INFO][tpe] TPE using 199/199 trials with best loss 10.373855
[INFO][model] caching spread model to /home/morelandjs/.local/share/nflmodel/spread.pkl
[INFO][model] calibrating total hyperparameters
[INFO][data] updating season 2019 week 17
[INFO][tpe] tpe_transform took 0.001551 seconds
[INFO][tpe] TPE using 0 trials
...
[INFO][tpe] tpe_transform took 0.001817 seconds
[INFO][tpe] TPE using 199/199 trials with best loss 10.694746
[INFO][model] caching total model to /home/morelandjs/.local/share/nflmodel/total.pkl

Here the argument --steps (default value 100) specifies the number of calibration steps used by hyperopt to optimize the hyperparameter values. I've generally found that around 200 calibration steps is sufficient, but the convergence is already very good after only 100 steps.
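
The effect of the step budget can be sketched with a toy optimizer. The example below swaps in plain random search for hyperopt's TPE algorithm so that it runs with the standard library alone, and it minimizes a made-up quadratic loss instead of the model's prediction error. The parameter ranges and the location of the optimum are hypothetical.

```python
import random


def loss(kfactor, regress_coeff):
    """Toy stand-in for the model's prediction error; minimized at (0.2, 0.6)."""
    return (kfactor - 0.2) ** 2 + (regress_coeff - 0.6) ** 2


def random_search(steps, seed=0):
    """Evaluate `steps` random parameter draws and keep the best one."""
    rng = random.Random(seed)
    best_loss, best_params = float("inf"), None
    for _ in range(steps):
        params = {"kfactor": rng.uniform(0.0, 1.0),
                  "regress_coeff": rng.uniform(0.0, 1.0)}
        trial_loss = loss(**params)
        if trial_loss < best_loss:
            best_loss, best_params = trial_loss, params
    return best_loss, best_params


# With a fixed seed, a longer run repeats the shorter run's draws and then
# continues, so the best loss can only stay the same or improve.
for steps in (10, 100, 200):
    best_loss, _ = random_search(steps)
    print(steps, round(best_loss, 5))
```

A model-based method such as TPE typically needs fewer evaluations than random search to reach a comparable loss, which is the reason to prefer it when each evaluation requires retraining a model.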

Note, the MELO model is actually very fast to train once the hyperparameter values are known, so default behavior is to retrain the model every time it is queried from the CLI. This ensures that the model predictions are always up to date.

Weekly forecasts

Once calibrated, the nflmodel package can forecast various metrics for a given NFL season and week. For instance, the command

In [32]:
!nflmodel forecast 2019 10
[INFO][nflmodel] Forecast for season 2019 week 10

           favorite underdog  win prob  spread  total
date                                                 
2019-11-10      @NO      ATL      0.91    -9.9   47.6
2019-11-10      BAL     @CIN      0.91    -9.2   45.7
2019-11-10     @IND      MIA      0.71    -5.2   47.7
2019-11-11      @SF      SEA      0.67    -4.4   48.9
2019-11-10     @DAL      MIN      0.54    -3.0   45.0
2019-11-10      BUF     @CLE      0.57    -2.8   42.0
2019-11-10       KC     @TEN      0.58    -2.7   48.6
2019-11-10     @CHI      DET      0.61    -1.8   44.7
2019-11-07     @OAK      LAC      0.54    -1.7   43.2
2019-11-10      @GB      CAR      0.53    -1.4   53.6
2019-11-10     @NYJ      NYG      0.53    -1.3   46.9
2019-11-10      ARI      @TB      0.53    -0.6   50.9
2019-11-10       LA     @PIT      0.51    -0.3   44.9 

*win probability and spread are for the favored team

will generate predictions for the 10th week of the 2019 NFL season, using all available game data prior to the start of that week. If you do not provide the year and week, the code will try to infer the upcoming week based on the current date and known NFL season schedule.

The output is structured relative to the favored team in each matchup, with home teams indicated by an '@' symbol. For example, the forecast output above says that NO is playing at home versus ATL, where they have a 91% win probability and are favored by 9.9 points, and the two teams are predicted to combine for 47.6 total points. The games are also sorted so that the most lopsided matchups appear first and the most even matchups appear last.
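
The favorite-relative layout can be sketched as a small transformation of home-team predictions. The function below is a hypothetical helper, not part of nflmodel, and its inputs are simply two rows of the table above rewritten from the home team's point of view.

```python
def favorite_row(team_home, team_away, spread_home, win_prob_home, total):
    """Re-express a home-team prediction relative to the favorite.

    spread_home is the home team's predicted spread (negative = home favored).
    """
    home, away = "@" + team_home, team_away
    if spread_home <= 0:
        return (home, away, win_prob_home, spread_home, total)
    return (away, home, 1.0 - win_prob_home, -spread_home, total)


# Two games from the table above, expressed from the home team's perspective
predictions = [
    ("CIN", "BAL", 9.2, 0.09, 45.7),
    ("NO", "ATL", -9.9, 0.91, 47.6),
]

# Sort so that the most lopsided matchup (most negative spread) comes first
rows = sorted((favorite_row(*p) for p in predictions), key=lambda r: r[3])
for fav, dog, prob, spread, total in rows:
    print(f"{fav:>5} {dog:>5} {prob:.2f} {spread:5.1f} {total:5.1f}")
```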

Team rankings

The model will also provide team rankings at a given moment in time. For example, suppose I want to rank every team at precisely datetime = 2019-11-11T20:23:06. I can issue the command

In [34]:
!nflmodel rank --datetime "2019-11-11T20:23:06"
[INFO][nflmodel] Rankings as of 2019-11-11T20:23:06

       win prob        spread         total
rank                                       
1     BAL  0.78  │  NE  -10.5  │   TB  50.1
2      NE  0.77  │  BAL  -7.7  │   KC  50.1
3     SEA  0.75  │   SF  -6.3  │  NYG  48.3
4      SF  0.72  │  MIN  -5.8  │  BAL  48.0
5      NO  0.71  │  HOU  -4.5  │  SEA  47.9
6     HOU  0.69  │   LA  -4.5  │  CAR  47.6
7     MIN  0.68  │  DAL  -4.2  │  ARI  46.8
8      GB  0.67  │   NO  -4.1  │  ATL  46.7
9     PIT  0.64  │  SEA  -3.2  │  OAK  46.6
10    CAR  0.62  │  PIT  -3.0  │  PHI  46.5
11    PHI  0.61  │   KC  -2.8  │  DET  46.4
12     LA  0.60  │  PHI  -2.8  │   GB  46.0
13     KC  0.57  │   GB  -2.6  │  HOU  45.9
14    DAL  0.55  │  CAR  -2.5  │   SF  45.8
15    OAK  0.54  │  LAC  -2.0  │  CIN  45.6
16    CHI  0.54  │  CHI  -2.0  │   NO  45.5
17    TEN  0.53  │  BUF  -0.7  │   LA  45.4
18    JAX  0.51  │  JAX  -0.2  │  DAL  45.3
19    BUF  0.50  │  DEN  -0.2  │  CLE  45.2
20    DEN  0.48  │  CLE   1.1  │  NYJ  45.0
21    LAC  0.48  │  ATL   1.2  │  IND  45.0
22    CLE  0.47  │  IND   1.4  │  PIT  44.8
23    ATL  0.43  │  TEN   1.4  │  MIA  44.8
24    DET  0.41  │  DET   1.4  │  MIN  44.2
25     TB  0.39  │   TB   1.7  │   NE  44.0
26    ARI  0.38  │  OAK   3.0  │  LAC  43.4
27    IND  0.38  │  ARI   3.1  │  JAX  43.0
28    MIA  0.38  │  NYG   4.5  │  TEN  43.0
29    NYJ  0.33  │  WAS   4.7  │  WAS  42.3
30    WAS  0.33  │  CIN   4.8  │  BUF  42.2
31    CIN  0.31  │  NYJ   5.4  │  CHI  41.7
32    NYG  0.28  │  MIA   5.9  │  DEN  41.6 

*expected performance against league average
opponent on a neutral field

and see how the model thinks the teams should be ranked at that moment in time, according to their predicted performance against a league average opponent on a neutral field. The far left column is each team's predicted win probability, the middle column is its predicted Vegas point spread, and the far right column is its predicted point total.

The table above, for instance, tells me that BAL is most likely to win a generic matchup, while NE is most likely to blow out their opponent, and TB is the most likely to find itself in a shootout.

Individual game predictions

One attractive feature of the MELO base estimator is that it predicts full probability distributions. This enables the model to do all sorts of cool things like predict interquartile ranges for the point spread or draw samples of a matchup's point total distribution.

Most notably, it means that the model can estimate the probability that the point spread (or point total) falls above or below a given line. This means that it can evaluate the profitability of various bets on point spread and point total lines.
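
As a sketch of what a full distribution makes possible, the example below estimates over/under probabilities and an interquartile range from samples of a point total. The samples are drawn from a normal distribution with a hypothetical mean and width purely for illustration. The model's own predictive distribution need not be normal, and this is not how nflmodel draws its samples.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Stand-in samples of a matchup's point total (hypothetical mean and width).
total_samples = rng.normal(loc=44.0, scale=13.0, size=100_000)

line = 46.5  # half-point line, so a push is impossible
prob_over = np.mean(total_samples > line)
prob_under = np.mean(total_samples < line)

# Interquartile range of the sampled point totals
q25, q75 = np.percentile(total_samples, [25, 75])

print(f"P(over {line}) = {prob_over:.2f}, P(under {line}) = {prob_under:.2f}")
print(f"interquartile range: {q25:.1f} to {q75:.1f}")
```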

This capability is accessed using the nflmodel predict entry point. For example, suppose you want to analyze the game 2019-12-01 CLE at MIA with the following betting profile

      SPREAD         TOTAL
CLE   -9.5 (-110)    O 46.5 (-105)
MIA   +9.5 (-110)    U 46.5 (-115)

This is accomplished by calling the nflmodel predict runner with the following arguments.

In [35]:
!nflmodel predict "2019-12-01" CLE MIA --spread -110 -110 9.5 --total -105 -115 46.5
[INFO][nflmodel] 2019-12-01T00:00:00 CLE at MIA

                away   home
team             CLE    MIA
win prob         68%    32%
spread         -12.6   12.6
total           44.2   44.2
score             28     16
spread cover     59%    41%
spread return    14%   -24%
                           
                over  under
total cover      45%    55%
total return    -13%     4% 

*actual return rate lower than predicted

There's a lot of information here, so let's unpack what it means. First, the model believes CLE is a heavy favorite on the road. CLE's predicted win probability is 68%, their predicted point spread is -12.6 points, and the predicted point total for the game is 44.2 points. This would correspond to a characteristic final score of 28-16 CLE over MIA with fairly large uncertainties.

The model is also providing input on the Vegas point spread and point total lines. It expects CLE to cover their point spread line 59% of the time which is good for a 14% ROI, accounting for the house cut. Conversely, betting on MIA is expected to net a loss of 24% on average.

The over/under metrics are reported in a similar fashion. The model thinks there's a 55% chance the total score goes under the published Vegas line which is good for a 4% return on average. Similarly, taking the over is expected to net a 13% loss.
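
The link between a cover probability, the posted odds, and the expected return can be sketched with the standard expected-value formula for American odds. This is a generic calculation, not necessarily the one nflmodel uses internally, and the function names are hypothetical. Fed with the rounded cover probabilities from the output above, it lands within a couple of percentage points of the reported returns. The residual difference may come from rounding of the displayed probabilities or from details of the package's own calculation.

```python
def payout_per_unit(american_odds):
    """Profit on a winning one-unit bet at the given American odds."""
    if american_odds < 0:
        return 100.0 / -american_odds
    return american_odds / 100.0


def expected_return(cover_prob, american_odds):
    """Average profit per unit staked, ignoring pushes."""
    return cover_prob * payout_per_unit(american_odds) - (1.0 - cover_prob)


# Cover probabilities and odds from the CLE at MIA example above
bets = [("CLE -9.5", 0.59, -110), ("MIA +9.5", 0.41, -110),
        ("over 46.5", 0.45, -105), ("under 46.5", 0.55, -115)]
for label, prob, odds in bets:
    print(f"{label:>10}: {expected_return(prob, odds):+.1%}")
```

At -110 odds the break-even cover probability is 110/210, or about 52.4%, which is why a 50% cover rate loses money on average.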

Quarterback changes

While it has not been readily apparent up to this point, the model accounts for changes at the quarterback position. It does this by tracking ratings at both the team and quarterback level.

For example, suppose Tom Brady were the only quarterback that ever played for the Patriots. Then there would be two associated ratings, one for T.Brady and one for NE, which are effectively identical. If, however, Tom Brady left NE and went to play for SF for the last two years of his career, his rating would diverge from NE's rating and begin to track the performance of SF.

I take the weighted average of QB-level and team-level ratings when generating the effective rating for each upcoming game. In this way, I mix together the historical performance of the team with that of its QB. The weighted average is controlled by the weight_qb hyperparameter, which is fixed when calibrating the model.
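
A weighted average of this kind can be written in one line. The sketch below assumes the blending weight is the share given to the quarterback rating, with the remainder going to the team rating. The function name, the rating values, and the rating scale are all hypothetical.

```python
def effective_rating(team_rating, qb_rating, weight_qb):
    """Blend team and quarterback ratings; weight_qb is the share given to the QB."""
    return weight_qb * qb_rating + (1.0 - weight_qb) * team_rating


# Hypothetical ratings on an arbitrary scale
print(effective_rating(team_rating=2.0, qb_rating=-1.0, weight_qb=0.0))  # 2.0
print(effective_rating(team_rating=2.0, qb_rating=-1.0, weight_qb=1.0))  # -1.0
print(effective_rating(team_rating=2.0, qb_rating=-1.0, weight_qb=0.5))  # 0.5
```

With a weight of 0 the quarterback is ignored entirely, and with a weight of 1 the team's own history is ignored.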

The model has no direct knowledge of QB injuries, so you'll have to explicitly tell the model to generate predictions with a different quarterback if that's what you intend to do. At the moment, the only runner that can accommodate QB injuries is the nflmodel predict runner.

Suppose, for example, you want to see how CLE would perform on the road against PHI on 2019-12-01 if Carson Wentz were injured in practice before the game. First let's see how the two teams would match up if Wentz never got hurt.

In [36]:
!nflmodel predict "2019-12-01" CLE-B.Mayfield PHI-C.Wentz
[INFO][nflmodel] 2019-12-01T00:00:00 CLE-B.Mayfield at PHI-C.Wentz

                    away         home
team      CLE-B.Mayfield  PHI-C.Wentz
win prob             45%          55%
spread               4.4         -4.4
total               44.3         44.3
score                 20           24 

*actual return rate lower than predicted

Notice that the predictions are exactly the same if I omit the quarterback suffixes.

In [37]:
!nflmodel predict "2019-12-01" CLE PHI
[INFO][nflmodel] 2019-12-01T00:00:00 CLE at PHI

          away  home
team       CLE   PHI
win prob   45%   55%
spread     4.4  -4.4
total     44.3  44.3
score       20    24 

*actual return rate lower than predicted

This is because both Baker Mayfield and Carson Wentz played in the game preceding the specified date, so the model assumes they are the starting QBs for upcoming games by default.
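
The default-quarterback rule described above can be sketched with pandas. The lookup below is a hypothetical helper operating on a placeholder game log, not the package's own code. It returns the quarterback from a team's most recent game before a given date.

```python
import pandas as pd

# Placeholder game log (not real results), sorted oldest to newest
games = pd.DataFrame({
    "datetime": pd.to_datetime(["2019-11-17", "2019-11-24", "2019-12-08"]),
    "team_home": ["AAA", "BBB", "AAA"],
    "qb_home": ["A.Starter", "B.Starter", "A.Backup"],
    "team_away": ["BBB", "AAA", "CCC"],
    "qb_away": ["B.Starter", "A.Starter", "C.Starter"],
})


def last_quarterback(games, team, date):
    """Return the QB from the team's most recent game before `date`, if any."""
    prior = games[games["datetime"] < pd.Timestamp(date)]
    for _, game in prior.sort_values("datetime", ascending=False).iterrows():
        if game["team_home"] == team:
            return game["qb_home"]
        if game["team_away"] == team:
            return game["qb_away"]
    return None


print(last_quarterback(games, "AAA", "2019-12-01"))  # A.Starter
print(last_quarterback(games, "AAA", "2019-12-15"))  # A.Backup
```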

Suppose now that Wentz got hurt. We can specify his backup QB Josh McCown to see how that would affect the model predictions.

In [38]:
!nflmodel predict "2019-12-01" CLE PHI-J.McCown
[INFO][nflmodel] 2019-12-01T00:00:00 CLE at PHI-J.McCown

          away          home
team       CLE  PHI-J.McCown
win prob   53%           47%
spread    -2.8           2.8
total     44.3          44.3
score       24            21 

*actual return rate lower than predicted

This creates roughly a 7-point swing in the point spread, and CLE is now the favorite. In fairness to Josh McCown, the magnitude of this point shift is not purely a statement about the better QB. It also accounts for the fact that PHI has not been game planning for McCown, and the offense is not built around him.

Validation

The nflmodel package includes a command line runner to validate the predictions of the model. If the model predictions are statistically robust, then the distribution of standardized residuals will be unit normal and the distribution of residual quantiles will be uniform.

This is a very powerful test of the model's veracity, but it does not necessarily mean the model is accurate. Rather, it tests whether the model is correctly reporting its own uncertainties. To quantify the model's accuracy, nflmodel validate also reports the model's mean absolute prediction error for seasons 2011 through present.
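
The two checks can be sketched on synthetic data. The example below assumes, for simplicity, that each prediction is a normal distribution with a known mean and standard deviation, which is a simplification of the model's actual predictive distributions. All numbers are hypothetical, and the outcomes are drawn from the predictions themselves so that the uncertainties are correct by construction.

```python
import math
import numpy as np

rng = np.random.default_rng(seed=1)

# Synthetic predictions: a mean and standard deviation per game (hypothetical)
pred_mean = rng.normal(0.0, 5.0, size=5000)
pred_std = np.full(5000, 13.0)

# Synthetic outcomes drawn from the predicted distributions
observed = rng.normal(pred_mean, pred_std)

# Standardized residuals should be approximately unit normal
z = (observed - pred_mean) / pred_std

# Residual quantiles (normal CDF of z) should be approximately uniform on [0, 1]
quantiles = np.array([0.5 * (1.0 + math.erf(v / math.sqrt(2.0))) for v in z])
counts, _ = np.histogram(quantiles, bins=10, range=(0.0, 1.0))

# Mean absolute error measures accuracy rather than calibration
mae = np.mean(np.abs(observed - pred_mean))

print(f"residual mean {z.mean():.2f}, residual std {z.std():.2f}")
print("quantile histogram:", counts)
print(f"mean absolute error {mae:.2f}")
```

A model that understated its uncertainties would show a residual standard deviation above one and a quantile histogram that piles up near 0 and 1.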

In [39]:
!nflmodel validate
[INFO][validate] spread residual mean: 0.08
[INFO][validate] spread residual mean absolute error: 10.37
[INFO][validate] total residual mean: 0.26
[INFO][validate] total residual mean absolute error: 10.69

This will produce two figures, validate_spread.png and validate_total.png in the current working directory. For example, the point spread validation figure is shown below.

[Figure: spread validation]

The model's point spreads have a mean absolute error of 10.37 points and its point totals have a mean absolute error of 10.69 points. The table below compares the model's point spread and point total mean absolute errors to those of Vegas for the specific season range 2011–2018.

MAE     SPREAD      TOTAL
MODEL   10.35 pts   10.67 pts
VEGAS   10.22 pts   10.49 pts

The model is less accurate than Vegas, but these numbers are still promising considering that many factors remain missing from the model such as weather and personnel changes.

Betting simulation

In [ ]: