Usage

melo is a computer model to generate rankings and predictions from paired comparison time series data. It has obvious applications to sports, but the framework is general and can be used for numerous other purposes including consumer surveys and asset pricing.

Overview

This is a brief overview of the melo Python package. See Theory for an explanation of the underlying math.

1. Initialization

First, import the Melo class.

from melo import Melo

Next, create a Melo class object and specify its constructor arguments.

melo_instance = Melo(
   k, lines=lines, sigma=sigma, regress=regress,
   regress_unit=regress_unit, dist=dist, commutes=commutes
)

Parameters

  • k (float) – At a bare minimum, you’ll need to specify the rating update factor k, which is the first and only positional argument. The k factor controls the magnitude of each rating update, with larger k values making the model more responsive to each comparison outcome. Its value should be chosen by minimizing the model’s predictive error.

  • lines (array_like of float, optional) – The lines array specifies the sequence of binary comparison thresholds. The default, lines=0, corresponds to a classical Elo model where the comparison outcome is True if the value is greater than 0 and False otherwise. In general, you’ll want to create an array of lines spanning the range of possible outcomes (see Example).

  • sigma (float, optional) – This parameter adds some uncertainty to each observed value when training the model. When sigma=0 (default), the comparison is True if the value exceeds a given line and False otherwise. For sigma > 0, it gives the comparison operator (step function) a soft edge of width sigma. Small sigma values generally help to smooth and regulate the model predictions.

  • regress (function, optional) – This argument provides an entry point for implementing rating decay. It must be a scalar function of one input variable (elapsed time) which returns a single number (fractional regression to the mean). The default, regress = lambda time: 0, applies no regression to the mean as a function of elapsed time.

  • regress_unit (string, optional) – Sets the elapsed time units of the regress function. Options are: year (default), month, week, day, hour, minute, second, millisecond, microsecond, nanosecond, picosecond, femtosecond, and attosecond. For example, suppose regress_unit='year' and regress = lambda time: 0.2 if time > 1 else 0. Then the model regresses each rating to the mean by 20% whenever more than one year has elapsed since that rating was last updated.

  • dist (string, optional) – Specifies the type of distribution function used to convert rating differences into probabilities. Options are normal (default) and logistic. Switching distribution types will generally require somewhat different hyperparameters.

  • commutes (bool, optional) – This parameter describes the expected behavior of the estimated values under label interchange. If commutes=False, it is assumed that the comparisons anti-commute under label interchange (default behavior), and if commutes=True, it is assumed they commute. For example, point totals require commutes=True and point spreads require commutes=False.
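The regress_unit example in the list above can be traced by hand. The following is a minimal sketch, independent of melo, that applies such a regress function to a single rating. It assumes that "regress to the mean by 20%" means moving the rating 20% of the way toward the mean rating and that a year is 365.25 days; the helper names elapsed_years and regress_rating are hypothetical and are not part of the melo API.

```python
from datetime import date

# regress function from the example: 20% regression after more than one year.
regress = lambda time: 0.2 if time > 1 else 0


def elapsed_years(last_update, now):
    """Hypothetical helper: elapsed time in years, assuming 365.25-day years."""
    return (now - last_update).days / 365.25


def regress_rating(rating, mean_rating, elapsed):
    """Hypothetical helper: move a rating toward the mean by regress(elapsed)."""
    fraction = regress(elapsed)
    return rating + fraction * (mean_rating - rating)


last_update = date(2009, 1, 10)
now = date(2010, 9, 10)
years = elapsed_years(last_update, now)  # about 1.66 years
print(regress_rating(5.0, 0.0, years))   # 20% of the way from 5.0 toward 0.0
```

A regress value of 0 leaves the rating untouched, while a value of 1 would reset it to the mean.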

2. Training data

Each melo training input is a tuple of the form (time, label1, label2), and each training output is a single numeric value. This training data is passed to the model as four array_like objects of equal length:

  • times is an array_like object of type np.datetime64 (or compatible string). It specifies the time at which the comparison was made.

  • labels1 and labels2 are array_like objects of type string. They specify the first and second label names of the entities involved in the comparison.

  • values is an array_like object of type float. It specifies the numeric value of the comparison, e.g. the value of the point spread or point total.

Warning

It is assumed that the elements of each array match up, i.e. the n-th element of each array should correspond to the same comparison. The comparisons do not need to be time ordered.

For example, the data used to train the model might look like the following:

times = ['2009-09-10', '2009-09-13', '2009-09-13']
labels1 = ['PIT', 'ATL', 'BAL']
labels2 = ['TEN', 'MIA', 'KC']
values = [3, 12, 14]
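Because melo expects four aligned arrays, it can help to check them before calling fit. The sketch below uses a hypothetical helper, not a melo function. It verifies that the arrays have equal length and parses the date strings with the standard library's datetime.date.fromisoformat, which stands in here for conversion to np.datetime64.

```python
from datetime import date

times = ['2009-09-10', '2009-09-13', '2009-09-13']
labels1 = ['PIT', 'ATL', 'BAL']
labels2 = ['TEN', 'MIA', 'KC']
values = [3, 12, 14]


def check_training_data(times, labels1, labels2, values):
    """Hypothetical helper: confirm the arrays line up and parse the times."""
    n = len(times)
    if not (len(labels1) == len(labels2) == len(values) == n):
        raise ValueError("times, labels1, labels2 and values must have equal length")
    # The n-th element of every array describes the same comparison.
    return [
        (date.fromisoformat(t), l1, l2, float(v))
        for t, l1, l2, v in zip(times, labels1, labels2, values)
    ]


for row in check_training_data(times, labels1, labels2, values):
    print(row)
```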

3. Model calibration

The model is calibrated by calling the fit function on the training data.

melo_instance.fit(times, labels1, labels2, values, biases=0)

Optionally, when training the model you can specify biases (float or array_like of floats). These are numbers which add to (or subtract from) the rating difference of each comparison, i.e.

\[\Delta R = R_\text{label1} - R_\text{label2} + \text{bias}.\]

These factors can be used to account for transient advantages and disadvantages such as weather and temporary injuries. Positive bias numbers increase the expected value of the comparison, and negative values decrease it. If biases is a single number, the bias factor is assumed to be constant for all comparisons. Otherwise, there must be a bias factor for every training input.
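The bias rule can be illustrated with plain numbers. In this sketch the ratings are made-up values and the helper rating_differences is hypothetical. It only reproduces the equation for \(\Delta R\) above, either broadcasting a single bias to every comparison or pairing one bias with each comparison.

```python
def rating_differences(ratings1, ratings2, biases=0.0):
    """Hypothetical helper: Delta R = R_label1 - R_label2 + bias per comparison.

    A single number is used as the bias for every comparison; otherwise one
    bias per comparison is required.
    """
    if isinstance(biases, (int, float)):
        biases = [biases] * len(ratings1)
    if len(biases) != len(ratings1):
        raise ValueError("need one bias for every comparison")
    return [r1 - r2 + b for r1, r2, b in zip(ratings1, ratings2, biases)]


ratings1 = [1.5, -0.5]  # made-up ratings of the label1 entities
ratings2 = [0.5, 0.5]   # made-up ratings of the label2 entities
print(rating_differences(ratings1, ratings2, biases=0.25))         # constant bias
print(rating_differences(ratings1, ratings2, biases=[0.0, -1.0]))  # per-comparison biases
```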

Note

The model automatically accounts for global spread bias, such as that associated with home field advantage. To take advantage of this functionality, the label entries should be ordered consistently, so that the bias is always aligned with the first (or always with the second) label.
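Ordering the labels this way can mean swapping label1 and label2 for some rows. The sketch below is a hypothetical helper that assumes you know which label is the home team. It shows what the commutes parameter implies for the value when labels are swapped: anti-commuting values such as point spreads change sign, while commuting values such as point totals do not.

```python
def orient_home_first(home_is_label1, label1, label2, value, commutes=False):
    """Hypothetical helper: return (home, away, value) with the home label first.

    Swapping labels negates anti-commuting values (commutes=False, e.g. point
    spreads) and leaves commuting values (commutes=True, e.g. point totals)
    unchanged.
    """
    if home_is_label1:
        return label1, label2, value
    swapped_value = value if commutes else -value
    return label2, label1, swapped_value


# Hypothetical: a spread recorded from MIA's side, with ATL as the home team.
print(orient_home_first(False, 'MIA', 'ATL', -12))
# A point total for the same game is unchanged by the swap.
print(orient_home_first(False, 'MIA', 'ATL', 40, commutes=True))
```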

4. Making predictions

Once the model is fit to the training data, there are a number of different functions which can be called to generate predictions for new comparisons at arbitrary points in time.

At the most basic level, the model predicts the survival function probability distribution \(P(\text{value} > \text{line})\) as a function of the line. This distribution is generated by the function call

melo_instance.probability(times, labels1, labels2, biases=biases, lines=lines)

where times, labels1, labels2, and biases are the prediction inputs, and lines is the array of lines where the probability is to be estimated.

However, this function call is just the tip of the iceberg. Given this information, the model can predict many other interesting quantities such as the mean and median comparison values

melo_instance.mean(times, labels1, labels2, biases=biases)

melo_instance.median(times, labels1, labels2, biases=biases)

…arbitrary percentiles (or quantiles) of the distribution

melo_instance.percentile(times, labels1, labels2, biases=biases, p=[10, 50, 90])

and it can even draw samples from the estimated survival function probability distribution

melo_instance.sample(times, labels1, labels2, biases=biases, size=100)
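The percentile and sample calls above can both be understood in terms of the survival function. In the sketch below, percentile_from_survival linearly interpolates the inverse CDF on a grid of lines, and sample_from_survival performs inverse transform sampling by passing uniform random draws through it. Both helpers are hypothetical, melo may use a different interpolation scheme, and the normal survival function is a stand-in for model output. The median corresponds to p=50.

```python
import bisect
import math
import random


def percentile_from_survival(lines, survival, p):
    """Interpolate the line where the CDF, 1 - S(line), reaches p / 100.

    Assumes ascending lines and non-increasing survival values; results are
    clamped to the ends of the grid.
    """
    cdf = [1.0 - s for s in survival]
    target = p / 100.0
    if target <= cdf[0]:
        return lines[0]
    if target >= cdf[-1]:
        return lines[-1]
    i = bisect.bisect_left(cdf, target)  # first index with cdf[i] >= target
    x0, x1, c0, c1 = lines[i - 1], lines[i], cdf[i - 1], cdf[i]
    return x0 + (target - c0) * (x1 - x0) / (c1 - c0)


def sample_from_survival(lines, survival, size, rng=random):
    """Inverse transform sampling: push uniform draws through the percentile function."""
    return [percentile_from_survival(lines, survival, 100.0 * rng.random())
            for _ in range(size)]


# Stand-in for model output: survival function of a normal(3, 10) variable.
lines = [0.5 * i for i in range(-200, 201)]
survival = [0.5 * math.erfc((x - 3.0) / (10.0 * math.sqrt(2.0))) for x in lines]

print([round(percentile_from_survival(lines, survival, p), 2) for p in (10, 50, 90)])
draws = sample_from_survival(lines, survival, size=100, rng=random.Random(0))
print(len(draws), round(sum(draws) / len(draws), 2))
```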

Perhaps one of the most useful applications of the model is using its mean and median predictions to create rankings. This is aided by the rank function

melo_instance.rank(time, statistic='mean')

which ranks the labels at the specified time according to their expected performance against an average opponent, i.e. an opponent with an average rating.
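The idea of ranking against an average opponent can be sketched for the simplest case of a single line (lines=0) and a normal distribution. There the natural score is the probability of beating an opponent whose rating equals the average rating. The ratings below are made up and the helper is hypothetical. melo's rank with statistic='mean' or 'median' ranks by predicted comparison values instead.

```python
import math
from statistics import mean


def rank_against_average(ratings):
    """Hypothetical helper: rank labels by P(beat an opponent with the average rating).

    Single-line sketch: P = Phi(R_label - R_average) with a unit normal CDF.
    """
    average = mean(list(ratings.values()))
    scores = {
        label: 0.5 * (1.0 + math.erf((rating - average) / math.sqrt(2.0)))
        for label, rating in ratings.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


ratings = {'PIT': 0.8, 'TEN': -0.2, 'ATL': 0.1, 'MIA': -0.7}  # made-up ratings
for label, prob in rank_against_average(ratings):
    print(label, round(prob, 3))
```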

Reference

Main class

class melo.Melo(k, lines=0, sigma=0, regress=None, regress_unit='year', dist='normal', commutes=False)

Margin-dependent Elo (MELO) class constructor

Parameters
  • k (float) – Prefactor multiplying each rating update \(\Delta R = k\, (P_\text{obs} - P_\text{pred})\).

  • lines (array_like of float, optional) –

    Handicap line or sequence of lines used to construct the vector of binary comparisons

    \(\mathcal{C}(\text{label1}, \text{label2}, \text{value}) \equiv (\text{value} > \text{lines})\).

    The default setting, lines=0, deploys the traditional Elo rating system.

  • sigma (float, optional) –

    Smearing parameter which gives the observed probability,

    \(P_\mathrm{obs} = 1\) if value > line else 0,

    a soft edge of width sigma. A small sigma value helps to regulate and smooth the predicted probability distributions.

  • regress (function, optional) – Univariate scalar function regress = f(time) which describes how ratings should regress to the mean as a function of elapsed time. When this function value is zero, the rating is unaltered, and when it is unity, the rating is fully regressed to the mean. The time units are set by the parameter regress_unit (see below).

  • regress_unit (string, optional) – Unit of elapsed time for the regress function. Options are: year (default), month, week, day, hour, minute, second, millisecond, microsecond, nanosecond, picosecond, femtosecond, and attosecond.

  • dist (string, optional) –

    Probability distribution, “normal” (default) or “logistic”, which converts rating differences into probabilities:

    \(P(\text{value} > \text{line}) = \text{dist.cdf}(\Delta R(\text{line}))\)

  • commutes (bool, optional) –

    If this is set to True, the comparison values are assumed to be symmetric under label interchange. Otherwise the values are assumed to be anti-symmetric under label interchange.

    For example, point-spreads should use commutes=False and point-totals commutes=True.
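To see how k, lines, sigma, and dist interact, here is a single Elo-style update at one line written out in plain Python. It assumes a unit-scale normal distribution for P_pred, models the soft edge of P_obs as a normal CDF of width sigma, and follows the classical Elo convention in which label1 gains exactly what label2 loses. These are illustrative assumptions rather than melo's exact implementation.

```python
import math


def normal_cdf(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def observed_probability(value, line, sigma=0.0):
    """P_obs = 1 if value > line else 0, softened when sigma > 0.

    ASSUMPTION: the soft edge is modelled as a normal CDF of width sigma.
    """
    if sigma == 0:
        return 1.0 if value > line else 0.0
    return normal_cdf((value - line) / sigma)


def elo_update(r1, r2, value, k, line=0.0, sigma=0.0, bias=0.0):
    """One update at a single line: Delta R = k * (P_obs - P_pred).

    ASSUMPTION: P_pred uses a unit normal CDF of the rating difference, and
    label1 gains exactly what label2 loses (classical Elo convention).
    """
    p_pred = normal_cdf(r1 - r2 + bias)
    delta = k * (observed_probability(value, line, sigma) - p_pred)
    return r1 + delta, r2 - delta


print(elo_update(0.0, 0.0, value=3, k=0.1))             # hard edge: P_obs = 1
print(elo_update(0.0, 0.0, value=3, k=0.1, sigma=5.0))  # soft edge: P_obs < 1
```

With sigma > 0 a narrow win moves the ratings less than a decisive one, which is one way to read the smoothing role of sigma described above.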

Training function

melo.Melo.fit(self, times, labels1, labels2, values, biases=0)

This function is used to calibrate the model on the training inputs. It computes and records each label’s Elo ratings at the line(s) given in the class constructor. The function returns the predictions’ total cross entropy loss.

Parameters
  • times (array_like of np.datetime64) – List of datetimes.

  • labels1 (array_like of string) – List of first entity labels.

  • labels2 (array_like of string) – List of second entity labels.

  • values (array_like of float) – List of comparison values.

  • biases (array_like of float, optional) – Single bias number or list of bias numbers which match the comparison inputs. Default is 0, in which case no bias is used.

Returns

loss – Cross entropy loss for the model predictions.

Return type

float
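The returned loss is described as a total cross entropy. A minimal sketch of binary cross entropy summed over comparisons is shown below. The clipping constant eps is an assumption added to avoid taking log(0), and how melo aggregates the loss over multiple lines is not specified here.

```python
import math


def total_cross_entropy(p_obs, p_pred, eps=1e-12):
    """Sum of binary cross entropies -[y log p + (1 - y) log(1 - p)].

    p_obs may be hard (0 or 1) or soft observed probabilities; eps clips
    predictions away from 0 and 1 to avoid log(0).
    """
    loss = 0.0
    for y, p in zip(p_obs, p_pred):
        p = min(max(p, eps), 1.0 - eps)
        loss -= y * math.log(p) + (1.0 - y) * math.log(1.0 - p)
    return loss


print(round(total_cross_entropy([1, 0, 1], [0.5, 0.5, 0.5]), 4))  # 3 * ln(2)
print(round(total_cross_entropy([1, 0, 1], [0.9, 0.2, 0.7]), 4))  # better predictions, lower loss
```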

Prediction functions

melo.Melo.probability(self, times, labels1, labels2, biases=0, lines=0)

Predict the survival function probability \(P(\text{value} > \text{lines})\) for the specified comparison(s).

Parameters
  • times (array_like of np.datetime64) – List of datetimes.

  • labels1 (array_like of string) – List of first entity labels.

  • labels2 (array_like of string) – List of second entity labels.

  • biases (array_like of float, optional) – Single bias number or list of bias numbers which match the comparison inputs. Default is 0, in which case no bias is used.

  • lines (array_like of float, optional) – Line or sequence of lines used to estimate the comparison distribution. Default is lines=0, in which case the model predicts the probability that value > 0.

Returns

probability – If a single comparison is given and a single line is specified, this function returns a scalar. If multiple comparisons or multiple lines are given, this function returns an ndarray.

Return type

scalar or ndarray of float
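The relationship \(P(\text{value} > \text{line}) = \text{dist.cdf}(\Delta R(\text{line}))\) stated for the dist parameter can be illustrated by evaluating the normal and logistic CDFs on a few hypothetical rating differences. The sketch uses unit scale for both distributions, which is an assumption; melo's internal scaling may differ.

```python
import math


def normal_cdf(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def logistic_cdf(x):
    """Standard logistic CDF, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def survival_probability(delta_r, dist='normal'):
    """P(value > line) = dist.cdf(delta_r), with delta_r the rating difference at that line."""
    cdf = {'normal': normal_cdf, 'logistic': logistic_cdf}[dist]
    return cdf(delta_r)


# Hypothetical rating differences at three lines. They decrease as the line
# increases, so the predicted P(value > line) decreases as well.
for line, delta_r in [(-7, 1.0), (0, 0.0), (7, -1.0)]:
    print(line, round(survival_probability(delta_r), 3),
          round(survival_probability(delta_r, 'logistic'), 3))
```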

melo.Melo.percentile(self, times, labels1, labels2, biases=0, p=50)

Predict the p-th percentile for the specified comparison(s).

Parameters
  • times (array_like of np.datetime64) – List of datetimes.

  • labels1 (array_like of string) – List of first entity labels.

  • labels2 (array_like of string) – List of second entity labels.

  • biases (array_like of float, optional) – Single bias number or list of bias numbers which match the comparison inputs. Default is 0, in which case no bias is used.

  • p (array_like of float, optional) – Percentile or sequence of percentiles to compute, which must be between 0 and 100 inclusive. Default is p=50, which computes the median.

Returns

percentile – If one comparison is given, and p is a single percentile, then the result is a scalar. If multiple comparisons or multiple percentiles are given, the result is an ndarray.

Return type

scalar or ndarray of float

melo.Melo.quantile(self, times, labels1, labels2, biases=0, q=0.5)

Predict the q-th quantile for the specified comparison(s).

Parameters
  • times (array_like of np.datetime64) – List of datetimes.

  • labels1 (array_like of string) – List of first entity labels.

  • labels2 (array_like of string) – List of second entity labels.

  • biases (array_like of float, optional) – Single bias number or list of bias numbers which match the comparison inputs. Default is 0, in which case no bias is used.

  • q (array_like of float, optional) – Quantile or sequence of quantiles to compute, which must be between 0 and 1 inclusive. Default is q=0.5, which computes the median.

Returns

quantile – If one comparison is given, and q is a single quantile, then the result is a scalar. If multiple comparisons or multiple quantiles are given, the result is an ndarray.

Return type

scalar or ndarray of float

melo.Melo.mean(self, times, labels1, labels2, biases=0)

Predict the mean for the specified comparison(s).

Parameters
  • times (array_like of np.datetime64) – List of datetimes.

  • labels1 (array_like of string) – List of first entity labels.

  • labels2 (array_like of string) – List of second entity labels.

  • biases (array_like of float, optional) – Single bias number or list of bias numbers which match the comparison inputs. Default is 0, in which case no bias is used.

Returns

mean – If one comparison is given, then the result is a scalar. If multiple comparisons are given, then the result is an ndarray.

Return type

scalar or ndarray of float
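The mean is a summary of the predicted survival function \(S(x) = P(\text{value} > x)\). One standard way to recover it is the identity \(E[\text{value}] = \int_0^\infty S(x)\,dx - \int_{-\infty}^0 [1 - S(x)]\,dx\). The sketch below applies that identity with the trapezoid rule on a grid of lines. It illustrates the identity rather than claiming to be how melo computes the mean, and the normal survival function used as input is a stand-in for model output.

```python
import math


def mean_from_survival(lines, survival):
    """Approximate E[value] from S(line) = P(value > line) on an ascending grid.

    Uses E[value] = int_0^inf S dx - int_-inf^0 (1 - S) dx, truncated to the
    grid and integrated with the trapezoid rule. The grid must contain 0 so
    that no interval straddles the origin.
    """
    if 0 not in lines:
        raise ValueError("the grid of lines must include 0")
    total = 0.0
    for x0, x1, s0, s1 in zip(lines, lines[1:], survival, survival[1:]):
        width = x1 - x0
        if x0 >= 0:
            total += 0.5 * (s0 + s1) * width
        else:
            total -= 0.5 * ((1.0 - s0) + (1.0 - s1)) * width
    return total


def normal_survival(x, mu, sd):
    """Stand-in for model output: P(value > x) for a normal(mu, sd) variable."""
    return 0.5 * math.erfc((x - mu) / (sd * math.sqrt(2.0)))


lines = [0.5 * i for i in range(-200, 201)]  # -100, -99.5, ..., 100
survival = [normal_survival(x, 3.0, 10.0) for x in lines]
print(mean_from_survival(lines, survival))  # close to 3.0
```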

melo.Melo.median(self, times, labels1, labels2, biases=0)

Predict the median for the specified comparison(s).

Parameters
  • times (array_like of np.datetime64) – List of datetimes.

  • labels1 (array_like of string) – List of first entity labels.

  • labels2 (array_like of string) – List of second entity labels.

  • biases (array_like of float, optional) – Single bias number or list of bias numbers which match the comparison inputs. Default is 0, in which case no bias is used.

Returns

median – If one comparison is given, then the result is a scalar. If multiple comparisons are given, then the result is an ndarray.

Return type

scalar or ndarray of float

melo.Melo.residuals(self, statistic='mean', standardize=False)

Prediction residuals (or Z-scores if standardize is True) for all comparisons in the training data. The Z-scores should sample a unit normal distribution.

Parameters
  • statistic (string, optional) – Type of prediction statistic. Options are ‘mean’ (default) or ‘median’.

  • standardize (bool, optional) – If standardize is True, divides prediction residuals by their one-sigma prediction uncertainty. Default value is False.

Returns

residuals – The residuals for each comparison in the training data. The residuals are time ordered, and may not appear in the same order as originally given.

Return type

ndarray of float

melo.Melo.quantiles(self)

Prediction quantiles for all comparisons in the training data. The quantiles should sample a uniform distribution from 0 to 1.

Returns

quantiles – The quantiles for each comparison in the training data. The quantiles are time ordered, and may not appear in the same order as originally given.

Return type

ndarray of float
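Both residuals(standardize=True) and quantiles() serve as calibration diagnostics: the Z-scores should look like draws from a unit normal distribution and the quantiles like draws from a uniform distribution on [0, 1]. The sketch below checks these properties with a Kolmogorov-Smirnov distance and simple moments. The inputs are synthetic stand-ins generated with Python's random module, not output from melo.

```python
import random
from statistics import mean, pstdev


def ks_uniform(samples):
    """Kolmogorov-Smirnov distance between the empirical CDF of samples and U(0, 1)."""
    xs = sorted(samples)
    n = len(xs)
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))


def summarize_z(z_scores):
    """Mean and standard deviation; a calibrated model should give roughly 0 and 1."""
    return mean(z_scores), pstdev(z_scores)


# Synthetic stand-ins for melo_instance.quantiles() and
# melo_instance.residuals(standardize=True), drawn from the ideal distributions.
rng = random.Random(1)
quantiles = [rng.random() for _ in range(500)]
z_scores = [rng.gauss(0.0, 1.0) for _ in range(500)]
print(round(ks_uniform(quantiles), 3), [round(v, 2) for v in summarize_z(z_scores)])
```

A large KS distance, or Z-scores whose spread is far from 1, would suggest that hyperparameters such as k or sigma need retuning.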

melo.Melo.rank(self, time, statistic='mean')

Ranks labels by comparing each label to the average label using the specified summary statistic.

Parameters
  • time (np.datetime64) – The time at which the ranking should be computed.

  • statistic (string, optional) – Determines the binary comparison ranking statistic. Options are ‘mean’ (default), ‘median’, or ‘win’.

Returns

label rankings – Returns a rank sorted list of (label, rank) pairs, where rank is the comparison value of the specified summary statistic.

Return type

list of tuples

melo.Melo.sample(self, times, labels1, labels2, biases=0, size=100)

Draw random samples from the predicted comparison probability distribution.

Parameters
  • times (array_like of np.datetime64) – List of datetimes.

  • labels1 (array_like of string) – List of first entity labels.

  • labels2 (array_like of string) – List of second entity labels.

  • biases (array_like of float, optional) – Single bias number or list of bias numbers which match the comparison inputs. Default is 0, in which case no bias is used.

  • size (int, optional) – Number of samples to be drawn. Default is size=100.