4. The Davidson Model

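Disclaimer: this article is an excerpt, edited and reorganised, from my graduation research report "The Exploration of Pairwise Comparison in Football Application". It is intended for study and discussion only.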

Introduction

Roger R. Davidson [3] introduced a scaling constant $v$ and, building on Luce’s Choice Axiom [4] and the geometric mean, established a model in which the win and tie probabilities for a pair share a single denominator. Although both the Rao-Kupper model [1] and the Davidson model extend the Bradley-Terry model [2] to accommodate ties, they differ significantly in how the tie probability is constructed.

In the search for Maximum Likelihood Estimators (MLEs), the existence and uniqueness of solutions were demonstrated based on Ford’s Assumption [5].

Luce’s Choice Axiom

Given a finite set of treatments $\{1, 2, \ldots, t\}$ with associated worth $π_{i} \geq 0$ for each treatment $i$ and $\sum_{i = 1}^{t} π_{i} = 1$ , Luce’s Choice Axiom can be concisely formalized as:

For $i, j \in \{1, 2, \ldots, t\}$ and $i \neq j$ ,

\frac{p (i ∣ i, j)}{p (j ∣ i, j)} = \frac{π_{i}}{π_{j}},

where $p (l ∣ i, j)$ is the probability of choosing treatment $l$ from the pair $i, j$ , with $l = i$ or $l = j$ , under the condition that $p (i ∣ i, j) \neq 0$ and $p (j ∣ i, j) \neq 0$ , so that the ratio is well defined.

Non-empty Subset Preference (Assumption)

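The model assumes that, in every partition of the treatments into two non-empty groups, some treatment in the second group is preferred at least once over some treatment in the first group. Because the condition must hold whichever group is labelled first, preferences have to cross every such split in both directions. This assumption is critical for ensuring that the likelihood function is well behaved and that a global maximum exists.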
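One way to check this assumption on data is to read it as a strong-connectivity condition: draw an edge from treatment $i$ to treatment $j$ whenever $i$ was preferred over $j$ at least once, and require that every treatment can reach every other. That reading, the helper name `satisfies_ford_assumption`, and the two small win matrices are assumptions made for this sketch and are not taken from the report.

```python
def satisfies_ford_assumption(wins):
    """Return True if every two-group split has a win in both directions.

    wins[i][j] is the number of times treatment i was preferred over j.
    """
    n = len(wins)

    def reaches_everything(beats):
        # Depth-first search from treatment 0 along the given edges.
        seen = {0}
        stack = [0]
        while stack:
            i = stack.pop()
            for j in range(n):
                if j not in seen and beats(i, j):
                    seen.add(j)
                    stack.append(j)
        return len(seen) == n

    # Strongly connected: everything is reachable from 0 by following wins,
    # and 0 is reachable from everything (search the reversed graph).
    forward = reaches_everything(lambda i, j: wins[i][j] > 0)
    backward = reaches_everything(lambda i, j: wins[j][i] > 0)
    return forward and backward


cycle = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]      # A beat B, B beat C, C beat A
dominant = [[0, 2, 3], [0, 0, 1], [0, 1, 0]]   # A is never beaten
print(satisfies_ford_assumption(cycle))
print(satisfies_ford_assumption(dominant))
```

In the second matrix no treatment is ever preferred over A, so the split {A} versus {B, C} has preferences in one direction only and the check fails.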

Model Formulation 4

Consider pairwise comparisons among $t$ treatments, where the inherent preference (or stimulus strength) of treatment $i$ , denoted by $π_{i}$ , satisfies the normalisation $\sum_{i = 1}^{t} π_{i} = 1$ with $π_{i} \geq 0$ for all $i$ .

The probability of no clear preference (a tie), $p (0 ∣ i, j)$ , is taken to be proportional to the geometric mean of the two preference probabilities:

p (0 ∣ i, j) = v \sqrt{p (i ∣ i, j) \cdot p (j ∣ i, j)},

where $v \geq 0$ acts as a scaling constant. Combining this with the ratio from Luce’s Choice Axiom above, the adapted model is:

\begin{aligned} p (l ∣ i, j) & = \frac{π_{l}}{π_{i} + π_{j} + v \sqrt{π_{i} π_{j}}}, l = i, j, \\ p (0 ∣ i, j) & = \frac{v \sqrt{π_{i} π_{j}}}{π_{i} + π_{j} + v \sqrt{π_{i} π_{j}}}, \end{aligned}

ensuring the total probability constraint:

p (i ∣ i, j) + p (j ∣ i, j) + p (0 ∣ i, j) = 1.

$p (i ∣ i, j)$ is the probability that $i$ is preferred over $j$ . Similarly, $p (j ∣ i, j)$ is the probability that $j$ is preferred over $i$ . $p (0 ∣ i, j)$ represents the probability of a tie.

It is also important to note that the Bradley-Terry model forms a special case of both the Rao-Kupper model when the threshold parameter $θ = 1$ , and of the Davidson model when the scaling constant $v = 0$ .
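The three probabilities can be evaluated directly from the model equations. The following is a minimal sketch; the function name `davidson_probabilities` and the strengths 0.6 and 0.4 are illustrative assumptions, not values from the report. It checks the total probability constraint, the Luce ratio, and the Bradley-Terry special case $v = 0$ .

```python
import math


def davidson_probabilities(pi_i, pi_j, v):
    """Return (p_i, p_j, p_tie) for one pairing under the Davidson model."""
    tie_weight = v * math.sqrt(pi_i * pi_j)
    denom = pi_i + pi_j + tie_weight
    return pi_i / denom, pi_j / denom, tie_weight / denom


# Illustrative strengths (not estimated from data).
p_i, p_j, p_tie = davidson_probabilities(0.6, 0.4, v=0.5)
print(p_i + p_j + p_tie)      # total probability
print(p_i / p_j, 0.6 / 0.4)   # the ratio of win probabilities equals pi_i / pi_j

# With v = 0 the tie probability vanishes and Bradley-Terry is recovered.
bradley_terry = davidson_probabilities(0.6, 0.4, v=0.0)
print(bradley_terry)
```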

The Log-likelihood function:

\ln L (π, v) = \frac{1}{2} \sum_{i = 1}^{t} s_{i} \ln π_{i} + T \ln v - \sum \sum_{i < j} r_{i j} \ln (π_{i} + π_{j} + v \sqrt{π_{i} π_{j}})

where

$s_{i} = 2 w_{i} + t_{i}, i = 1, \ldots, t$ is twice the number of wins plus the number of ties for treatment $i$ .
$w_{i j}$ is the number of times treatment $i$ is preferred over $j$ , and $w_{j i}$ is the number of times $j$ is preferred over $i$ ; $w_{i} = \sum_{j} w_{i j}$ .
$t_{i j}$ is the number of times neither treatment is preferred, $t_{i} = \sum_{j} t_{i j}$ .
$T = \sum \sum_{i < j} t_{i j}$ represents the total number of ties across all treatment comparisons.
$W$ is a matrix of wins $[w_{i j}; i, j = 1, . . ., t]$ .
$r_{i j}$ is the number of independent responses for the comparison of treatments $i$ and $j$ , $r_{i j} = w_{i j} + w_{j i} + t_{i j}$ .
$r_{i i} = w_{i i} = t_{i i} = 0$ .

Each treatment is paired with others, and responses are independently recorded for each pairwise comparison. The total number of such comparisons is calculated as $N = \sum \sum_{i < j} r_{i j}$ .
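As a sanity check on the notation, the compact log-likelihood can be compared numerically with the sum of log-probabilities over individual outcomes. This is a sketch under stated assumptions: the function names and the count matrices below are invented for illustration, and the constant multinomial coefficient is omitted, as in the formula above.

```python
import math


def log_likelihood(pi, v, wins, ties):
    """Compact Davidson log-likelihood in terms of s_i, T and r_ij.

    wins[i][j]: times i was preferred over j; ties[i][j] == ties[j][i]: ties.
    """
    t = len(pi)
    total_ties = sum(ties[i][j] for i in range(t) for j in range(i + 1, t))
    value = total_ties * math.log(v)
    for i in range(t):
        s_i = 2 * sum(wins[i]) + sum(ties[i])
        value += 0.5 * s_i * math.log(pi[i])
    for i in range(t):
        for j in range(i + 1, t):
            r_ij = wins[i][j] + wins[j][i] + ties[i][j]
            denom = pi[i] + pi[j] + v * math.sqrt(pi[i] * pi[j])
            value -= r_ij * math.log(denom)
    return value


def log_likelihood_direct(pi, v, wins, ties):
    """Same quantity, summed outcome by outcome from the model probabilities."""
    t = len(pi)
    value = 0.0
    for i in range(t):
        for j in range(i + 1, t):
            tie_weight = v * math.sqrt(pi[i] * pi[j])
            denom = pi[i] + pi[j] + tie_weight
            value += wins[i][j] * math.log(pi[i] / denom)
            value += wins[j][i] * math.log(pi[j] / denom)
            value += ties[i][j] * math.log(tie_weight / denom)
    return value


# Made-up counts for three treatments (diagonals are zero).
wins = [[0, 3, 4], [1, 0, 2], [1, 2, 0]]
ties = [[0, 1, 0], [1, 0, 2], [0, 2, 0]]
pi = [0.5, 0.3, 0.2]
compact = log_likelihood(pi, 0.4, wins, ties)
direct = log_likelihood_direct(pi, 0.4, wins, ties)
print(compact, direct)
```

The two values agree up to floating-point rounding, which is what the regrouping of terms into $s_{i}$ , $T$ and $r_{i j}$ implies.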

The maximum likelihood estimates (MLEs) $(p, \hat{v})$ of the parameters $(π, v)$ are obtained by solving:

\begin{aligned} s_{i} / p_{i} - g_{i} (p, \hat{v}) & = 0, for i = 1, . . ., t \\ T / \hat{v} - h (p, \hat{v}) & = 0 \end{aligned}

with the functions:

\begin{aligned} g_{i} (p, \hat{v}) & = \sum_{j} r_{i j} (2 + \hat{v} \sqrt{p_{j} / p_{i}}) / (p_{i} + p_{j} + \hat{v} \sqrt{p_{i} p_{j}}), \\ h (p, \hat{v}) & = \sum \sum_{i < j} r_{i j} \sqrt{p_{i} p_{j}} / (p_{i} + p_{j} + \hat{v} \sqrt{p_{i} p_{j}}), \end{aligned}

where

$p = (p_{1}, . . ., p_{t})$ .
$π = (π_{1}, . . ., π_{t})$ .

The Existence and Uniqueness of Solutions:
Following Ford’s [5] Assumption of Non-empty Subset Preference for the Bradley-Terry model, the maximisation of $L (π, v)$ over the region $\{π_{i} > 0, \sum π_{i} = 1, 0 < v < \infty\}$ is analysed under the corresponding restriction on the matrix $W$ . The argument requires $T > 0$ and sets $L (π, v) = 0$ on the boundary, which gives a continuous extension of $L$ to the closure of that region and is used to establish the existence and uniqueness of the maximum.

For $t = 2$ , explicit solutions are given by $p_{i} = \frac{w_{i}}{w_{1} + w_{2}}$ and $\hat{v} = \frac{T}{\sqrt{w_{1} w_{2}}}$ .
For $t > 2$ , iterative methods are needed.
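One simple iterative scheme suggested by the form of the likelihood equations is a fixed-point update: set $p_{i} \leftarrow s_{i} / g_{i} (p, \hat{v})$ , renormalise so the $p_{i}$ sum to one, then set $\hat{v} \leftarrow T / h (p, \hat{v})$ . The sketch below is illustrative only. The function name `fit_davidson`, the starting values, and the stopping rule are assumptions, and no convergence guarantee is claimed here. It assumes $T > 0$ and a win matrix that satisfies the non-empty subset preference assumption.

```python
import math


def fit_davidson(wins, ties, iterations=10000, tol=1e-12):
    """Fixed-point iteration on the likelihood equations (illustrative)."""
    t = len(wins)
    r = [[wins[i][j] + wins[j][i] + ties[i][j] for j in range(t)]
         for i in range(t)]
    s = [2 * sum(wins[i]) + sum(ties[i]) for i in range(t)]
    total_ties = sum(ties[i][j] for i in range(t) for j in range(i + 1, t))

    p = [1.0 / t] * t   # start from equal strengths
    v = 1.0
    for _ in range(iterations):
        # p_i <- s_i / g_i(p, v), then renormalise so the p_i sum to one.
        new_p = []
        for i in range(t):
            g_i = sum(
                r[i][j] * (2 + v * math.sqrt(p[j] / p[i]))
                / (p[i] + p[j] + v * math.sqrt(p[i] * p[j]))
                for j in range(t) if j != i
            )
            new_p.append(s[i] / g_i)
        norm = sum(new_p)
        new_p = [x / norm for x in new_p]
        # v <- T / h(p, v), using the updated strengths.
        h = sum(
            r[i][j] * math.sqrt(new_p[i] * new_p[j])
            / (new_p[i] + new_p[j] + v * math.sqrt(new_p[i] * new_p[j]))
            for i in range(t) for j in range(i + 1, t)
        )
        new_v = total_ties / h
        change = max(abs(a - b) for a, b in zip(new_p, p)) + abs(new_v - v)
        p, v = new_p, new_v
        if change < tol:
            break
    return p, v


# Two treatments: 6 wins for the first, 3 for the second, 2 ties.
p_hat, v_hat = fit_davidson([[0, 6], [3, 0]], [[0, 2], [2, 0]])
print(p_hat, v_hat)
```

For this two-treatment record the result can be compared with the explicit solution for $t = 2$ , namely $p_{1} = 6 / 9$ and $\hat{v} = 2 / \sqrt{18}$ .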

Example 4

Consider a football tournament among three teams: Team A, Team B, and Team C. To demonstrate the model, we calculate the outcome probabilities for one match, Team A vs. Team B. The team strengths are $π_{A} = 0.5, π_{B} = 0.3, π_{C} = 0.2$ and the scaling constant is $v = 0.1$ , so that $v \sqrt{π_{A} π_{B}} = 0.1 \sqrt{0.15} \approx 0.0387$ and the common denominator is approximately $0.8387$ . The model equations above then give:

Team A winning: $p (A ∣ A, B) = \frac{π_{A}}{π_{A} + π_{B} + v \sqrt{π_{A} π_{B}}} \approx 59.6 \%$
Team B winning: $p (B ∣ A, B) = \frac{π_{B}}{π_{A} + π_{B} + v \sqrt{π_{A} π_{B}}} \approx 35.8 \%$
Draw: $p (0 ∣ A, B) = \frac{v \sqrt{π_{A} π_{B}}}{π_{A} + π_{B} + v \sqrt{π_{A} π_{B}}} \approx 4.6 \%$
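The same calculation can be repeated for every pairing in the tournament. The short script below is a sketch that evaluates the model equations directly from the stated strengths and scaling constant. The function name `match_probabilities` and the `results` dictionary are illustrative choices.

```python
import math
from itertools import combinations

strengths = {"A": 0.5, "B": 0.3, "C": 0.2}   # team strengths from the example
v = 0.1                                       # scaling constant from the example


def match_probabilities(pi_x, pi_y, v):
    """Return (P(x wins), P(y wins), P(draw)) under the Davidson model."""
    draw_weight = v * math.sqrt(pi_x * pi_y)
    denom = pi_x + pi_y + draw_weight
    return pi_x / denom, pi_y / denom, draw_weight / denom


results = {}
for x, y in combinations(strengths, 2):
    results[(x, y)] = match_probabilities(strengths[x], strengths[y], v)
    win_x, win_y, draw = results[(x, y)]
    print(f"{x} vs {y}: {x} wins {win_x:.1%}, {y} wins {win_y:.1%}, draw {draw:.1%}")
```

Within each pairing the three probabilities sum to one, because they share the same denominator.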

Model Derivation 4

The geometric mean is defined as:

G = {(\prod_{i = 1}^{n} x_{i})}^{\frac{1}{n}}

Thus, when $n = 2$ , the geometric mean is $\sqrt{x_{1} \cdot x_{2}}$ . Applying it to the two preference probabilities and scaling by a constant $v \geq 0$ gives the tie equation used below.

From the Lemma of Luce’s Choice Axiom, express $p (j ∣ i, j)$ in terms of $p (i ∣ i, j)$ :

p (j ∣ i, j) = p (i ∣ i, j) \cdot \frac{π_{j}}{π_{i}}

The probability of a tie, $p (0 ∣ i, j)$ , is defined as:

\begin{aligned} p (0 ∣ i, j) & = v \sqrt{p (i ∣ i, j) \cdot p (j ∣ i, j)} \\ & = v \sqrt{p (i ∣ i, j) \cdot \left( p (i ∣ i, j) \cdot \frac{π_{j}}{π_{i}} \right)} \\ & = v \cdot p (i ∣ i, j) \cdot \sqrt{\frac{π_{j}}{π_{i}}} \end{aligned}

Given the total probability equation:

\begin{aligned} 1 & = p (i ∣ i, j) + p (j ∣ i, j) + p (0 ∣ i, j) \\ & = p (i ∣ i, j) + p (i ∣ i, j) \cdot \frac{π_{j}}{π_{i}} + v \cdot p (i ∣ i, j) \cdot \sqrt{\frac{π_{j}}{π_{i}}} \\ & = p (i ∣ i, j) \cdot \left( 1 + \frac{π_{j}}{π_{i}} + v \cdot \sqrt{\frac{π_{j}}{π_{i}}} \right) \end{aligned}

Finally, solving for $p (i ∣ i, j)$ :

p (i ∣ i, j) = \frac{1}{1 + \frac{π_{j}}{π_{i}} + v \cdot \sqrt{\frac{π_{j}}{π_{i}}}} = \frac{π_{i}}{π_{i} + π_{j} + v \sqrt{π_{i} π_{j}}}

Similarly, it is easy to obtain $p (j ∣ i, j)$ and $p (0 ∣ i, j)$ .
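
For completeness, the two remaining probabilities follow by substituting the expression for $p (i ∣ i, j)$ into the Luce relation and into the tie definition. The symbol $D_{i j}$ for the common denominator is a shorthand introduced here for brevity:

\begin{aligned} D_{i j} & = π_{i} + π_{j} + v \sqrt{π_{i} π_{j}}, \\ p (j ∣ i, j) & = p (i ∣ i, j) \cdot \frac{π_{j}}{π_{i}} = \frac{π_{i}}{D_{i j}} \cdot \frac{π_{j}}{π_{i}} = \frac{π_{j}}{D_{i j}}, \\ p (0 ∣ i, j) & = v \cdot p (i ∣ i, j) \cdot \sqrt{\frac{π_{j}}{π_{i}}} = \frac{v π_{i}}{D_{i j}} \sqrt{\frac{π_{j}}{π_{i}}} = \frac{v \sqrt{π_{i} π_{j}}}{D_{i j}}. \end{aligned}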

Next, we derive the maximum likelihood estimates. The likelihood $L$ for all the observed outcomes is the product:

L = \prod_{i < j} [p (i ∣ i, j)]^{w_{i j}} \cdot [p (j ∣ i, j)]^{w_{j i}} \cdot [p (0 ∣ i, j)]^{t_{i j}}

The log-likelihood $\ln L$ is:

\begin{aligned} \ln L (π, v) & = \sum_{i < j} \left[ w_{i j} \ln p (i ∣ i, j) + w_{j i} \ln p (j ∣ i, j) + t_{i j} \ln p (0 ∣ i, j) \right] \\ & = \sum_{i < j} \left( w_{i j} \ln π_{i} + w_{j i} \ln π_{j} + t_{i j} \ln (v \sqrt{π_{i} π_{j}}) \right) \\ & \quad - \sum_{i < j} r_{i j} \ln (π_{i} + π_{j} + v \sqrt{π_{i} π_{j}}), \end{aligned}

since $r_{i j} = w_{i j} + w_{j i} + t_{i j}$ .

By using the properties of logarithms $\ln a b = \ln a + \ln b$ and $\ln \sqrt{a} = \frac{1}{2} \ln a$ :

t_{i j} \ln (v \sqrt{π_{i} π_{j}}) = t_{i j} (\ln v + \frac{1}{2} \ln π_{i} + \frac{1}{2} \ln π_{j})

Substitute this back into the $\ln L$ :

\begin{aligned} \ln L (π, v) & = \sum \sum_{i < j} \left( w_{i j} \ln π_{i} + w_{j i} \ln π_{j} + t_{i j} \left( \ln v + \frac{1}{2} \ln π_{i} + \frac{1}{2} \ln π_{j} \right) \right) \\ & \quad - \sum \sum_{i < j} r_{i j} \ln (π_{i} + π_{j} + v \sqrt{π_{i} π_{j}}) \\ & = \sum \sum_{i < j} (w_{i j} \ln π_{i} + w_{j i} \ln π_{j}) + \sum \sum_{i < j} \left( \frac{1}{2} t_{i j} \ln π_{i} + \frac{1}{2} t_{i j} \ln π_{j} \right) + T \ln v \\ & \quad - \sum \sum_{i < j} r_{i j} \ln (π_{i} + π_{j} + v \sqrt{π_{i} π_{j}}) \\ & = \sum_{i = 1}^{t} \left( \sum_{j \neq i} w_{i j} \ln π_{i} \right) + \sum_{i = 1}^{t} \left( \frac{1}{2} \sum_{j \neq i} t_{i j} \ln π_{i} \right) + T \ln v \\ & \quad - \sum \sum_{i < j} r_{i j} \ln (π_{i} + π_{j} + v \sqrt{π_{i} π_{j}}) \end{aligned}

The last step collects the contributions to each $π_{i}$ from all pairings in which $i$ is involved, across all $j \neq i$ . Relabelling the indices in the second sum gives:

\sum \sum_{i < j} w_{i j} \ln π_{i} + \sum \sum_{i < j} w_{j i} \ln π_{j} = \sum \sum_{i < j} w_{i j} \ln π_{i} + \sum \sum_{i > j} w_{i j} \ln π_{i} = \sum_{i = 1}^{t} \left( \sum_{j \neq i} w_{i j} \ln π_{i} \right)

The same regrouping applies to the $t_{i j}$ terms, using $t_{i j} = t_{j i}$ . Thus,

\ln L (π, v) = \frac{1}{2} \sum_{i = 1}^{t} s_{i} \ln π_{i} + T \ln v - \sum \sum_{i < j} r_{i j} \ln (π_{i} + π_{j} + v \sqrt{π_{i} π_{j}})

where $T = \sum \sum_{i < j} t_{i j}$ and $s_{i} = 2 w_{i} + t_{i} = \sum_{j \neq i} 2 w_{i j} + \sum_{j \neq i} t_{i j}$ .

To find the MLE, we take the partial derivatives of $\ln L$ with respect to each parameter $π_{i}$ and $v$ , and set them to zero.

For example, take $t = 2$ under the constraint $\sum_{i = 1}^{t} π_{i} = 1$ . There is only one pair (i.e., $i < j$ becomes $1 < 2$ ), which simplifies the log-likelihood to:

\ln L (π_{1}, π_{2}, v) = \frac{1}{2} s_{1} \ln π_{1} + \frac{1}{2} s_{2} \ln π_{2} + T \ln v - r_{12} \ln (π_{1} + π_{2} + v \sqrt{π_{1} π_{2}})

Given that $π_{1} + π_{2} = 1$ , we can substitute $π_{2} = 1 - π_{1}$ . Therefore, the log-likelihood becomes:

\begin{aligned} \ln L (π_{1}, v) & = \frac{1}{2} s_{1} \ln π_{1} + \frac{1}{2} s_{2} \ln (1 - π_{1}) + T \ln v \\ & \quad - r_{12} \ln (π_{1} + (1 - π_{1}) + v \sqrt{π_{1} (1 - π_{1})}) \\ & = \frac{1}{2} s_{1} \ln π_{1} + \frac{1}{2} s_{2} \ln (1 - π_{1}) + T \ln v \\ & \quad - r_{12} \ln (1 + v \sqrt{π_{1} (1 - π_{1})}) \end{aligned}

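Derivative with respect to $π_{1}$ : since $s_{1} = 2 w_{12} + T$ and $s_{2} = 2 w_{21} + T$ with $T = t_{12}$ , writing $u = v \sqrt{π_{1} (1 - π_{1})}$ separates the log-likelihood into a part that depends only on $π_{1}$ and a part that depends only on $u$ . For $0 < π_{1} < 1$ the map $(π_{1}, v) \mapsto (π_{1}, u)$ is one-to-one, so maximising over $(π_{1}, v)$ is equivalent to maximising over $(π_{1}, u)$ , and the condition on $π_{1}$ (with $u$ held fixed) is:

\begin{aligned} \ln L & = w_{12} \ln π_{1} + w_{21} \ln (1 - π_{1}) + T \ln u - r_{12} \ln (1 + u) \\ \frac{\partial \ln L}{\partial π_{1}} & = \frac{w_{12}}{π_{1}} - \frac{w_{21}}{1 - π_{1}} = 0 \end{aligned}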

Derivative with respect to $v$ :

\frac{\partial \ln L}{\partial v} = \frac{T}{v} - \frac{r_{12} \sqrt{π_{1} (1 - π_{1})}}{1 + v \sqrt{π_{1} (1 - π_{1})}} = 0

Then solving for $v$ :

v = \frac{T}{(r_{12} - T) \sqrt{π_{1} (1 - π_{1})}}

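where $r_{12} = w_{12} + w_{21} + T$ , and under the assumption that $T > 0$ . The condition on $π_{1}$ gives $π_{1} = w_{12} / (w_{12} + w_{21})$ , so $\sqrt{π_{1} (1 - π_{1})} = \sqrt{w_{12} w_{21}} / (w_{12} + w_{21})$ and $r_{12} - T = w_{12} + w_{21}$ . Therefore, the MLEs are: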

p_{1} = \frac{w_{12}}{w_{12} + w_{21}}, p_{2} = \frac{w_{21}}{w_{12} + w_{21}},

\hat{v} = \frac{T}{\sqrt{w_{12} w_{21}}}
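
A quick numerical check of these closed-form estimates: plugging them back into the model should reproduce the observed proportions of wins, losses and ties, since with two treatments the model has as many free parameters as the outcome distribution. The record of 6 wins, 3 losses and 2 ties and the function name `davidson_mle_two_treatments` are hypothetical.

```python
import math


def davidson_mle_two_treatments(w12, w21, ties):
    """Closed-form MLEs for t = 2; needs w12 > 0, w21 > 0 and ties > 0."""
    p1 = w12 / (w12 + w21)
    p2 = w21 / (w12 + w21)
    v_hat = ties / math.sqrt(w12 * w21)
    return p1, p2, v_hat


# Hypothetical record between two teams: 6 wins, 3 losses, 2 ties.
w12, w21, ties = 6, 3, 2
r12 = w12 + w21 + ties
p1, p2, v_hat = davidson_mle_two_treatments(w12, w21, ties)

# Plug the estimates back into the model probabilities.
tie_weight = v_hat * math.sqrt(p1 * p2)
denom = p1 + p2 + tie_weight
fitted = (p1 / denom, p2 / denom, tie_weight / denom)
observed = (w12 / r12, w21 / r12, ties / r12)
print(fitted)
print(observed)
```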

Conclusion

To sum up, in the Rao-Kupper model [1], the probabilities $p_{i j}$ and $p_{j i}$ have different denominators, each influenced by the strength of the opposing stimulus ( $π_{j}$ and $π_{i}$ respectively). The Davidson model [3], on the other hand, standardizes the denominators of $p (i ∣ i, j)$ and $p (j ∣ i, j)$ by adding a term proportional to the geometric mean of the two stimulus strengths, $\sqrt{π_{i} π_{j}}$ . The resulting common denominator simplifies the model’s development and analysis.

However, these models still have shortcomings in real-life applications: for example, they do not quantify a team’s offensive and defensive abilities. The Maher model [6], which is based on the Poisson distribution, addresses these issues by calculating the expected number of goals scored by each team.

References

[1] P. V. Rao and L. L. Kupper. Ties in paired-comparison experiments: A generalization of the Bradley-Terry model. Journal of the American Statistical Association, 62(317):194–204, Mar 1967.

[2] Ralph Allan Bradley. Some statistical methods in taste testing and quality evaluation. Biometrics, 9(1):22–38, 1953.

[3] Roger R. Davidson. On extending the Bradley-Terry model to accommodate ties in paired comparison experiments. Journal of the American Statistical Association, 65(329):317–328, 1970.

[4] R.D. Luce. Individual Choice Behavior: A Theoretical Analysis. Wiley, 1959.

[5] L. R. Ford, Jr. Solution of a ranking problem from binary comparisons. The American Mathematical Monthly, 64(8):28–33, 1957. Part 2: To Lester R. Ford on His Seventieth Birthday.

[6] M. J. Maher. Modelling association football scores. Statistica Neerlandica, 36:109–118, 1982.
