Alice wants to win some money by betting on horce racing games. She knows nothing about horce racing. Fortunately, she knows that some of her friends are very good at betting on horce racing; however, she does not know who exactly is the best. Suppose Alice will bet on several rounds sequentially. Can Alice bet as well as the best among her friends in the long run?
Indeed, there are algorithms for doing so. A famous one is the so-called hedge algorithm, introduced in the paper “A desicion-theoretic generalization of on-line learning and an application to boosting” by Y. Freund and R. Schapire.
The idea is to switch between Alice’s fridends’ strategies. Assume for simplicity that Alice will always bet 1 unit of money for each round. Alice keeps in her mind a weight for each of her friend. At the beginning of each round, she chooses one of her friends with probability proportional to the weights, say Bob, and follows Bob’s bet. Once a round finishes, Alice sees how much she would lose had she followed any of her friend. She then updates the weights accordingly. Those who yielded smaller losses have higher weights, and vice versa.
Let us formalize the idea. We label the rounds by positive integers $t \in \mathbb{N}$. Suppse Alice has $N$ friends; let us index them by the set $[N] := \lbrace 1, \ldots, N \rbrace$. We summarize the weights before the $t$-th round by a weight vector $w_t := ( w_t (i) )_{i \in [N]} \in \mathbb{R}^N$ in the probability simplex, where $w_t (i)$ denotes the weight assigned to Friend $i$. We also summarize the losses yielded by each friend’s bet for the $t$-th round by a vector
Then for the $t$-th round, the expected loss of Alice is
Suppose Alice will bet on $T$ rounds. The goal of Alice is to have a small regret
where $[T] := \lbrace 1, \ldots, T \rbrace$.
The regret measures the difference between Alice’s expected loss and the loss yielded by the best friend in hindsight. If $R_T = o(T)$, in hindsight, the difference between the average expected losses of Alice and the best friend converges to zero. Notice that the best friend in general varies with $T$.
The hedge algorithm is a specific rule of setting the weights. Initially, Alice chooses any weight $w_1$ such that $w_1 (i) \neq 0$ for all $i$. After observing the losses $x_t$, Alice updates the weights as
where $Z_t := \sum_{i \in [N]} w_t (i) \, \mathrm{e}^{- \eta x_t (i)}$ is the normalizing constant, and $\eta$ is a properly chosen learning rate.
The hedge algorithm looks similar to the aggregating algorithm (AA). Indeed, we can formulate the problem using the setting of prediction with expert advice (PEA). Let Alice be Learner and her friends be the experts. Let Nature choose the loss vectors. Set the outcome space $\Omega$ as $[ 0, + \infty [^N$, prediction space $\Gamma$ as the probability simplex in $\mathbb{R}^N$, and loss function as
When each expert $i \in [N]$ chooses a fixed prediction $\gamma_t (i) = ( \delta_{i,j} )_{j \in [N]} \in \mathbb{R}^N$, we recover the original betting-horce-racing-games problem.
However, applying the AA to this specific PEA game is not trivial, and requires some generalization of the mixability condition. Assuming bounded losses, the resulting algorithm yields $R_T = O( \sqrt{T} )$. See this paper and this paper for the details.