# Blackwell Approachability

### Introduction

Suppose that Alice and Bob are playing a two-player zero-sum game. Alice chooses her action from a finite action set $\mathcal{A}$; Bob chooses his action from a finite action set $\mathcal{B}$. There is a payoff function $f$, which assigns to each action pair $(a, b) \in \mathcal{A} \times \mathcal{B}$ a payoff $f ( a, b ) \in \mathbb{R}$; Bob’s payoff is then simply $-f (a, b)$. In general, Alice and Bob may choose their actions randomly, so we consider the expected payoff: for any probability distributions $P \in \Delta ( \mathcal{A} )$ and $Q \in \Delta ( \mathcal{B} )$, define

$$f ( P, Q ) := \sum_{a \in \mathcal{A}} \sum_{b \in \mathcal{B}} P ( a ) \, Q ( b ) \, f ( a, b ).$$
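The expected payoff is just a bilinear form in $( P, Q )$ and can be evaluated directly. A minimal Python sketch, using the matching-pennies payoff matrix as an illustrative example (the matrix and function names are my additions, not from the post):

```python
# Expected payoff f(P, Q) = sum_{a, b} P(a) Q(b) f(a, b): a bilinear form.
def expected_payoff(F, P, Q):
    return sum(P[a] * Q[b] * F[a][b]
               for a in range(len(P)) for b in range(len(Q)))

# Illustrative 2x2 payoff matrix (matching pennies).
F = [[1.0, -1.0],
     [-1.0, 1.0]]

value = expected_payoff(F, [0.5, 0.5], [0.5, 0.5])  # uniform play by both
```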
If Alice does not have any additional information about Bob, it is reasonable to adopt the very robust minimax strategy, i.e., to choose her action $a \in \mathcal{A}$ randomly according to

$$P^\star \in \operatorname*{arg\,max}_{P \in \Delta ( \mathcal{A} )} \min_{Q \in \Delta ( \mathcal{B} )} f ( P, Q ).$$
No matter what Bob’s strategy is, Alice is guaranteed to achieve at least the minimax expected payoff.
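As a sanity check, the minimax strategy of a small matrix game can be approximated numerically. The sketch below runs multiplicative weights against a best-responding opponent on matching pennies; the algorithm choice and the payoff matrix are my illustrative additions, not part of the original discussion:

```python
import math

# Matching pennies: Alice maximizes f(a, b), Bob minimizes it.
F = [[1.0, -1.0],
     [-1.0, 1.0]]

def game_value(P):
    # Alice's guaranteed expected payoff when she commits to the mixture P.
    return min(sum(P[a] * F[a][b] for a in range(2)) for b in range(2))

def minimax_via_mw(T=5000, eta=0.05):
    """Approximate Alice's minimax strategy by multiplicative weights;
    the time-averaged mixture approaches the minimax strategy."""
    w = [1.0, 1.0]
    avg = [0.0, 0.0]
    for _ in range(T):
        s = sum(w)
        P = [x / s for x in w]
        avg = [avg[i] + P[i] for i in range(2)]
        # Bob best-responds to Alice's current mixture.
        b = min(range(2), key=lambda j: sum(P[a] * F[a][j] for a in range(2)))
        # Alice reweights her actions by their payoffs against b.
        w = [w[a] * math.exp(eta * F[a][b]) for a in range(2)]
    return [x / T for x in avg]

P_star = minimax_via_mw()
```

For matching pennies the exact minimax strategy is the uniform mixture with value $0$, and the averaged iterate lands close to it.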

If the payoff function $f$ is vector-valued, the minimax expected payoff no longer makes sense, as there is no canonical way to compare two vectors. This post introduces an analog for the vector-valued payoff case, proposed by David Blackwell in his classic 1956 paper [1].

### Blackwell Approachability

In the rest of this post, suppose that the payoff function $f$ takes values in $\mathbb{R}^d$.

Blackwell considered the following protocol of repeated games. For $t = 1, 2, \ldots$,

1. Alice announces $a_t \in \mathcal{A}$.
2. Bob announces $b_t \in \mathcal{B}$, without knowing $a_t$.

In general, Alice can use a randomized strategy. For convenience, let us say that Alice chooses $a_t$ randomly according to some probability distribution $P_t \in \Delta ( \mathcal{A} )$.

Recall that Alice cannot achieve the “minimax expected payoff” in this vector-payoff case, as it is just not well-defined. However, it may still be possible to ensure that her average payoff ultimately lies in some specific subset of $\mathbb{R}^d$.

Define the average payoff

$$\bar{f}_T := \frac{1}{T} \sum_{t = 1}^{T} f ( a_t, b_t ).$$
Definition. Let $\mathcal{X} \subseteq \mathbb{R}^d$. The set $\mathcal{X}$ is said to be approachable by Alice, if Alice has a strategy such that, for any sequence $( b_t )_{t \in \mathbb{N}}$ chosen by Bob,

$$\lim_{T \to \infty} \mathrm{dist} ( \bar{f}_T, \mathcal{X} ) = 0 \quad \text{almost surely},$$
where $\mathrm{dist} ( \bar{f}_T, \mathcal{X} )$ denotes the $2$-norm distance between $\bar{f}_T$ and the projection of $\bar{f}_T$ onto $\mathcal{X}$.
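The distance and projection are easy to compute in closed form for simple sets. The sketch below uses an axis-aligned box as an illustrative choice of $\mathcal{X}$ (my own example; for a box, the projection is coordinatewise clipping):

```python
import math

def proj_box(v, lo, hi):
    # Euclidean projection onto the box {x : lo[i] <= x[i] <= hi[i]}:
    # clip each coordinate independently.
    return [min(max(x, l), h) for x, l, h in zip(v, lo, hi)]

def dist_box(v, lo, hi):
    # 2-norm distance from v to the box, via the projection.
    p = proj_box(v, lo, hi)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(v, p)))

d = dist_box([2.0, -1.0], [0.0, 0.0], [1.0, 1.0])  # projection is (1, 0)
```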

If we only consider convex sets, there is an exact characterization of approachable sets.

Theorem 1. A convex set $\mathcal{X} \subseteq \mathbb{R}^d$ is approachable by Alice, if and only if for any $Q \in \Delta ( \mathcal{B} )$, there exists some $P \in \Delta ( \mathcal{A} )$ such that $f ( P, Q ) \in \mathcal{X}$.

Exercise. Suppose that for each $t$, Bob’s action $b_t$ can depend on Alice’s action $a_t$. Show that even when the if and only if condition holds, $\mathcal{X}$ may not be approachable by Alice.

The “only if” part of Theorem 1 is easy to check. Suppose that there exists some $Q^\star \in \Delta ( \mathcal{B} )$ such that $f ( P, Q^\star ) \notin \mathcal{X}$ for any $P \in \Delta ( \mathcal{A} )$. Then Bob can choose $b_t$ randomly according to $Q^\star$ for all $t$: the expected average payoff always lies in the compact convex set $\{ f ( P, Q^\star ) : P \in \Delta ( \mathcal{A} ) \}$, which is at a positive distance from $\mathcal{X}$, so Alice fails to approach $\mathcal{X}$.

We will focus on the if part of Theorem 1 in the rest of this post.

### Blackwell’s Condition and Alice’s Strategy

To prove the if part of Theorem 1, we need to show the existence of an approaching strategy. Indeed, the proof is constructive; there is an explicitly defined approaching strategy based on the following lemma.

Lemma 1. Suppose that for any $Q \in \Delta ( \mathcal{B} )$, there exists some $P \in \Delta ( \mathcal{A} )$ such that $f ( P, Q ) \in \mathcal{X}$. Then for any $v \in \mathbb{R}^d$, there exists some $P \in \Delta ( \mathcal{A} )$, such that

$$\left\langle v - \mathrm{proj} ( v ), f ( P, b ) - \mathrm{proj} ( v ) \right\rangle \le 0, \quad \forall b \in \mathcal{B},$$
where $\mathrm{proj}$ denotes the projection (w.r.t. the $2$-norm) onto $\mathcal{X}$.

Proof. By definition, $\mathrm{proj} (v)$ is the minimizer of the $2$-norm distance to $v$ in $\mathcal{X}$. The optimality condition says

$$\left\langle v - \mathrm{proj} ( v ), x - \mathrm{proj} ( v ) \right\rangle \le 0, \quad \forall x \in \mathcal{X}.$$
What we want to prove can be equivalently written as

$$\min_{P \in \Delta ( \mathcal{A} )} \max_{Q \in \Delta ( \mathcal{B} )} \left\langle v - \mathrm{proj} ( v ), f ( P, Q ) - \mathrm{proj} ( v ) \right\rangle \le 0,$$

since maximizing over $Q \in \Delta ( \mathcal{B} )$ is equivalent to maximizing over $b \in \mathcal{B}$ for a function linear in $Q$.
Applying von Neumann’s minimax theorem, we can exchange the order of $\min$ and $\max$, and write

$$\min_{P \in \Delta ( \mathcal{A} )} \max_{Q \in \Delta ( \mathcal{B} )} \left\langle v - \mathrm{proj} ( v ), f ( P, Q ) - \mathrm{proj} ( v ) \right\rangle = \max_{Q \in \Delta ( \mathcal{B} )} \min_{P \in \Delta ( \mathcal{A} )} \left\langle v - \mathrm{proj} ( v ), f ( P, Q ) - \mathrm{proj} ( v ) \right\rangle.$$
The lemma follows from the optimality condition: for each $Q$, choosing $P$ such that $f ( P, Q ) \in \mathcal{X}$ makes the inner product non-positive. $Q.E.D.$

We call the condition in Lemma 1 Blackwell’s condition, following the convention in the literature. Blackwell’s condition ensures that the following strategy for Alice is well-defined.

Alice’s Strategy. Define the expected payoff and the average expected payoff:

$$f_t := f ( P_t, b_t ), \qquad \tilde{f}_t := \frac{1}{t} \sum_{\tau = 1}^{t} f_\tau, \qquad \tilde{f}_0 := 0.$$
For each $t \in \mathbb{N}$, Alice computes some $P_t$ such that

$$\left\langle \tilde{f}_{t - 1} - \mathrm{proj} ( \tilde{f}_{t - 1} ), f ( P_t, b ) - \mathrm{proj} ( \tilde{f}_{t - 1} ) \right\rangle \le 0, \quad \forall b \in \mathcal{B},$$
and then announces $a_t$ randomly according to $P_t$.

Remark. In practice, $P_t$ can be computed as

$$P_t \in \operatorname*{arg\,min}_{P \in \Delta ( \mathcal{A} )} \max_{Q \in \Delta ( \mathcal{B} )} \left\langle \tilde{f}_{t - 1} - \mathrm{proj} ( \tilde{f}_{t - 1} ), f ( P, Q ) \right\rangle.$$
This is a bilinear saddle-point problem; there is a variety of algorithms for solving such problems.
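For tiny action sets, even a grid search over $P$ suffices. The sketch below does this for a toy game with $f ( a, b ) = ( ( - 1 )^a, ( - 1 )^b )$ and target set $\mathcal{X} = \{ 0 \} \times [ - 1, 1 ]$; the example payoff and all names are my illustrative choices, not from the post:

```python
def f(a, b):
    # Toy vector payoff: Alice controls the first coordinate, Bob the second.
    return ((-1.0) ** a, (-1.0) ** b)

def proj_X(v):
    # Projection onto X = {0} x [-1, 1].
    return (0.0, min(max(v[1], -1.0), 1.0))

def choose_P(v, grid=1001):
    """Grid search for P = (p, 1 - p) minimizing max_b <v - proj(v), f(P, b) - proj(v)>.
    A sketch only; a linear program or saddle-point method scales better."""
    pi = proj_X(v)
    g = (v[0] - pi[0], v[1] - pi[1])
    def worst_case(p):
        vals = []
        for b in (0, 1):
            fb = (p * f(0, b)[0] + (1 - p) * f(1, b)[0],
                  p * f(0, b)[1] + (1 - p) * f(1, b)[1])
            vals.append(g[0] * (fb[0] - pi[0]) + g[1] * (fb[1] - pi[1]))
        return max(vals)
    return min((i / (grid - 1) for i in range(grid)), key=worst_case)

# If the running average has drifted to a positive first coordinate, Alice
# should put all mass on action a = 1 (whose first coordinate is -1).
p = choose_P((1.0, 0.5))
```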

### Proof of Theorem 1

Since $\mathcal{X}$ is approachable if and only if its closure is approachable, we assume that $\mathcal{X}$ is closed w.l.o.g. We will also assume that $\mathcal{X}$ is bounded for convenience; we will relax this unnecessary condition at the end of the proof.

Theorem 1 is in fact a corollary of the following theorem.

Theorem 2. Suppose that the if and only if condition in Theorem 1 holds, and that Alice adopts the strategy described above. Then for each $T \in \mathbb{N}$ and $\delta \in ( 0, 1 )$, with probability at least $1 - \delta$, it holds that

$$\mathrm{dist} ( \bar{f}_T, \mathcal{X} ) \le \frac{M + R}{\sqrt{T}} + C M \sqrt{\frac{\log ( 1 / \delta )}{T}},$$

where $C > 0$ is a universal constant, $M := \max_{a, b} \left\Vert f ( a, b ) \right\Vert_2$, and $R := \max_{w \in \mathcal{X}} \left\Vert w \right\Vert_2$.

Lemma 2. For all $T \in \mathbb{N}$, it holds that

$$\mathrm{dist} ( \tilde{f}_T, \mathcal{X} ) \le \frac{M + R}{\sqrt{T}}.$$
Proof. Define $\tilde{\pi}_t := \mathrm{proj} ( \tilde{f}_t )$. By definition of the distance, we have

$$\mathrm{dist} ( \tilde{f}_{t + 1}, \mathcal{X} )^2 \le \left\Vert \tilde{f}_{t + 1} - \tilde{\pi}_t \right\Vert_2^2.$$
Writing $\tilde{f}_{t + 1} = [t / (t + 1)] \tilde{f}_t + [1 / (t+1)] f_{t + 1}$, we have

$$\left\Vert \tilde{f}_{t + 1} - \tilde{\pi}_t \right\Vert_2^2 = \frac{t^2}{( t + 1 )^2} \left\Vert \tilde{f}_t - \tilde{\pi}_t \right\Vert_2^2 + \frac{2 t}{( t + 1 )^2} \left\langle \tilde{f}_t - \tilde{\pi}_t, f_{t + 1} - \tilde{\pi}_t \right\rangle + \frac{1}{( t + 1 )^2} \left\Vert f_{t + 1} - \tilde{\pi}_t \right\Vert_2^2.$$
Applying Lemma 1 to the inner product (which Alice’s choice of $P_{t + 1}$ makes non-positive) and the triangle inequality to the last term (which gives $\Vert f_{t + 1} - \tilde{\pi}_t \Vert_2 \le M + R$), we obtain

$$( t + 1 )^2 \, \mathrm{dist} ( \tilde{f}_{t + 1}, \mathcal{X} )^2 \le t^2 \, \mathrm{dist} ( \tilde{f}_t, \mathcal{X} )^2 + ( M + R )^2.$$
The lemma then follows by induction, as the recursion gives $T^2 \, \mathrm{dist} ( \tilde{f}_T, \mathcal{X} )^2 \le T ( M + R )^2$. Q.E.D.

Proof of Theorem 2. Recall that $\bar{f}_t$ denotes the average payoff; define $\bar{\pi}_t := \mathrm{proj}( \bar{f}_t )$. We start with the decomposition

$$\mathrm{dist} ( \bar{f}_T, \mathcal{X} ) = \left\Vert \bar{f}_T - \bar{\pi}_T \right\Vert_2 \le \left\Vert \bar{f}_T - \tilde{f}_T \right\Vert_2 + \left\Vert \tilde{f}_T - \tilde{\pi}_T \right\Vert_2 + \left\Vert \tilde{\pi}_T - \bar{\pi}_T \right\Vert_2.$$
By Lemma 2 and the non-expansiveness of projections, which gives $\Vert \tilde{\pi}_T - \bar{\pi}_T \Vert_2 \le \Vert \tilde{f}_T - \bar{f}_T \Vert_2$, we have

$$\mathrm{dist} ( \bar{f}_T, \mathcal{X} ) \le 2 \left\Vert \bar{f}_T - \tilde{f}_T \right\Vert_2 + \frac{M + R}{\sqrt{T}}.$$
Define $X_t := \sum_{\tau = 1}^t \left( f ( a_\tau, b_\tau ) - f ( P_\tau, b_\tau ) \right)$, so that $\bar{f}_T - \tilde{f}_T = X_T / T$. Then $( X_t )_{t \in \mathbb{N}}$ forms a vector-valued martingale with increments bounded by $2 M$ in the $2$-norm. Applying the Azuma inequality for Banach-space-valued martingales, we obtain, for any $u > 0$,

$$\mathbb{P} \left( \left\Vert X_T \right\Vert_2 \ge u \right) \le C \exp \left( - \frac{u^2}{C M^2 T} \right),$$

for some constant $C > 0$. Choosing $u$ so that the right-hand side equals $\delta$ and combining the bounds completes the proof. Q.E.D.

In general, the set to be approached may not be bounded; then $R = + \infty$ if we directly apply the definition. Notice, however, that only sets intersecting the convex hull of $\{ f ( a, b ) : a \in \mathcal{A}, b \in \mathcal{B} \}$ can be approached. Denote this convex hull by $\mathcal{F}$. Hence, it suffices to replace $\mathcal{X}$ by $\mathcal{X} \cap \mathcal{F}$, which is bounded by construction; since every point of $\mathcal{F}$ has $2$-norm at most $M$, we then have $R \le M$, which slightly simplifies the expressions.
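To see the strategy in action, the sketch below simulates it on a toy example: $f ( a, b ) = ( ( - 1 )^a, ( - 1 )^b )$ with target $\mathcal{X} = \{ 0 \} \times [ - 1, 1 ]$, which satisfies Blackwell’s condition since the uniform $P$ gives $f ( P, Q ) \in \mathcal{X}$ for every $Q$. The example and all names are my illustrative choices:

```python
import math
import random

def f(a, b):
    return ((-1.0) ** a, (-1.0) ** b)

def proj_X(v):
    # Projection onto X = {0} x [-1, 1].
    return (0.0, min(max(v[1], -1.0), 1.0))

def dist_X(v):
    p = proj_X(v)
    return math.hypot(v[0] - p[0], v[1] - p[1])

def run(T=2000, seed=0):
    """Play T rounds of the approaching strategy against a randomly
    playing Bob; return the final distance of the average payoff to X."""
    rng = random.Random(seed)
    avg = (0.0, 0.0)
    for t in range(1, T + 1):
        # Blackwell step: choose P = (p, 1 - p) so that
        # <avg - proj(avg), f(P, b) - proj(avg)> <= 0 for all b.
        # Here the inner product equals avg[0] * (2p - 1), so it suffices
        # to take p = 0 when avg[0] > 0 and p = 1 otherwise.
        p = 0.0 if avg[0] > 0 else 1.0
        a = 0 if rng.random() < p else 1
        b = rng.randrange(2)  # Bob plays arbitrarily; uniformly here
        x = f(a, b)
        avg = ((avg[0] * (t - 1) + x[0]) / t,
               (avg[1] * (t - 1) + x[1]) / t)
    return dist_X(avg)

final_dist = run()
```

In this particularly simple example the second coordinate of the average always lies in $[ - 1, 1 ]$, so the distance equals the absolute value of the first coordinate, which the strategy drives to $0$ at rate $O ( 1 / T )$, faster than the worst-case $O ( 1 / \sqrt{T} )$ guarantee.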

### Calibrated Forecasting

One might wonder why one should consider vector-valued payoffs. An interesting application is calibrated forecasting. With Theorem 1, it is an easy exercise to prove the theorem in the previous post, showing the existence of an $\varepsilon$-calibrated forecasting strategy.

### References

The strategy of Alice presented above is not exactly the original one proposed by Blackwell, as the original does not lead to the $O ( 1 / \sqrt{T} )$ high-probability convergence rate in Theorem 2. The strategy presented above was described, without proof, in the book by Cesa-Bianchi and Lugosi. The proof above is essentially a mix of the arguments and results in the following references.

1. Blackwell, D., 1956. An analog of the minimax theorem for vector payoffs. Pac. J. Math.
2. Cesa-Bianchi, N. and Lugosi, G., 2006. Prediction, Learning, and Games.
3. Naor, A., 2012. On the Banach-space-valued Azuma inequality and small-set isoperimetry of Alon–Roichman graphs. Combinatorics Probab. Comput.
4. Sorin, S., 2002. A First Course on Zero-Sum Repeated Games.