## Lec #12: Notes on solving LPs

Notes for today’s lecture:

• Tightness of the Hedge guarantee. Consider the basic Hedge analysis, which says that for any ${\varepsilon \leq 1}$, we have regret at most ${\frac{\ln N}{\varepsilon} + \varepsilon T}$. Now if we were to set ${\varepsilon = \sqrt{\frac{\ln N}{T}}}$ by balancing those terms, the regret bound would be ${2\sqrt{T \ln N}}$. This is tight, up to the constant factor.

Let’s see why in the case ${N = 2}$. Suppose we again have two experts, “heads” and “tails”, and are trying to predict a fair coin toss. I.e., every time the gain vector is either ${(1,-1)}$ or ${(-1, 1)}$ with equal probability. So our expected gain is exactly ${0}$, no matter how we play. But after ${T}$ coin tosses, with constant probability we have ${\Omega(\sqrt{T})}$ more flips of one type than of the other, and indeed, the expected gain of the better of the two static experts is ${\Omega(\sqrt{T})}$. So our expected regret cannot be less than ${\Omega(\sqrt{T})}$ even for two experts. Similarly, for ${N}$ experts one can show that ${\Omega(\sqrt{T \log N})}$ is necessary.
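To make the two sides of this concrete, here is a minimal sketch in Python. It assumes losses in ${[-1,1]}$ and the exponential update ${w_i \leftarrow w_i e^{-\varepsilon \ell_i}}$; the function names `hedge_regret` and `expected_best_gain` are made up for this illustration. It runs Hedge on one random coin-toss sequence and compares the regret to the upper bound ${2\sqrt{T \ln N}}$, and it computes exactly the expected gain of the better static expert, which is ${E|S_T|}$ for a sum ${S_T}$ of ${T}$ fair ${\pm 1}$ tosses.

```python
import math
import random

def hedge_regret(losses, eps):
    """Run Hedge on loss vectors in [-1, 1]^N and return its regret."""
    n = len(losses[0])
    cum = [0.0] * n              # cumulative loss of each expert
    alg_loss = 0.0               # cumulative expected loss of Hedge
    for loss in losses:
        # Weights w_i = exp(-eps * cum_i), shifted by the minimum for stability.
        low = min(cum)
        w = [math.exp(-eps * (c - low)) for c in cum]
        z = sum(w)
        alg_loss += sum(wi / z * li for wi, li in zip(w, loss))
        for i, li in enumerate(loss):
            cum[i] += li
    return alg_loss - min(cum)

def expected_best_gain(T):
    """Exact E|S_T|, where S_T is a sum of T fair +/-1 coin tosses."""
    return sum(math.comb(T, k) * abs(2 * k - T) for k in range(T + 1)) / 2 ** T

rng = random.Random(0)           # fixed seed, only so the run is repeatable
N, T = 2, 400
losses = [rng.choice([(1.0, -1.0), (-1.0, 1.0)]) for _ in range(T)]

eps = math.sqrt(math.log(N) / T)             # the balanced choice of epsilon
upper = 2 * math.sqrt(T * math.log(N))       # Hedge's regret guarantee
regret = hedge_regret(losses, eps)
lower = expected_best_gain(T)                # expected regret of any algorithm

print(f"regret on this run:       {regret:.2f}")
print(f"upper bound 2*sqrt(T ln N): {upper:.2f}")
print(f"E|S_T| / sqrt(T):          {lower / math.sqrt(T):.3f}")
```

The last ratio approaches ${\sqrt{2/\pi}}$ as ${T}$ grows, so for ${N = 2}$ the lower and upper bounds differ only by a constant factor.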

• Larger Range of Loss Vectors. For the setting where loss/gain functions could be in ${[-\rho, \rho]^N}$, we claimed an algorithm with average regret less than ${\varepsilon}$ as long as ${T \geq \frac{4 \rho^2 \ln N}{\varepsilon^2}}$. We left it as an exercise in HW3.

In fact, you can prove something slightly weaker for the asymmetric setting where losses are in ${[-\gamma, \rho]^N}$, where ${1 \leq \gamma \leq \rho}$. In handwritten notes on the webpage, I show how to use a guarantee for Hedge to get

$\displaystyle \frac{1}{T} \sum_t \langle p^t, \ell^t \rangle \leq (1 - \varepsilon) \frac{1}{T} \sum_t \langle e_i, \ell^t \rangle + \varepsilon \quad \forall i \in [N]$

as long as ${T = \Omega( \frac{\rho \gamma \ln N}{\varepsilon^2})}$. The constants are worse, and there’s a ${(1-\varepsilon)}$ term hitting the “best expert” term, but the analysis is mechanical.

You can use this guarantee along with the shortest-path oracle to get an ${O(\frac{F^* \log m}{\varepsilon^{O(1)}})}$-iteration algorithm for ${(1-\varepsilon)}$-approximate maximum flow, since the gains will be in the range ${[-1, F^*]}$. More details below.

• The max-flow part was fast, so here are some more details. We wrote the LP, and plugged it into the multiplicative weights framework. Since we had a constraint for each edge ${e}$, the “average” constraint looked like:

$\displaystyle \sum_e p_e \sum_{P: e \in P} f_P \leq \sum_e p_e \cdot 1 = 1.$

Flipping the summations, we get

$\displaystyle \sum_{P} f_P \sum_{e \in P} p_e \leq 1.$

If we denote ${len(P) = \sum_{e \in P} p_e}$, this says ${\sum_P f_P \cdot len(P) \leq 1}$. Since the “easy” constraints were ${K = \{ f_P \geq 0, \sum_P f_P = F^*\}}$, the left-hand side is minimized over ${K}$ by sending all ${F^*}$ units of flow along a shortest path, where the edge lengths are ${p_e}$. We can find this path using Dijkstra even though we cannot write the massive LP down. Now we update the probabilities (edge lengths), find another shortest path, push flow, and repeat. At each step the gains will be in the range ${[-1, F^*]}$.

So we can use the asymmetric losses analysis above. After ${T = \Theta(\frac{F^* \log m}{\varepsilon^{2}})}$ iterations, taking the “average” flow ${\widehat{x}_e := \frac{1}{T} \sum_{t = 1}^T \sum_{P: e \in P} f^t_P}$, we have that for each edge ${e}$,

$\displaystyle (1-\varepsilon)\cdot\left(\widehat{x}_e - 1\right) \leq \varepsilon \quad \Rightarrow \quad \widehat{x}_e \leq 1 + \frac{\varepsilon}{1-\varepsilon} = \frac{1}{1-\varepsilon}.$

(How? chase through the LP-solving analysis we did in lecture, but use the above asymmetric analysis, instead of the standard symmetric one we used.)
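One way to chase this through, as a sketch. It assumes the gains form of the asymmetric guarantee (the display above states it for losses), and the symbols ${g^t_e}$ and ${P^t}$ are introduced here only for illustration: ${g^t_e}$ is the gain of edge ${e}$ in round ${t}$, and ${P^t}$ is the shortest path found in round ${t}$.

```latex
% Gain of edge e in round t: how far the flow f^t overshoots the unit capacity.
g^t_e := \sum_{P : e \in P} f^t_P - 1 \in [-1, F^* - 1]

% A feasible flow of value F^* satisfies the averaged constraint, and the
% shortest path minimizes its left-hand side over K. Since \sum_e p^t_e = 1,
\langle p^t, g^t \rangle = F^* \cdot len(P^t) - 1 \leq 0

% Assumed gains form of the guarantee, for every edge e:
\frac{1}{T} \sum_t \langle p^t, g^t \rangle
    \geq (1 - \varepsilon) \frac{1}{T} \sum_t g^t_e - \varepsilon

% Combine the two, using \frac{1}{T} \sum_t g^t_e = \widehat{x}_e - 1:
0 \geq (1 - \varepsilon) (\widehat{x}_e - 1) - \varepsilon
```

Rearranging the last line gives the bound on ${\widehat{x}_e}$ stated above.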

Finally, the flow ${\widehat{x}}$ may not be feasible, since it may violate edge capacities by a factor of up to ${\frac{1}{1-\varepsilon}}$. So scale down. I.e., define the flow ${(1-\varepsilon)\widehat{x}}$: it has value ${(1-\varepsilon)F^*}$ and satisfies all the edge constraints. Voilà.
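The whole loop can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a directed graph with unit capacities, ${F^*}$ known in advance, and a plain Hedge update on the gains divided by ${F^*}$. The step size `eta` and the iteration count are illustrative and are not the constants from the handwritten notes. For that reason the last step scales by the measured maximum congestion rather than by ${(1-\varepsilon)}$. All function names are made up for this sketch.

```python
import heapq
import math

def shortest_path(n, edges, adj, length, s, t):
    """Dijkstra from s; returns the edge indices of a shortest s-t path."""
    dist = [math.inf] * n
    pred = [None] * n                  # index of the edge used to reach each node
    dist[s] = 0.0
    heap = [(0.0, s)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue                   # stale heap entry
        for e, v in adj[u]:
            nd = d + length[e]
            if nd < dist[v]:
                dist[v], pred[v] = nd, e
                heapq.heappush(heap, (nd, v))
    path, v = [], t
    while v != s:                      # assumes t is reachable from s
        path.append(pred[v])
        v = edges[pred[v]][0]
    return path

def mw_average_flow(n, edges, s, t, f_star, eta, iters):
    """Average per-edge flow x_hat after `iters` rounds of multiplicative weights."""
    m = len(edges)
    adj = [[] for _ in range(n)]
    for e, (u, v) in enumerate(edges):
        adj[u].append((e, v))
    cum_gain = [0.0] * m               # sum over rounds of (flow on e) - 1
    routed = [0.0] * m                 # total flow pushed through each edge
    for _ in range(iters):
        # Edge lengths p_e proportional to exp(eta * cum_gain_e / f_star).
        top = max(cum_gain)
        w = [math.exp(eta * (g - top) / f_star) for g in cum_gain]
        z = sum(w)
        p = [wi / z for wi in w]
        # Oracle: send all f_star units along a shortest path under lengths p.
        on_path = set(shortest_path(n, edges, adj, p, s, t))
        for e in range(m):
            flow = f_star if e in on_path else 0.0
            routed[e] += flow
            cum_gain[e] += flow - 1.0  # gain in [-1, f_star - 1]
    return [x / iters for x in routed]

def scale_to_feasible(x_hat, f_star):
    """Scale x_hat down by its maximum congestion; returns (flow, value)."""
    congestion = max(1.0, max(x_hat))
    return [x / congestion for x in x_hat], f_star / congestion

# Two edge-disjoint s-t paths (0 -> 1 -> 3 and 0 -> 2 -> 3), so F^* = 2.
edges = [(0, 1), (1, 3), (0, 2), (2, 3)]
x_hat = mw_average_flow(n=4, edges=edges, s=0, t=3, f_star=2.0, eta=0.1, iters=100)
flow, value = scale_to_feasible(x_hat, f_star=2.0)
print(x_hat, value)
```

On this tiny instance the iterates simply alternate between the two paths, so the average flow saturates every edge and no scaling is needed. On larger graphs the congestion of ${\widehat{x}}$ depends on `eta` and `iters`.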

• Next lecture we will do the improvement using electrical flows.