## Lec #9: Notes

A few loose ends from yesterday’s lecture:

• The formula for computing the inverse of a submatrix from the inverse of the whole matrix is the following: if ${M = \begin{pmatrix} A & B \\ C & D \end{pmatrix}}$ and ${M^{-1} = \begin{pmatrix} E & F \\ G & H \end{pmatrix}}$, then ${A^{-1} = E - F H^{-1} G}$ (whenever ${H}$ is invertible) — this is just the block-inverse formula applied to ${M^{-1}}$.

(from this survey by Pawel Parys). A good explanation of the Mucha and Sankowski ${O(n^\omega)}$-time algorithm is in Marcin Mucha’s thesis.
• I looked at the Tutte paper again. The reason he was introducing the Tutte matrix (not his terminology) was to prove Tutte’s theorem (again, not his terminology). Namely, that a graph has a perfect matching if and only if, for every subset of vertices ${U \subseteq V}$, the number of odd components in ${G \setminus U}$ is at most ${|U|}$. (Note that this is a special case of the Tutte-Berge formula.) Pretty cool that this 5-page paper introduced two important concepts.
• The proof of the isolation lemma went a little fast towards the end, so let me say it again. We wanted to bound the probability that an edge ${e}$ is confused, i.e., that the min-weight perfect matching containing ${e}$ has the same weight as the min-weight perfect matching not containing ${e}$, by ${1/2m}$. So let’s give weights to all edges apart from ${e}$. We can now read off the weight of the min-weight perfect matching not containing ${e}$, say this is some constant ${W_0}$. Moreover, if the weight of ${e}$ is eventually set to ${w_e}$, then since the weights of all other edges are known, the min-weight perfect matching containing ${e}$ will have weight ${W_1 + w_e}$, where ${W_1}$ is another constant depending on the weights of the other edges. So, for ${e}$ to be confused, we must have ${w_e = W_0 - W_1}$. And the chance this happens is at most ${1/2m}$, since we are choosing the weight for each edge independently and uniformly from ${\{1,2,\ldots, 2m\}}$. Done. We will see how to use the isolation lemma in tomorrow’s lecture on smoothed analysis. (Slight change in lecture order, by the way.)
• By the way, the really useful part of the isolation lemma is that the weights are small (linear in the number of edges), even though the number of matchings can be exponential. E.g., if you assigned edge ${i}$ the weight ${2^i}$, the min-weight matching would again be unique. But the weights would be exponentially large, which is undesirable in many contexts.
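To see the lemma in action, here is a quick Python sanity check (my own illustration, not from lecture). The three perfect matchings of ${K_4}$ are hard-coded as sets of edge indices, each of the ${M = 6}$ edges gets an independent uniform weight from ${\{1, \ldots, 2M\}}$, and we estimate how often the min-weight matching is unique. The lemma guarantees probability at least ${1/2}$; empirically it is much higher.

```python
import random

# The three perfect matchings of K4, as sets of edge indices, where
# edges are numbered 0:(0,1) 1:(0,2) 2:(0,3) 3:(1,2) 4:(1,3) 5:(2,3).
MATCHINGS = [{0, 5}, {1, 4}, {2, 3}]
M = 6  # number of edges

def isolation_prob(trials, seed=0):
    """Fraction of trials in which independent uniform weights from
    {1, ..., 2M} give a *unique* min-weight perfect matching of K4."""
    rng = random.Random(seed)
    unique = 0
    for _ in range(trials):
        w = [rng.randint(1, 2 * M) for _ in range(M)]
        sums = sorted(sum(w[e] for e in S) for S in MATCHINGS)
        unique += sums[0] < sums[1]  # strict gap => unique minimizer
    return unique / trials
```

Running `isolation_prob(2000)` gives a fraction comfortably above the guaranteed ${1/2}$.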

## Lec #7: Notes

Today we saw several definitions and facts about polytopes, including proofs of the integrality of the perfect matching polytope for bipartite graphs. Some comments.

• We saw the definition of the perfect matching polytope ${K_{PM-gen}}$ for non-bipartite graphs (with exponentially many constraints), but did not prove it correct. Do give it a try yourself (or read over a proof via the basic-feasible solution approach which appears in my hand-written notes, or in these notes).
• We saw a definition of the ${r}$-arborescence polytope ${K_{arb}}$ (again with exponentially many constraints). As an exercise, show that ${K_{arb}}$ is also an integral polytope via the vertex approach.

Here’s how to go about it. In lecture #2, we saw an algorithm that finds the min-cost arborescence for non-negative edge costs. First, change it slightly to find the min-cost arborescence ${T}$ for general edge costs ${c_e}$. Naturally, ${T}$ gives you an (integer) solution to the LP ${\min \{ c^\intercal x \mid x \in K_{arb}\}}$. A priori, there is no reason to believe it is optimal for the LP: it is the best integer solution, but there may exist cheaper fractional solutions.

Next, take the dual of this LP. If you show a set of values for the dual such that the dual value equals the primal value, then the integer solution ${T}$ you found is also an optimal solution to the LP. (How would you do it? Think of the prices we were defining at the end of Lecture 2.) Now the optimal solution being an integer solution is true for every set of edge costs. So the vertices of the polytope ${K_{arb}}$ must be all integral.

If this is a little vague, no worries: we will see the corresponding proofs for bipartite perfect matchings in the next lecture.
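For concreteness, here is the primal-dual pair in question (a sketch, assuming the cut-covering description of ${K_{arb}}$ from the bullet above, and non-negative costs so that the covering LP is bounded):

$\displaystyle \min \Big\{ \sum_{e \in E} c_e x_e \;\Big|\; \sum_{e \in \partial^+ S} x_e \geq 1 \;\; \forall\, \emptyset \neq S \subseteq V \setminus \{r\}, \quad x \geq 0 \Big\}$

$\displaystyle \max \Big\{ \sum_{S} y_S \;\Big|\; \sum_{S :\, e \in \partial^+ S} y_S \leq c_e \;\; \forall\, e \in E, \quad y \geq 0 \Big\}$

Exhibiting a feasible ${y}$ whose value ${\sum_S y_S}$ equals the cost of ${T}$ certifies that ${T}$ is LP-optimal; the natural candidates for these ${y_S}$ are the prices from the min-cost arborescence algorithm of Lecture 2.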

• Both the above LPs have exponential size. But ${K_{arb}}$ has an “equivalent” poly-sized representation using some extra variables. Here’s a sketch. Indeed, the bad constraints are the ones saying that for every set ${S}$ not containing the root, ${\sum_{e \in \partial^+ S} x_e \geq 1}$. This is the same as saying that the min-cut between any vertex and the root is at least ${1}$, when viewing ${x_e}$ as arc capacities. By maxflow-mincut, the maxflow from each vertex to the root (using arc-capacities ${x_e}$) is at least ${1}$. So replace the exponentially many constraints by other constraints that enforce this max-flow condition. This may require polynomially many extra variables. But “projected down” onto the variables ${\{ x_e \}_{e \in E}}$ the polytope is the same. This is another instance of the “projection example” I was attempting to draw in class. These things are called “extended formulations”.
• Does ${K_{PM-gen}}$ have a similarly small extended formulation? Recently Thomas Rothvoss showed that there are no tricks like the ones above, settling a decades-old conjecture of Yannakakis. In particular, every polytope that projects down to ${K_{PM-gen}}$ (for the complete graph ${K_n}$) must have ${2^{\Omega(n)}}$ constraints. See his slides/video for more details. As we remarked, this is an unconditional lower bound.
• For an extreme case when extended formulations help, Goemans shows a polytope with ${2^n - 2}$ facets and ${n!}$ extreme points that arises from projecting down a polytope in ${O(n \log n)}$ dimensions using ${O(n \log n)}$ facets.
• Finally, the use of total unimodularity for showing integrality of polytopes is indeed due to Alan Hoffman and Joe Kruskal. Here is their paper titled Integral Boundary Points of Convex Polyhedra, along with historical comments by the authors. Dabeen reminded me at the end of lecture that Edmonds and Giles actually developed a closely related concept, the theory of total dual integrality.
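To make the flow-based extended formulation from the first bullet explicit (a sketch; the flow variables ${f^v_e}$ are new), introduce for every vertex ${v \neq r}$ a unit flow from ${v}$ to ${r}$ supported below ${x}$:

$\displaystyle 0 \leq f^v_e \leq x_e \;\; \forall e \in E, \qquad \sum_{e \in \partial^+ u} f^v_e = \sum_{e \in \partial^- u} f^v_e \;\; \forall u \notin \{v, r\}, \qquad \sum_{e \in \partial^+ v} f^v_e - \sum_{e \in \partial^- v} f^v_e \geq 1.$

By maxflow-mincut, such flows exist exactly when every cut separating some vertex from the root has ${x}$-capacity at least ${1}$, so projecting out the ${f}$ variables recovers the exponentially many cut constraints, using only ${O(nm)}$ extra variables and constraints.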

## Lec #4: Notes

Some more notes about today’s lecture:

• We know that ${APSP(n) \leq O(\log n) \cdot MSP(n)}$, and in a forthcoming homework we will remove this log factor. The reduction in the other direction is not that mysterious: given two matrices ${A}$ and ${B}$, can you write down an ${O(n)}$-vertex graph ${G}$ such that the APSP distances in this graph give you the MSP of ${A}$ and ${B}$? Please post answers in the comments below.
• If you’ve not seen Strassen’s algorithm before, it is an algorithm for computing ${n \times n}$ matrix multiplication in time ${n^{\log_2 7} \approx n^{2.81}}$. It’s quite simple to state, and one can think of it as a 2-dimensional version of Karatsuba’s algorithm for multiplying two numbers. Mike Paterson has a very beautiful geometric interpretation of the sub-problems Strassen comes up with, and how they relate to Karatsuba.

The time for matrix multiplication was later improved by the Coppersmith-Winograd algorithm to get ${\omega = 2.376}$. Small improvements in the lower-order digits of ${\omega}$ were given by Andrew Stothers and (our own) Virginia Vassilevska-Williams.

To answer Goran’s question, I looked over the original CW paper: it is presented as being similar to Strassen’s algorithm in that it breaks the matrices into smaller blocks and recurses on them. But the recursion is quite mysterious, at least to me. Recently, Cohn, Kleinberg, Szegedy, and Umans gave some group-theoretic approaches that use some of the CW ideas, but other seemingly orthogonal ideas, to match the CW-bound. They also give conjectures that would lead to ${\omega = 2}$.
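If you haven’t seen it written out, here is a minimal Python sketch of Strassen’s seven-product recursion (my own illustration, assuming the dimension is a power of two; general sizes can be handled by padding):

```python
def add(X, Y): return [[a + b for a, b in zip(r, s)] for r, s in zip(X, Y)]
def sub(X, Y): return [[a - b for a, b in zip(r, s)] for r, s in zip(X, Y)]

def strassen(A, B):
    """Multiply two n x n matrices, n a power of two, using 7 recursive
    multiplications instead of 8 -- hence the n^(log2 7) running time."""
    n = len(A)
    if n == 1:
        return [[A[0][0] * B[0][0]]]
    h = n // 2
    # Split both matrices into quadrants.
    A11 = [r[:h] for r in A[:h]]; A12 = [r[h:] for r in A[:h]]
    A21 = [r[:h] for r in A[h:]]; A22 = [r[h:] for r in A[h:]]
    B11 = [r[:h] for r in B[:h]]; B12 = [r[h:] for r in B[:h]]
    B21 = [r[:h] for r in B[h:]]; B22 = [r[h:] for r in B[h:]]
    # Strassen's seven products.
    M1 = strassen(add(A11, A22), add(B11, B22))
    M2 = strassen(add(A21, A22), B11)
    M3 = strassen(A11, sub(B12, B22))
    M4 = strassen(A22, sub(B21, B11))
    M5 = strassen(add(A11, A12), B22)
    M6 = strassen(sub(A21, A11), add(B11, B12))
    M7 = strassen(sub(A12, A22), add(B21, B22))
    # Reassemble the quadrants of the product.
    C11 = add(sub(add(M1, M4), M5), M7)
    C12 = add(M3, M5)
    C21 = add(M2, M4)
    C22 = add(sub(add(M1, M3), M2), M6)
    return [C11[i] + C12[i] for i in range(h)] + \
           [C21[i] + C22[i] for i in range(h)]
```

The recurrence ${T(n) = 7\,T(n/2) + O(n^2)}$ solves to ${O(n^{\log_2 7})}$.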

• I was mistaken in claiming that for general directed graphs, ${(1+\epsilon)}$-approximate APSP is as hard as computing the exact APSP, even for small values of ${\epsilon > 0}$. One advantage of allowing approximation is that one can round all edge weights to powers of ${(1+\epsilon)}$, and that gives a lot of room to play with. There are a lot of positive results in this setting, e.g., this paper of Uri Zwick, and a talk by him.

## Fredman’s Trick

OK, sorry, I was being silly.

We wanted to compute the MSP of two ${n \times w}$ matrices ${A}$ and ${B}$. I.e., we wanted ${C := A \odot B^\intercal}$, say. (Notice that I’ve transposed ${B}$.) In words, this is

$\displaystyle C_{ij} = \min_{k \in [w]} \{ A_{ik} + B^\intercal_{kj} \} = \min_{k \in [w]} \{ A_{ik} + B_{jk}\}.$

If ${k^*}$ is the minimizer of this expression then

$\displaystyle A_{ik^*} + B_{jk^*} \leq A_{ik} + B_{jk} \quad \forall k \in [w],$

which is equivalent to

$\displaystyle A_{ik^*} - A_{ik} \leq - (B_{jk^*} - B_{jk}) \quad \forall k \in [w].$

Indeed, the signs are opposite for the ${A}$ and ${B}$ entries. No worries.

Now, for each possible ${p,q \in [w]}$, do the following. Take the ${2n}$ numbers

$\displaystyle A_{1p} - A_{1q}, \ldots, A_{np} - A_{nq}, -(B_{1p} - B_{1q}), \ldots, -(B_{np} - B_{nq})$

and sort them with ${O(n \log n)}$ comparisons.

How many comparisons have we done overall? We enumerated over ${w^2}$ pairs ${p,q}$, and for each we did ${O(n \log n)}$ comparisons. Total of ${O(w^2 n \log n)}$. We claim we don’t need to do any more: we already have all the information to compute the MSP.

How? Take ${i,j}$. Find some index ${k^* \in [w]}$ such that for every ${k \in [w]}$, the entry ${A_{ik^*} - A_{ik}}$ precedes the entry ${- (B_{jk^*} - B_{jk})}$ in the sorted list for the pair ${p = k^*}$, ${q=k}$. (Such a ${k^*}$ exists: any minimizer of ${A_{ik} + B_{jk}}$ works.) Now you can set ${C_{ij} = A_{ik^*} + B_{jk^*}}$. Done.
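Here is a Python sketch of the whole two-phase procedure (my own illustration, not from lecture): phase 1 performs all the comparisons by sorting ${2n}$ numbers per pair ${p,q}$, and phase 2 finds each minimizer by consulting only the stored ranks — the matrix entries are touched again only to write down the answer.

```python
def fredman_msp(A, B):
    """Min-plus product C[i][j] = min_k A[i][k] + B[j][k] of an n x w
    and an m x w matrix (B already transposed, as in the text)."""
    n, w, m = len(A), len(A[0]), len(B)
    posA, posB = {}, {}
    # Phase 1: for each pair (p, q), sort the merged differences.
    for p in range(w):
        for q in range(w):
            # Tag A-entries 0 and B-entries 1, so that on ties the
            # A-entry sorts first and "earlier rank" means "<=".
            merged = [(A[i][p] - A[i][q], 0, i) for i in range(n)] \
                   + [(-(B[j][p] - B[j][q]), 1, j) for j in range(m)]
            merged.sort()
            for r, (_, tag, idx) in enumerate(merged):
                (posA if tag == 0 else posB)[p, q, idx] = r
    # Phase 2: pick out a valid minimizer k* using ranks only.
    C = [[None] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            for k_star in range(w):
                # k* is a minimizer iff its A-difference precedes the
                # matching B-difference in the (k*, k) list for every k.
                if all(posA[k_star, k, i] < posB[k_star, k, j]
                       for k in range(w)):
                    C[i][j] = A[i][k_star] + B[j][k_star]
                    break
    return C
```

The tie-breaking by tag matters: it makes “${A}$-entry precedes ${B}$-entry” equivalent to the non-strict inequality ${A_{ik^*} - A_{ik} \leq -(B_{jk^*} - B_{jk})}$, so the true argmin always passes the test.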

Let’s also complete the analysis for Fredman’s trick. We’d broken down the ${n \times n}$ MSP into ${n/w}$ different ${n \times w}$ mini-MSPs, which together take ${(n/w) \cdot O(w^2 n \log n) = O(n^2 w \log n)}$ comparisons. Now we can combine them by taking the entry-wise minima. Each minimum takes ${(n/w)-1}$ comparisons since there are ${n/w}$ mini-MSPs, and there are ${n^2}$ entries, so about ${n^3/w}$ comparisons in total. Balancing the two terms by setting ${w = \sqrt{n/\log n}}$ gives us ${O(n^2 \sqrt{n \log n})}$ comparisons.

Fredman’s paper gives a better analysis to get rid of the log term, and uses it to get an algorithm with runtime better than ${n^3}$ by a ${(\log n/\log \log n)^{1/3}}$ factor. But we’ll see a larger improvement in the next lecture.

## Lecture 27: SDPs Part I

For this discussion, imagine that the SDP is in the form

$\displaystyle \max \{ C \bullet X \mid A_i \bullet X = b_i \; \forall i \in [m], X \succeq 0 \}$

Guru pointed out a major omission in the runtime of Ellipsoid for SDPs: even when the input numbers are all of size ${O(1)}$, SDPs can have solution values as large as ${2^{2^n}}$. Note that solution values for LPs are only singly exponential, so this is much worse: we can’t even write down such a solution in polynomial time. See the Gärtner and Matousek book for a detailed discussion of the issues here. (Remember, the CMU library gives you access to the electronic version of the book.)
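Here is the standard kind of construction behind this (sketched from memory; see the Gärtner–Matousek book for the precise statement): ${2 \times 2}$ PSD constraints ${\begin{pmatrix} 1 & x_i \\ x_i & x_{i+1} \end{pmatrix} \succeq 0}$ force ${x_{i+1} \geq x_i^2}$, so ${n}$ constant-size constraints together with ${x_1 \geq 2}$ force ${x_n \geq 2^{2^{n-1}}}$ — a value needing ${2^{n-1}}$ bits to write down.

```python
def is_psd_2x2(a, b, c):
    """[[a, b], [b, c]] is PSD iff a, c >= 0 and det = a*c - b^2 >= 0."""
    return a >= 0 and c >= 0 and a * c - b * b >= 0

def smallest_feasible(n):
    """Smallest feasible x_n subject to x_1 = 2 and, for each i,
    [[1, x_i], [x_i, x_{i+1}]] PSD, i.e. x_{i+1} >= x_i^2."""
    xs = [2]
    for _ in range(n - 1):
        xs.append(xs[-1] ** 2)   # the tight choice x_{i+1} = x_i^2
    # Sanity check: the tight chain really satisfies every PSD block.
    assert all(is_psd_2x2(1, xs[i], xs[i + 1]) for i in range(n - 1))
    return xs[-1]
```

E.g., `smallest_feasible(5)` is already ${2^{16}}$, and the value squares with every additional constraint.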

What’s the fix? The actual runtime is polynomial not only in the length of the input, but also in ${\log (R/\varepsilon)}$, where ${R}$ is an upper bound on the Frobenius norm of the optimal solution ${X^\star}$. Thankfully, for most combinatorial optimization problems, the bound ${R}$ is small. E.g., for max-cut, each ${X^\star_{ij} = \langle v_i, v_j \rangle \leq 1}$, and hence ${R = n}$ suffices in this case.

There is another concern, though, which I knowingly swept under the rug (and will largely continue to do so): the Ellipsoid algorithm requires that the problem have an optimal solution that is “${\varepsilon}$-deep” inside the feasible set, and if so it returns a solution that satisfies all the equality constraints but is only ${\varepsilon}$-close to feasible with respect to the PSD constraint (and has value within an additive ${\varepsilon}$ of the optimum). This means that we usually need to do some post-processing to make the solution PSD (after which it may satisfy the equality constraints only approximately). However, for now we will assume these issues away.

A different approach is to use multiplicative-weights-based algorithms for SDPs, which return solutions that are PSD matrices, but which have additive errors in both the objective function value and the equality constraints. These algorithms require bounds on the trace of ${X}$, and have runtimes that depend on the maximum Frobenius norm of the ${A_i}$ and ${C}$ matrices, and also on ${poly(1/\varepsilon)}$ instead of ${poly \log(1/\varepsilon)}$. More on this in the Gärtner/Matousek book.

## Homework 6

Hi all: the final HW has been posted. It’s a little longer with 6 questions (the first three are questions you must solve individually, the last three allow collaboration). The due date is April 17th, but see if you can get it done early before the busy days towards the end of the semester.

Questions, as always, in the comments. (For the individual questions, if you think your question might be too leading for others, just send mail.)

Also, do have a look at the first exercise before next Friday; it will help you become more familiar with positive semi-definite (psd) matrices, which we will be using quite a bit in the next lectures…

–Anupam


## Lecture 26: Online Algorithms

Aram asked how the random algorithm does for paging: on each cache miss, just pick a uniformly random page to evict from the cache. Jason sent in an example where the optimum is 1, but the expected cost for RANDOM is k. The sequence is:

$\displaystyle 1,2,\ldots, k, (0, 1, 2, \ldots, k-1)^*$

The optimal action is to evict ${k}$ when you see the first ${0}$, with an optimal cost of 1. But the random algorithm, on each cache miss, will pick a random page to evict, until it finally hits on ${k}$. Each time it has a 1-in-k chance of success, which leads to an expected cost of ${k}$.
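Here is a small simulation of Jason’s example (a sketch, with hypothetical helper names), confirming that RANDOM pays about ${k}$ in expectation while OPT pays ${1}$:

```python
import random

def random_paging_faults(k, rng):
    """Faults incurred by RANDOM on 1,...,k, (0,1,...,k-1)^*, not
    counting the k compulsory faults on the prefix.  The run ends once
    page k is evicted: the cache is then {0,...,k-1} and never faults."""
    cache = set(range(1, k + 1))          # state after serving 1, ..., k
    faults = 0
    while cache != set(range(k)):
        for page in range(k):             # one pass of 0, 1, ..., k-1
            if page not in cache:
                faults += 1
                cache.remove(rng.choice(sorted(cache)))  # uniform evict
                cache.add(page)
    return faults

def average_cost(k, trials, seed=0):
    """Empirical average of RANDOM's cost; should be close to k."""
    rng = random.Random(seed)
    return sum(random_paging_faults(k, rng) for _ in range(trials)) / trials
```

Each fault evicts ${k}$ with probability ${1/k}$, so the fault count is geometric with mean ${k}$, matching the simulation.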

In fact, Raghavan and Snir show that any memoryless randomized algorithm for paging cannot have competitive ratio less than ${k}$. They have a similar construction, and give sequences where the ratio remains ${k}$ even when the optimal costs go to infinity. (Else we could give a weak-competitiveness bound with an additive factor of ${k}$.)