Euiwoong asked about what happens with constrained optimization of linear functions (which are $0$-smooth), since this could solve LPs. So a minor issue is that the proofs require $\beta > 0$, since they first set the step size $\eta = 1/\beta$, which will be disastrous in this case.

A potentially bigger question, of course, is this: how would we do the projection operation? Given a point $y$, what is the closest point in $K$ to $y$? For a polytope given as $K = \{x : Ax \le b\}$, this might not be that easy a problem to solve; at least I don't see how to do it quicker than linear programming. Any ideas?
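To make the contrast concrete, here is a sketch in Python (the particular polytope and the sorting-based routine are my choices, not from the notes): for the probability simplex, a special polytope, Euclidean projection has a fast $O(n \log n)$ closed-form-style algorithm, while for a general $\{x : Ax \le b\}$ the projection $\min_x \|x - y\|^2$ subject to $Ax \le b$ is itself a quadratic program.

```python
import numpy as np

def project_simplex(y):
    """Euclidean projection of y onto the probability simplex
    {x : x >= 0, sum(x) = 1}, via the standard sort-and-threshold method."""
    n = len(y)
    u = np.sort(y)[::-1]                  # sort coordinates in decreasing order
    css = np.cumsum(u)
    # largest index rho with u_rho * (rho+1) > css_rho - 1
    rho = np.nonzero(u * np.arange(1, n + 1) > (css - 1))[0][-1]
    theta = (css[rho] - 1) / (rho + 1)    # shift so the positive part sums to 1
    return np.maximum(y - theta, 0.0)

# A symmetric point projects to the uniform distribution:
print(project_simplex(np.array([0.5, 0.5, 0.5])))   # -> [1/3, 1/3, 1/3]
# No such one-pass formula exists for a general {x : Ax <= b}; there the
# projection is a QP, essentially as hard as the LP we started with.
```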

In fact, the projection operation is one reason why people sometimes use something called *conditional gradient descent* (a.k.a. the *Frank-Wolfe* algorithm). The update rule now is: take the current point $x_t$ and find

$$y_t = \arg\min_{y \in K} \; \langle \nabla f(x_t), y \rangle.$$

I.e., use an LP solver to find the best point in $K$ in the direction of the negative gradient. And now walk a little in the direction of $y_t$:

$$x_{t+1} = (1 - \gamma_t)\, x_t + \gamma_t\, y_t.$$
As the Bubeck book says, this approach leverages the fact that linear programming is in some cases a simpler problem than projection. Also, one can prove guarantees for the Frank-Wolfe process for $\beta$-smooth functions that are similar to those for projected gradient descent.
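Here is a minimal sketch of the two steps above, again over the probability simplex, where the LP oracle is trivial: the minimizer of a linear function over the simplex is the vertex $e_i$ with the smallest gradient coordinate. The objective $f(x) = \|x - c\|^2$, the target $c$, and the step size $\gamma_t = 2/(t+2)$ are my illustrative choices ($\gamma_t = 2/(t+2)$ is a standard schedule for Frank-Wolfe, not something specific to these notes).

```python
import numpy as np

def frank_wolfe_simplex(grad, x0, steps=200):
    """Frank-Wolfe over the probability simplex: no projection needed,
    the LP oracle just picks the best vertex in the -gradient direction."""
    x = x0.copy()
    for t in range(1, steps + 1):
        g = grad(x)
        i = np.argmin(g)                 # LP oracle: argmin_{y in simplex} <g, y>
        y = np.zeros_like(x)
        y[i] = 1.0                       # the minimizing vertex e_i
        gamma = 2.0 / (t + 2)            # standard step-size schedule
        x = (1 - gamma) * x + gamma * y  # walk a little toward y_t
    return x

# Minimize f(x) = ||x - c||^2 over the simplex; c (hypothetical) lies in
# the simplex, so the minimizer is c itself.
c = np.array([0.1, 0.6, 0.3])
x = frank_wolfe_simplex(lambda x: 2 * (x - c), np.ones(3) / 3)
```

Every iterate is a convex combination of simplex vertices, so it stays feasible for free; that is exactly the appeal when projection onto $K$ is expensive.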

I will write more with (pointers to) the bad examples for the basic gradient descent algorithm, which should give us more intuition about what is happening here.
