A few things about today’s lecture:
- This one is kinda important; I meant to discuss it but forgot. We defined $v_1 := \arg\max_{\|v\|=1} \|Av\|$ as spanning the best-fit one-dimensional subspace, then $v_2 := \arg\max_{\|v\|=1,\, v \perp v_1} \|Av\|$ subject to having fixed $v_1$. This greedy approach is nice, but one worry is that maybe if we optimized over two-dimensional subspaces directly, we'd get a better fit than the space spanned by $v_1, v_2$.
Thankfully not: the best $2$-dimensional subspace (i.e., the one that maximizes the sum of squared projections of the $a_i$'s onto it) is indeed spanned by $v_1, v_2$.
Why? Suppose there were another $2$-dimensional subspace $W$. Pick an orthonormal basis $w_1, w_2$ for $W$ such that $w_2 \perp v_1$. (Clearly such a basis exists: the $2$-dimensional $W$ must intersect the hyperplane $v_1^\perp$ in at least a line.) Now, we know that $\|Aw_1\| \le \|Av_1\|$, by our greedy choice of $v_1$. Also, $\|Aw_2\| \le \|Av_2\|$, by our greedy choice of $v_2$ (we could have chosen $w_2$ but chose $v_2$ instead). So
$$\|Aw_1\|^2 + \|Aw_2\|^2 \le \|Av_1\|^2 + \|Av_2\|^2,$$
and hence the sum of squared projections onto $W$ is no larger than that onto $\mathrm{span}(v_1, v_2)$. A similar proof works to show that the space spanned by $v_1, v_2, \ldots, v_k$ is the best-fit $k$-dimensional subspace.
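The claim above is easy to sanity-check numerically: no $2$-dimensional subspace should beat the span of the top two right singular vectors. A minimal sketch (the matrix `A` and the scoring helper `sq_proj` are illustrative, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 5))  # rows a_i are the points

# Greedy directions v_1, v_2 are the top-2 right singular vectors of A.
_, _, Vt = np.linalg.svd(A)
V2 = Vt[:2].T  # orthonormal basis for span(v_1, v_2)

def sq_proj(A, W):
    """Sum of squared projections of the rows of A onto the
    column span of the orthonormal matrix W."""
    return np.sum((A @ W) ** 2)

best = sq_proj(A, V2)

# No random 2-dimensional subspace W beats span(v_1, v_2).
for _ in range(1000):
    W, _ = np.linalg.qr(rng.standard_normal((5, 2)))
    assert sq_proj(A, W) <= best + 1e-9
```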
- For a square symmetric matrix $M$, with spectral decomposition $M = \sum_i \lambda_i v_i v_i^\top$, the definition of $f(M)$ indeed makes sense for all functions $f : \mathbb{R} \to \mathbb{R}$:
$$ f(M) := \sum_i f(\lambda_i)\, v_i v_i^\top. $$
It may not be interesting for some functions $f$, of course.
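A small sketch of this definition in code, applying $f$ to the eigenvalues of a symmetric matrix (the matrix `M` and choices of $f$ here are illustrative):

```python
import numpy as np

def apply_fn(M, f):
    """Return f(M) for symmetric M by applying f to its eigenvalues."""
    lam, V = np.linalg.eigh(M)          # M = V diag(lam) V^T
    return V @ np.diag(f(lam)) @ V.T

M = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# Sanity check: f(x) = x^2 should recover the ordinary matrix square.
assert np.allclose(apply_fn(M, lambda x: x ** 2), M @ M)

# For f(x) = exp(x) this gives the matrix exponential of M.
expM = apply_fn(M, np.exp)
```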
- Melanie pointed out that if we want to find the best-fit affine subspace, we should imagine the origin to be at the center of gravity of the points in $A$, and find the best-fit linear subspace through that point. A proof is given in her thesis, for example.
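That centering recipe can be sketched as follows: subtract the center of gravity, then take the top singular direction of the centered points (the data here is illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
# A cloud of points shifted away from the origin.
A = rng.standard_normal((100, 3)) + np.array([5.0, -2.0, 7.0])

mu = A.mean(axis=0)            # center of gravity of the points
_, _, Vt = np.linalg.svd(A - mu)
v1 = Vt[0]                     # best-fit direction through mu

# The best-fit affine line is {mu + t * v1 : t real}; its total squared
# distance to the points is the centered mass left after projecting onto v1.
B = A - mu
residual = np.sum(B ** 2) - np.sum((B @ v1) ** 2)
```

As a check on the recipe, this residual should be no worse than that of the best-fit line forced through the origin.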