## Lecture 21: SVDs

A few things about today’s lecture:

• This one is kinda important, I meant to discuss it but forgot. We defined ${v_1}$ as the unit vector spanning the best-fit one-dimensional subspace, then ${v_2}$ as the best unit vector orthogonal to ${v_1}$. This greedy approach is nice, but one worry is that if we optimized over all two-dimensional subspaces at once, we might get a better fit than the space spanned by ${v_1, v_2}$.

Thankfully not: the best ${2}$-dimensional subspace (i.e., the one that maximizes the sum of squared projections of the ${a_i}$s onto it) is indeed spanned by ${v_1, v_2}$.

Why? Suppose ${W}$ is any other ${2}$-dimensional subspace. Pick an orthonormal basis ${w_1, w_2}$ for ${W}$ such that ${w_2 \perp v_1}$. (Such a basis exists: the vectors in ${W}$ orthogonal to ${v_1}$ form a subspace of dimension at least ${1}$, so take ${w_2}$ to be a unit vector in it and ${w_1}$ a unit vector in ${W}$ orthogonal to ${w_2}$.) Now, we know that ${\|Aw_1\| \leq \|Av_1\|}$, by our greedy choice of ${v_1}$. Also, ${\|Aw_2\| \leq \|Av_2\|}$, by our greedy choice of ${v_2}$ (we could have chosen ${w_2}$, since ${w_2 \perp v_1}$, but chose ${v_2}$ instead). So

$\displaystyle \|Aw_1\|^2 + \|Aw_2\|^2 \leq \|Av_1\|^2 + \|Av_2\|^2$

and hence the sum of squared projections onto ${W}$ is no larger than that onto span${(v_1, v_2)}$. A similar proof works to show that the space spanned by ${\{v_1, \ldots, v_k\}}$ is the best-fit ${k}$-dimensional subspace.
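If you want to see this numerically, here is a rough sanity check (not a proof, of course). It is a sketch in NumPy on made-up random data: the matrix `A`, the helper `fit`, and the number of random trials are all my own choices for illustration. It compares the fit of span${(v_1, v_2)}$ against a bunch of random ${2}$-dimensional subspaces.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 6))  # rows play the role of the points a_i (made-up data)

# Rows of Vt are the right singular vectors v_1, v_2, ...
_, s, Vt = np.linalg.svd(A, full_matrices=False)

def fit(A, B):
    """Sum of squared projections of the rows of A onto the span of B's
    columns, assuming those columns are orthonormal (hypothetical helper)."""
    return np.linalg.norm(A @ B) ** 2  # squared Frobenius norm

greedy = fit(A, Vt[:2].T)  # fit of span(v_1, v_2)

# Try many random 2-dimensional subspaces and keep the best fit seen.
best_random = 0.0
for _ in range(1000):
    Q, _ = np.linalg.qr(rng.standard_normal((6, 2)))  # orthonormal basis w_1, w_2
    best_random = max(best_random, fit(A, Q))

print("span(v1, v2):", greedy)
print("best random subspace:", best_random)
```

None of the random subspaces should beat the greedy one, and the greedy fit should equal the sum of the squares of the top two singular values.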

• For a square symmetric matrix ${A = V\, diag(\lambda_1, \ldots, \lambda_n)\, V^T}$, the definition of ${f(A)}$ does indeed work for every function ${f: {\mathbb R} \rightarrow {\mathbb R}}$:

$\displaystyle f(A) = V \, diag(f(\lambda_1), \ldots, f(\lambda_n)) \, V^T.$

It may not be interesting for some functions ${f}$, of course.
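Here is a minimal sketch of this definition in NumPy. The function name `matrix_function` and the example matrix are mine, and I am assuming `f` can be applied to a whole array of eigenvalues at once. For ${f(x) = x^2}$ the result should agree with the ordinary product ${A \cdot A}$, and for ${f(x) = \sqrt{x}}$ (fine here because the eigenvalues are positive) it should give a matrix whose square is ${A}$.

```python
import numpy as np

def matrix_function(A, f):
    """Apply f to a symmetric matrix A through its eigendecomposition.
    Assumes A is symmetric and f works elementwise on a NumPy array."""
    lam, V = np.linalg.eigh(A)           # A = V diag(lam) V^T
    return V @ np.diag(f(lam)) @ V.T     # f(A) = V diag(f(lam)) V^T

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])               # symmetric, eigenvalues 1 and 3

A_squared = matrix_function(A, lambda x: x ** 2)
sqrt_A = matrix_function(A, np.sqrt)

print(np.allclose(A_squared, A @ A))
print(np.allclose(sqrt_A @ sqrt_A, A))
```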

• Melanie pointed out that if we want to find the best-fit affine subspace, we should imagine the origin to be at the center-of-gravity of the points in ${A}$, and find the best-fit linear subspace through that. A proof is given in her thesis, for example.
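To make the recipe concrete, here is a small sketch in NumPy on synthetic data. The points, the `offset`, the noise level, and the helper `sq_dist_to_line` are all invented for illustration. The points lie near a line that misses the origin, so we subtract the center of gravity, take the top right singular vector of the centered matrix, and compare against the best line forced through the origin.

```python
import numpy as np

rng = np.random.default_rng(1)

# Made-up points near a line that does not pass through the origin.
t = rng.standard_normal(200)
direction = np.array([3.0, 4.0]) / 5.0
offset = np.array([10.0, -5.0])
A = offset + np.outer(t, direction) + 0.01 * rng.standard_normal((200, 2))

# Move the origin to the center of gravity, then fit a linear subspace.
centroid = A.mean(axis=0)
_, _, Vt = np.linalg.svd(A - centroid, full_matrices=False)
v1 = Vt[0]  # direction of the best-fit line through the centroid

def sq_dist_to_line(A, point, d):
    """Sum of squared distances from the rows of A to the line through
    `point` with unit direction `d` (hypothetical helper)."""
    R = A - point
    return np.sum(R ** 2) - np.sum((R @ d) ** 2)

affine_err = sq_dist_to_line(A, centroid, v1)

# For comparison: the best line that is forced to pass through the origin.
_, _, Vt0 = np.linalg.svd(A, full_matrices=False)
linear_err = sq_dist_to_line(A, np.zeros(2), Vt0[0])

print("through the centroid:", affine_err)
print("through the origin:", linear_err)
```

On data like this the line through the centroid should recover `direction` up to sign and have a far smaller error than the line through the origin.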