
A Gentle, Practical Tutorial on SVD

SVD in Plain Words

Singular Value Decomposition (SVD) factors any real matrix $A\in\mathbb{R}^{m\times n}$ into three parts $$A = U\,\Sigma\,V^\top,$$ where:

  • $U\in\mathbb{R}^{m\times m}$ has orthonormal columns (think: a rotation/reflection in the output space),
  • $V\in\mathbb{R}^{n\times n}$ has orthonormal columns (a rotation/reflection in the input space),
  • $\Sigma\in\mathbb{R}^{m\times n}$ is diagonal (zeros off the diagonal) with non-negative numbers $\sigma_1\ge \sigma_2\ge\cdots\ge 0$ called the singular values.

Geometric picture. SVD says: to apply $A$ to a vector (a code sketch of this pipeline follows the list),

  1. rotate/reflect it (by $V^\top$),
  2. stretch/compress it along perpendicular axes (by the $\sigma_i$ in $\Sigma$),
  3. rotate/reflect again (by $U$).
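Here is a minimal NumPy sketch of those three steps; the matrix and vector are made-up examples. Applying $V^\top$, the scaling, and $U$ in turn reproduces $Ax$.

```python
import numpy as np

# A made-up 2x3 matrix and input vector, just to illustrate the picture.
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, 3.0]])
x = np.array([1.0, -2.0, 0.5])

# Economy SVD: U is 2x2, s holds the singular values, Vt is 2x3 here.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

y = Vt @ x      # 1. rotate/reflect into the singular basis of the input space
y = s * y       # 2. stretch/compress along perpendicular axes
y = U @ y       # 3. rotate/reflect into the output space

print(np.allclose(A @ x, y))  # True: the three steps reproduce A @ x
```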

Why SVD matters

  • Understanding rank & energy. The number of nonzero singular values equals $ \operatorname{rank}(A)$. Larger singular values capture more “energy/variance”.
  • Stable least squares. SVD yields robust solutions to $\min_x\|Ax-b\|_2$, even when $A$ is not square or is ill-conditioned.
  • Pseudo-inverse. The Moore–Penrose pseudo-inverse $A^+$ is easiest via SVD.
  • Dimensionality reduction. Truncating to the top $k$ singular values gives the best rank-$k$ approximation (Eckart–Young theorem); this underlies PCA and low-rank compression.

Shapes to memorize (so you never get lost)

If $A$ is $m\times n$: • $U$ is $m\times m$ (orthogonal), • $\Sigma$ is $m\times n$ (diagonal "rectangle"), • $V$ is $n\times n$ (orthogonal).

For economy SVD (when $r=\operatorname{rank}(A)$): • $U$ becomes $m\times r$, • $\Sigma$ becomes $r\times r$, • $V$ becomes $n\times r$, and $A=U\Sigma V^\top$ still holds.
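A quick NumPy check of these shapes on an arbitrary 4×3 example; note that `np.linalg.svd(..., full_matrices=False)` returns a thin SVD with $\min(m,n)$ singular triples, which matches the economy form above when $A$ has full rank.

```python
import numpy as np

A = np.random.default_rng(0).standard_normal((4, 3))  # arbitrary 4x3 example

# Full SVD: U is m x m, Vt is n x n.
U, s, Vt = np.linalg.svd(A, full_matrices=True)
print(U.shape, s.shape, Vt.shape)     # (4, 4) (3,) (3, 3)

# Economy ("thin") SVD: U is m x r, Vt is r x n, here with r = min(m, n) = 3.
Ue, se, Vte = np.linalg.svd(A, full_matrices=False)
print(Ue.shape, se.shape, Vte.shape)  # (4, 3) (3,) (3, 3)

# The economy factors still reconstruct A exactly.
print(np.allclose(A, Ue @ np.diag(se) @ Vte))  # True
```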

Reading $\Sigma$: singular values & rank

  • Singular values are the diagonal entries of $\Sigma$: $\sigma_1,\dots,\sigma_r\ge 0$.
  • If some $\sigma_i=0$, directions in the input space are collapsed → $A$ is rank-deficient.
  • The condition number $\kappa(A)=\sigma_{\max}/\sigma_{\min}$ (for full-rank square $A$) predicts numerical sensitivity.
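A short sketch (assuming NumPy) of reading rank and conditioning off the singular values; the rank tolerance below is an illustrative choice, not NumPy's exact default.

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],   # linearly dependent row -> rank deficiency
              [0.0, 1.0, 1.0]])

s = np.linalg.svd(A, compute_uv=False)   # singular values only, descending

# Count singular values above a small cutoff to estimate the rank (illustrative tolerance).
tol = s.max() * max(A.shape) * np.finfo(A.dtype).eps
rank = int(np.sum(s > tol))
print(s, rank)                           # one singular value is ~0, so rank is 2

# Condition number of a full-rank square matrix: sigma_max / sigma_min.
B = np.array([[3.0, 0.0], [0.0, 1e-6]])
sb = np.linalg.svd(B, compute_uv=False)
print(sb[0] / sb[-1], np.linalg.cond(B))  # both ~3e6: numerically sensitive
```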

Pseudo-inverse via SVD (the concept tested)

Given $A=U\Sigma V^\top$, the Moore–Penrose pseudo-inverse is $$A^+ = V\,\Sigma^+\,U^\top,$$ where $\Sigma^+$ is formed by reciprocating each nonzero singular value and transposing the shape:

  • If $\Sigma=\operatorname{diag}(\sigma_1,\dots,\sigma_r)$ in the economy form, then $$\Sigma^+=\operatorname{diag}(\tfrac{1}{\sigma_1},\dots,\tfrac{1}{\sigma_r}).$$
  • For a full $m\times n$ $\Sigma$, $\Sigma^+$ is $n\times m$ with those reciprocals placed on the corresponding diagonal positions; zeros stay zeros.
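A minimal NumPy sketch of this construction on an arbitrary 2×3 example, compared against `np.linalg.pinv`:

```python
import numpy as np

A = np.array([[1.0, 0.0, 2.0],
              [0.0, 3.0, 0.0]])          # arbitrary 2x3 example

U, s, Vt = np.linalg.svd(A, full_matrices=True)   # U: 2x2, s: (2,), Vt: 3x3

# Build Sigma^+ (n x m): reciprocate only the nonzero singular values.
tol = s.max() * max(A.shape) * np.finfo(A.dtype).eps
Sigma_plus = np.zeros((A.shape[1], A.shape[0]))
for i, sigma in enumerate(s):
    if sigma > tol:
        Sigma_plus[i, i] = 1.0 / sigma

A_plus = Vt.T @ Sigma_plus @ U.T          # A^+ = V Sigma^+ U^T
print(np.allclose(A_plus, np.linalg.pinv(A)))   # True
```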

Why this matters.

  • Least squares solution: $x^* = A^+ b$ is the minimum-norm solution of $\min_x \|Ax-b\|_2$.
  • Generalizes inverse: If $A$ is square and invertible, $A^+=A^{-1}$.

Worked example (exactly like the exam’s SVD block)

Suppose you're given (from the paper) an explicit SVD of a $2\times 3$ matrix $A$: $$U=\begin{bmatrix}0&1\\[2pt]1&0\end{bmatrix},\quad \Sigma=\begin{bmatrix}1.5&0&0\\[2pt]0&0.5&0\end{bmatrix},\quad V=\begin{bmatrix} 0&1&0\\[2pt] 1&0&0\\[2pt] 0&0&1 \end{bmatrix}.$$

(a) Singular values

They are the diagonal entries of $\Sigma$: $$\boxed{1.5 \text{ and } 0.5.}$$

(b) Pseudo-inverse $A^+$ step-by-step

  1. Reciprocate the nonzero singular values: $$\Sigma^+ = \begin{bmatrix} \frac{1}{1.5}&0\\[2pt] 0&\frac{1}{0.5}\\[2pt] 0&0 \end{bmatrix} = \begin{bmatrix} \frac{2}{3}&0\\[2pt] 0&2\\[2pt] 0&0 \end{bmatrix} \quad(\text{shape }3\times 2).$$

  2. Assemble $A^+ = V\,\Sigma^+\,U^\top$: $$A^+ = \begin{bmatrix} 0&1&0\\[2pt] 1&0&0\\[2pt] 0&0&1 \end{bmatrix} \begin{bmatrix} \frac{2}{3}&0\\[2pt] 0&2\\[2pt] 0&0 \end{bmatrix} \begin{bmatrix} 0&1\\[2pt] 1&0 \end{bmatrix} = \boxed{ \begin{bmatrix} 2 & 0\\[2pt] 0 & \tfrac{2}{3}\\[2pt] 0 & 0 \end{bmatrix} }.$$

(Notice the final shape is $3\times 2$, the transpose of $A$'s $2\times 3$.)

Quick check: if you compute $A^+A$ you should get an orthogonal projector in $\mathbb{R}^3$ onto the row space of $A$, and $AA^+$ an orthogonal projector in $\mathbb{R}^2$ onto the column space of $A$.
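The same example in NumPy, reproducing $A^+$ and running the projector check (the factors are copied from the block above):

```python
import numpy as np

U = np.array([[0.0, 1.0], [1.0, 0.0]])
Sigma = np.array([[1.5, 0.0, 0.0], [0.0, 0.5, 0.0]])
V = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0]])

A = U @ Sigma @ V.T                        # the 2x3 matrix behind the given SVD

Sigma_plus = np.array([[2/3, 0.0], [0.0, 2.0], [0.0, 0.0]])  # reciprocate nonzeros, shape 3x2
A_plus = V @ Sigma_plus @ U.T
print(A_plus)                              # [[2, 0], [0, 2/3], [0, 0]]
print(np.allclose(A_plus, np.linalg.pinv(A)))   # True

# Projector check: A^+ A projects onto the row space, A A^+ onto the column space.
P_row, P_col = A_plus @ A, A @ A_plus
print(np.allclose(P_row @ P_row, P_row), np.allclose(P_col @ P_col, P_col))  # True True
```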

Another small, fully worked SVD by hand (2×2)

Consider $$A=\begin{bmatrix}3&0\\[2pt]0&1\end{bmatrix}.$$ For a diagonal, symmetric, positive matrix like this:

  • The singular values are just the absolute diagonal entries: $\sigma_1=3,\;\sigma_2=1$.
  • The singular vectors align with the standard basis, so $U=I_2,\;V=I_2$.
  • SVD: $A=I\cdot\operatorname{diag}(3,1)\cdot I^\top$.
  • Pseudo-inverse: $A^+=\operatorname{diag}\!\big(\tfrac{1}{3},1\big)$.

This trivial case helps build intuition: SVD reduces any matrix to a rotated version of such a simple diagonal scaling.

Low-rank approximation (core applied idea)

Write $$A=\sum_{i=1}^{r}\sigma_i\,u_i v_i^\top,$$ where $u_i$ and $v_i$ are the $i$-th columns of $U$ and $V$. The best rank-$k$ approximation (in the Frobenius or 2-norm) is $$A_k=\sum_{i=1}^{k}\sigma_i\,u_i v_i^\top,$$ which keeps the top $k$ singular values/vectors and discards the rest. This is the mathematical reason PCA and image compression work: most of the "signal" often lives in a few top singular directions.

Mini example (rank-1 fit)

If $\sigma_1\gg \sigma_2\ge\cdots$, then $A\approx \sigma_1 u_1 v_1^\top$ is already a good approximation. This reduces storage from $mn$ to $m+n+1$ numbers.
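A small NumPy sketch of the truncation on an arbitrary random matrix; by Eckart–Young, the 2-norm error of the rank-$k$ approximation equals the first discarded singular value $\sigma_{k+1}$.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 30))           # arbitrary example matrix

U, s, Vt = np.linalg.svd(A, full_matrices=False)

def rank_k_approx(k):
    # Keep the top-k singular triples: A_k = sum_{i<=k} sigma_i u_i v_i^T.
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

for k in (1, 5, 20):
    A_k = rank_k_approx(k)
    err2 = np.linalg.norm(A - A_k, 2)        # spectral-norm error
    print(k, err2, s[k])                     # Eckart–Young: err2 equals sigma_{k+1}
```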

Solving least squares with SVD (why it’s robust)

For $\min_x\|Ax-b\|_2$, the SVD solution is $$x^* = A^+ b = V\,\Sigma^+\,U^\top b = \sum_{i=1}^{r}\frac{u_i^\top b}{\sigma_i}\,v_i.$$

  • Ill-conditioning shows up as tiny $\sigma_i$ that blow up $1/\sigma_i$.
  • Tikhonov/ridge regularization effectively dampens small $\sigma_i$ contributions (replace $\frac{1}{\sigma_i}$ by $\frac{\sigma_i}{\sigma_i^2+\lambda}$).
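A NumPy sketch of both formulas on made-up data: the plain SVD solution matches `np.linalg.lstsq`, and the damped coefficients $\sigma_i/(\sigma_i^2+\lambda)$ match the ridge normal equations.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((100, 5))     # made-up tall least-squares problem
b = rng.standard_normal(100)

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Plain SVD least squares: x = sum_i (u_i^T b / sigma_i) v_i.
x_svd = Vt.T @ ((U.T @ b) / s)
print(np.allclose(x_svd, np.linalg.lstsq(A, b, rcond=None)[0]))  # True

# Ridge-damped version: replace 1/sigma_i by sigma_i / (sigma_i^2 + lambda),
# which shrinks the contribution of tiny singular values.
lam = 1e-2
x_ridge = Vt.T @ ((s / (s**2 + lam)) * (U.T @ b))
print(np.allclose(x_ridge, np.linalg.solve(A.T @ A + lam * np.eye(5), A.T @ b)))  # True
```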

Relation to eigen-decomposition (for intuition)

  • $A^\top A = V\,\Sigma^\top \Sigma\,V^\top = V\,\operatorname{diag}(\sigma_1^2,\dots)\,V^\top$.
  • $A A^\top = U\,\Sigma\Sigma^\top\,U^\top = U\,\operatorname{diag}(\sigma_1^2,\dots)\,U^\top$.

So the right singular vectors $v_i$ are eigenvectors of $A^\top A$, the left singular vectors $u_i$ are eigenvectors of $A A^\top$, and the eigenvalues are the squared singular values.
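A quick NumPy check of this correspondence on a random matrix; eigenvectors are only determined up to sign, so the comparison takes absolute values.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 4))             # arbitrary example

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Eigendecomposition of the symmetric matrix A^T A.
evals, evecs = np.linalg.eigh(A.T @ A)      # eigh returns ascending eigenvalues
evals, evecs = evals[::-1], evecs[:, ::-1]  # reorder to descending

print(np.allclose(evals, s**2))             # eigenvalues are the squared singular values
# Each eigenvector matches the corresponding v_i up to sign.
print(np.allclose(np.abs(evecs.T @ Vt.T), np.eye(4)))
```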

Common pitfalls (and how to avoid them)

  • Mixing up shapes. Always sanity-check: $U$ is $m\times m$, $\Sigma$ is $m\times n$, $V$ is $n\times n$. In economy SVD, keep $r$ consistent.
  • Forgetting to reciprocate only nonzero $\sigma_i$. Zero singular values stay zero in $\Sigma^+$; never invert them.
  • Assuming $A^+=A^{-1}$. Only true if $A$ is square and full-rank.
  • Numerical stability. When $\sigma_{\min}$ is tiny, plain normal equations can be unstable; SVD-based solvers are safer.

Quick practice (with answers)

  • Identify singular values. $$\Sigma=\begin{bmatrix}4&0&0\\0&1&0\end{bmatrix}.$$ Ans: 4 and 1.
  • Build a pseudo-inverse. With the $\Sigma$ above, $\Sigma^+=\begin{bmatrix}1/4&0\\0&1\\0&0\end{bmatrix}$. Then $A^+=V\,\Sigma^+\,U^\top$.
  • Rank & approximation. If $A$ has singular values [10, 0.8, 0.7, 0.05], the best rank-1 approximation keeps only $\sigma_1=10$ and the associated $u_1,v_1$.

Tiny “exam-style” recap (maps to what’s asked)

  • "What are the singular values?" → read the diagonal of $\Sigma$.
  • "Compute $A^+$ from SVD." → $A^+=V\,\Sigma^+\,U^\top$ with reciprocals of the nonzero singular values and correct shapes.
  • "Why SVD?" → stable least squares, pseudo-inverse, low-rank structure.

One-page cheat sheet

  • Factorization: $A=U\Sigma V^\top$.
  • Singular values: nonnegative, descending; $\#\{\sigma_i>0\}=\operatorname{rank}(A)$.
  • Pseudo-inverse: $A^+=V\Sigma^+U^\top$ with $\Sigma^+$ reciprocals of nonzeros.
  • Least squares: $x^*=A^+b=\sum_i (u_i^\top b/\sigma_i)\,v_i$.
  • Best rank-$k$: keep top $k$ $\sigma_i,u_i,v_i$.