Singular Value Decomposition (SVD) factors any real matrix $A\in\mathbb{R}^{m\times n}$ into three parts, $A = U\Sigma V^\top$, where
- $U\in\mathbb{R}^{m\times m}$ has orthonormal columns (think: a rotation/reflection in the output space),
- $V\in\mathbb{R}^{n\times n}$ has orthonormal columns (a rotation/reflection in the input space),
- $\Sigma\in\mathbb{R}^{m\times n}$ is diagonal (zeros off the diagonal) with non-negative numbers $\sigma_1\ge \sigma_2\ge\cdots\ge 0$ called the singular values.
Geometric picture. SVD says: to apply $A$ to a vector,
- rotate/reflect (by $V^\top$),
- stretch/compress along perpendicular axes (by the $\sigma_i$),
- rotate/reflect again (by $U$).
- Understanding rank & energy. The number of nonzero singular values equals $\operatorname{rank}(A)$. Larger singular values capture more “energy/variance”.
- Stable least squares. SVD yields robust solutions to $\min_x\|Ax-b\|_2$, even when $A$ is not square or is ill-conditioned.
- Pseudo-inverse. The Moore–Penrose pseudo-inverse $A^+$ is easiest to compute via SVD.
- Dimensionality reduction. Truncating to the top $k$ singular values gives the best rank-$k$ approximation (Eckart–Young theorem); this underlies PCA and low-rank compression.
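These properties are easy to poke at numerically. A minimal numpy sketch (the matrix `A` below is an arbitrary illustration, not one from these notes):

```python
import numpy as np

# A small example matrix (chosen arbitrarily for illustration).
A = np.array([[3.0, 1.0, 0.0],
              [1.0, 2.0, 1.0]])

# Full SVD: U is 2x2, Vt is 3x3, s holds the singular values (descending).
U, s, Vt = np.linalg.svd(A, full_matrices=True)

# Rebuild the m x n Sigma and check that U @ Sigma @ Vt reproduces A.
Sigma = np.zeros(A.shape)
Sigma[:len(s), :len(s)] = np.diag(s)
assert np.allclose(A, U @ Sigma @ Vt)

print("singular values:", s)               # descending, nonnegative
print("rank:", np.linalg.matrix_rank(A))   # count of nonzero sigma_i
```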
If $A=U\Sigma V^\top$ is the full SVD, the economy SVD (with $r=\operatorname{rank}(A)$) keeps only the first $r$ columns: $U_r\in\mathbb{R}^{m\times r}$, $\Sigma_r\in\mathbb{R}^{r\times r}$, $V_r\in\mathbb{R}^{n\times r}$, and still $A=U_r\Sigma_r V_r^\top$.
- Singular values are the diagonal entries of $\Sigma$: $\sigma_1,\dots,\sigma_r\ge 0$.
- If some $\sigma_i=0$, directions in the input space are collapsed → $A$ is rank-deficient.
- The condition number $\kappa(A)=\sigma_{\max}/\sigma_{\min}$ (for full-rank square $A$) predicts numerical sensitivity.
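A quick numpy check of the economy form and the condition number (the random matrix is only illustrative; `np.linalg.cond` computes the same $\sigma_{\max}/\sigma_{\min}$ ratio by default):

```python
import numpy as np

A = np.random.default_rng(0).standard_normal((5, 3))  # illustrative 5x3 matrix

# Economy ("thin") SVD: U_r is 5x3, diag(s) is 3x3, Vt_r is 3x3.
U_r, s, Vt_r = np.linalg.svd(A, full_matrices=False)
assert np.allclose(A, U_r @ np.diag(s) @ Vt_r)

# Condition number sigma_max / sigma_min (full column rank assumed here).
kappa = s[0] / s[-1]
print("kappa(A) =", kappa, " vs np.linalg.cond:", np.linalg.cond(A))
```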
Given $A=U\Sigma V^\top$, the Moore–Penrose pseudo-inverse is $A^+ = V\,\Sigma^+\,U^\top$.
- If $\Sigma=\operatorname{diag}(\sigma_1,\dots,\sigma_r)$ in the economy form, then $$\Sigma^+=\operatorname{diag}\big(\tfrac{1}{\sigma_1},\dots,\tfrac{1}{\sigma_r}\big).$$
- For a full $m\times n$ $\Sigma$, $\Sigma^+$ is $n\times m$ with those reciprocals placed on the corresponding diagonal positions; zeros stay zeros.
- Least squares solution: $x^* = A^+ b$ is the minimum-norm solution of $\min_x \|Ax-b\|_2$.
- Generalizes the inverse: if $A$ is square and invertible, $A^+=A^{-1}$.
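Here is a sketch of that recipe in numpy, assuming a nonzero input matrix; the tolerance `rtol` for deciding which $\sigma_i$ count as nonzero is a practical choice, not part of the definition:

```python
import numpy as np

def pinv_via_svd(A, rtol=1e-12):
    """Moore-Penrose pseudo-inverse A+ = V Sigma+ U^T via SVD.

    Only singular values above rtol * sigma_max are reciprocated;
    the rest are treated as exact zeros (rank deficiency)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    s_plus = np.zeros_like(s)
    nonzero = s > rtol * s[0]          # assumes A != 0, so s[0] > 0
    s_plus[nonzero] = 1.0 / s[nonzero]
    return Vt.T @ np.diag(s_plus) @ U.T

A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [0.0, 1.0]])
assert np.allclose(pinv_via_svd(A), np.linalg.pinv(A))
```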
Suppose you're given (from the paper) an explicit SVD of a $2\times 3$ matrix, $A=U\Sigma V^\top$, with $$U=\begin{bmatrix}0&1\\[2pt]1&0\end{bmatrix},\quad \Sigma=\begin{bmatrix}1.5&0&0\\[2pt]0&0.5&0\end{bmatrix},\quad V=\begin{bmatrix}0&1&0\\[2pt]1&0&0\\[2pt]0&0&1\end{bmatrix}.$$
(a) Singular values
They are the diagonal entries of $\Sigma$: $\sigma_1=1.5$ and $\sigma_2=0.5$.
(b) Pseudo-inverse $A^+$ step-by-step
- Reciprocate the nonzero singular values: $$\Sigma^+ \;=\; \begin{bmatrix} \frac{1}{1.5}&0\\[2pt] 0&\frac{1}{0.5}\\[2pt] 0&0 \end{bmatrix} = \begin{bmatrix} \frac{2}{3}&0\\[2pt] 0&2\\[2pt] 0&0 \end{bmatrix} \quad(\text{shape }3\times 2).$$
- Assemble $A^+ = V\,\Sigma^+\,U^\top$: $$A^+\;=\; \begin{bmatrix} 0&1&0\\[2pt] 1&0&0\\[2pt] 0&0&1 \end{bmatrix} \begin{bmatrix} \frac{2}{3}&0\\[2pt] 0&2\\[2pt] 0&0 \end{bmatrix} \begin{bmatrix} 0&1\\[2pt] 1&0 \end{bmatrix} \;=\; \boxed{ \begin{bmatrix} 2 & 0\\[2pt] 0 & \tfrac{2}{3}\\[2pt] 0 & 0 \end{bmatrix} }.$$
(Notice the final shape is $3\times 2$, the transposed shape of the $2\times 3$ matrix $A$.)
Quick check: if you compute $A\,A^+\,A$, you should recover $A$ exactly (one of the Moore–Penrose conditions).
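You can reproduce the whole worked example in a few lines of numpy; the factors below are exactly the $U$, $\Sigma$, $V$ given above:

```python
import numpy as np

# The factors from the worked example above.
U = np.array([[0.0, 1.0],
              [1.0, 0.0]])
S = np.array([[1.5, 0.0, 0.0],
              [0.0, 0.5, 0.0]])
V = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0]])

A = U @ S @ V.T                    # reconstruct the 2x3 matrix A
A_plus = np.linalg.pinv(A)         # should match the hand computation

print(A_plus)                      # [[2, 0], [0, 2/3], [0, 0]]
assert np.allclose(A_plus, [[2.0, 0.0], [0.0, 2 / 3], [0.0, 0.0]])
assert np.allclose(A @ A_plus @ A, A)   # Moore-Penrose check: A A+ A = A
```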
- Another small, fully worked SVD by hand (2×2)
Consider $$A=\begin{bmatrix}3&0\\[2pt]0&1\end{bmatrix}.$$ For a diagonal, symmetric, positive matrix like this:
- The singular values are just the absolute values of the diagonal entries: $\sigma_1=3,\ \sigma_2=1$.
- The singular vectors align with the standard basis, so $U=I_2,\ V=I_2$.
- SVD: $A=I\cdot\operatorname{diag}(3,1)\cdot I^\top$.
- Pseudo-inverse: $A^+=\operatorname{diag}\big(\tfrac{1}{3},1\big)$.
This trivial case helps build intuition: SVD reduces any matrix to a rotated version of such a simple diagonal scaling.
- Low-rank approximation (core applied idea)
Write $A=\sum_{i=1}^{r}\sigma_i\,u_i v_i^\top$. Keeping only the top $k$ terms gives $A_k=\sum_{i=1}^{k}\sigma_i\,u_i v_i^\top$, which is the best rank-$k$ approximation to $A$ in both the spectral and Frobenius norms (Eckart–Young).
Mini example (rank-1 fit)
If $\sigma_1$ dominates the remaining singular values, $A_1=\sigma_1 u_1 v_1^\top$ already captures most of the energy: the squared Frobenius error is $\sum_{i\ge 2}\sigma_i^2$ out of a total of $\sum_i \sigma_i^2$.
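A short numpy sketch of the truncation, together with a check of the Eckart–Young error formula (the random low-rank matrix is only for illustration):

```python
import numpy as np

def best_rank_k(A, k):
    """Best rank-k approximation A_k = sum of the top k sigma_i u_i v_i^T."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 4)) @ rng.standard_normal((4, 5))  # rank <= 4

s = np.linalg.svd(A, compute_uv=False)
A1 = best_rank_k(A, 1)
# Eckart-Young: Frobenius error of the best rank-1 fit is sqrt(sum_{i>=2} sigma_i^2).
assert np.isclose(np.linalg.norm(A - A1, "fro"), np.sqrt(np.sum(s[1:] ** 2)))
```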
- Solving least squares with SVD (why it’s robust)
For $\min_x\|Ax-b\|_2$, the SVD gives the minimum-norm solution directly:
$$
x^* \;=\; A^+ b \;=\; V\,\Sigma^+\,U^\top b \;=\;\sum_{i=1}^{r}\frac{u_i^\top b}{\sigma_i}\,v_i.
$$
- Ill-conditioning shows up as tiny $\sigma_i$ that blow up $1/\sigma_i$.
- Tikhonov/ridge regularization effectively dampens small-$\sigma_i$ contributions: replace $\frac{1}{\sigma_i}$ by $\frac{\sigma_i}{\sigma_i^2+\lambda}$ (see the sketch below).
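Both bullets show up in a small solver sketch; `svd_lstsq`, the test matrix, and the `lam`/`rtol` parameters are illustrative choices, not a fixed API:

```python
import numpy as np

def svd_lstsq(A, b, lam=0.0, rtol=1e-12):
    """Minimum-norm least-squares solution via SVD.

    lam = 0 uses x* = sum_i (u_i^T b / sigma_i) v_i, skipping sigma_i ~ 0;
    lam > 0 applies the Tikhonov filter sigma_i / (sigma_i^2 + lam) instead."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    coeffs = U.T @ b                      # the u_i^T b terms
    if lam > 0:
        filt = s / (s ** 2 + lam)         # damped reciprocal
    else:
        filt = np.zeros_like(s)
        nonzero = s > rtol * s[0]
        filt[nonzero] = 1.0 / s[nonzero]  # plain reciprocal of nonzero sigma_i
    return Vt.T @ (filt * coeffs)

A = np.array([[1.0, 1.0],
              [1.0, 1.0001],
              [0.0, 1.0]])                # mildly ill-conditioned
b = np.array([2.0, 2.0, 1.0])
print(svd_lstsq(A, b))                    # matches np.linalg.lstsq(A, b, rcond=None)[0]
print(svd_lstsq(A, b, lam=1e-3))          # ridge-damped variant
```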
- Relation to eigendecomposition
- $A^\top A = V\,\Sigma^\top\Sigma\,V^\top = V\,\operatorname{diag}(\sigma_1^2,\dots)\,V^\top$.
- $A A^\top = U\,\Sigma\Sigma^\top\,U^\top = U\,\operatorname{diag}(\sigma_1^2,\dots)\,U^\top$.

So the right singular vectors $v_i$ are eigenvectors of $A^\top A$, the left singular vectors $u_i$ are eigenvectors of $A A^\top$, and the eigenvalues are the squared singular values.
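A quick numerical confirmation (random illustrative matrix):

```python
import numpy as np

A = np.random.default_rng(2).standard_normal((4, 3))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Eigenvalues of A^T A (sorted descending) equal the squared singular values.
eigvals = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]
assert np.allclose(eigvals, s ** 2)
```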
- Mixing up shapes. Always sanity-check: $U$ is $m\times m$, $\Sigma$ is $m\times n$, $V$ is $n\times n$. In economy SVD, keep $r$ consistent.
- Forgetting to reciprocate only the nonzero $\sigma_i$. Zero singular values stay zero in $\Sigma^+$; never invert them.
- Assuming $A^+=A^{-1}$. Only true if $A$ is square and full-rank.
- Numerical stability. When $\sigma_{\min}$ is tiny, plain normal equations can be unstable; SVD-based solvers are safer (see the demo below).
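The last pitfall can be made concrete: in the 2-norm, $\kappa(A^\top A)=\kappa(A)^2$ for full column rank, so forming normal equations squares the conditioning. A tiny demo (the Vandermonde matrix is just a convenient ill-conditioned example):

```python
import numpy as np

# Normal equations square the condition number: kappa(A^T A) = kappa(A)^2.
A = np.vander(np.linspace(0, 1, 8), 5)   # mildly ill-conditioned design matrix
print("kappa(A)     =", np.linalg.cond(A))
print("kappa(A^T A) =", np.linalg.cond(A.T @ A))  # the square of the above
```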
- Identify singular values. $$\Sigma=\begin{bmatrix}4&0&0\\0&1&0\end{bmatrix}.$$ Ans: 4 and 1.
- Build a pseudo-inverse. With the $\Sigma$ above, $\Sigma^+=\begin{bmatrix}1/4&0\\0&1\\0&0\end{bmatrix}$. Then $A^+=V\,\Sigma^+\,U^\top$.
- Rank & approximation. If $A$ has singular values $[10,\,0.8,\,0.7,\,0.05]$, the best rank-1 approximation keeps only $\sigma_1=10$ and the associated $u_1,v_1$.
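If you want to check your answers with numpy:

```python
import numpy as np

# Practice check: pseudo-inverse of the Sigma above, and rank-1 energy capture.
Sigma = np.array([[4.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0]])
print(np.linalg.pinv(Sigma))   # [[1/4, 0], [0, 1], [0, 0]]

s = np.array([10.0, 0.8, 0.7, 0.05])
print("energy captured by rank-1:", s[0] ** 2 / np.sum(s ** 2))  # ~0.989
```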
- "What are the singular values?" → read the diagonal of
$\Sigma$ . - "Compute
$A^+$ from SVD." →$A^+=V,\Sigma^+,U^\top$ with reciprocals of nonzero singular values and correct shapes. - "Why SVD?" → stable least squares, pseudo-inverse, low-rank structure.
- Factorization: $A=U\Sigma V^\top$.
- Singular values: nonnegative, descending; $\#\{i:\sigma_i>0\}=\operatorname{rank}(A)$.
- Pseudo-inverse: $A^+=V\Sigma^+U^\top$ with $\Sigma^+$ holding reciprocals of the nonzero $\sigma_i$.
- Least squares: $x^*=A^+b=\sum_i (u_i^\top b/\sigma_i)\,v_i$.
- Best rank-$k$: keep the top $k$ triples $(\sigma_i, u_i, v_i)$.