Eigendecomposition in AI
PCA showed us how eigenvectors can compress data. But eigenvectors power far more than dimensionality reduction. In this lesson, we explore eigendecomposition and its more general cousin Singular Value Decomposition (SVD), and then see how they drive some of the most influential AI systems ever created -- from Netflix's recommendation engine to Google's PageRank algorithm.
Eigendecomposition: Breaking a Matrix Apart
Any square matrix with a full set of independent eigenvectors can be decomposed into three simpler matrices:
A = P * D * P⁻¹
- P is a matrix whose columns are the eigenvectors of A
- D is a diagonal matrix with the corresponding eigenvalues on its diagonal
- P⁻¹ is the inverse of P
This factorization is called eigendecomposition. It says that the complex transformation performed by A is equivalent to three simple steps: (1) change coordinates into the eigenvector basis, (2) scale each axis by its eigenvalue, (3) change back to the original coordinates.
A = | 2  1 |  =  P * | 3  0 | * P⁻¹
    | 1  2 |         | 0  1 |
The diagonal matrix D reveals the essence of what A does -- pure scaling along two directions, with no mixing or rotation.
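To make this concrete, here is a minimal NumPy sketch (NumPy is an assumption; any linear algebra library would do) that recovers the eigenvalues 3 and 1 for this matrix and checks that the three factors rebuild A:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# np.linalg.eig returns the eigenvalues and a matrix P whose columns
# are the corresponding eigenvectors.
eigenvalues, P = np.linalg.eig(A)
D = np.diag(eigenvalues)

# Rebuild A from its three factors: change basis, scale, change back.
A_rebuilt = P @ D @ np.linalg.inv(P)

print(eigenvalues)                 # 3 and 1 (order may vary)
print(np.allclose(A, A_rebuilt))   # True
```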
The Limitation: Only Square Matrices
Eigendecomposition requires a square matrix. But most real-world data matrices are not square. A dataset with 10,000 samples and 500 features is a 10,000 x 500 matrix. We need a more general tool.
Singular Value Decomposition (SVD)
SVD generalizes eigendecomposition to work on any matrix, regardless of shape:
A = U * Σ * V^T
| Component | Shape | Meaning |
|---|---|---|
| U | m x m | Left singular vectors (patterns in rows) |
| Σ (Sigma) | m x n | Diagonal matrix of singular values |
| V^T | n x n | Right singular vectors (patterns in columns) |
The singular values on the diagonal of Sigma are always non-negative and ordered from largest to smallest. They play the same role as eigenvalues -- measuring the importance of each component. In fact, for symmetric matrices, the singular values are the absolute values of the eigenvalues.
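As a quick sketch (again assuming NumPy), `np.linalg.svd` returns all three factors; note that it hands back the singular values as a 1-D array rather than the full m x n Σ matrix:

```python
import numpy as np

A = np.random.rand(6, 4)            # any rectangular "data" matrix

# full_matrices=True returns U as 6 x 6 and V^T as 4 x 4.
U, s, Vt = np.linalg.svd(A, full_matrices=True)

print(U.shape, s.shape, Vt.shape)   # (6, 6) (4,) (4, 4)
print(s)                            # non-negative, largest to smallest
```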
Truncated SVD: Keeping Only What Matters
The power of SVD comes from truncation. Instead of keeping all components, you keep only the top k singular values and their corresponding vectors:
A ≈ U_k * Σ_k * V_k^T
If A is a 10,000 x 500 matrix and you keep k = 20 components, you go from storing 5,000,000 numbers to roughly 210,000 -- a 24x compression -- while preserving the most important structure in the data.
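A minimal sketch of that arithmetic, using a random matrix as a stand-in for a real dataset:

```python
import numpy as np

m, n, k = 10_000, 500, 20
A = np.random.rand(m, n)

# full_matrices=False keeps U at m x n, which is all truncation needs.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Keep only the top k singular values and their vectors.
U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]
A_approx = U_k @ np.diag(s_k) @ Vt_k

stored = U_k.size + s_k.size + Vt_k.size
print(stored)                       # 210,020 numbers instead of 5,000,000
print(round(A.size / stored))       # roughly 24x compression
```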
SVD in Recommendation Systems
The Netflix Prize (2006-2009) offered one million dollars to anyone who could improve Netflix's movie recommendations by 10%. The winning approaches relied heavily on SVD-inspired matrix factorization.
The idea: build a matrix where rows are users, columns are movies, and entries are ratings. Most entries are missing (users have not rated most movies), so in practice an SVD-style factorization is fit to the observed ratings rather than computed in one shot. The resulting factors mirror the SVD:
- U captures user preferences (e.g., "likes action," "prefers short films")
- V captures movie characteristics (e.g., "is action-heavy," "has strong dialogue")
- Sigma weights these factors by importance
To predict whether user i will like movie j, compute the dot product of their latent vectors. This is collaborative filtering -- the foundation of modern recommendation systems at Netflix, Spotify, Amazon, and YouTube.
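Here is a toy sketch of that prediction step. It assumes a small, fully observed ratings matrix (made up for illustration); real systems fit the factorization to only the observed entries with iterative optimization, but the dot-product prediction works the same way:

```python
import numpy as np

# Rows = users, columns = movies, entries = ratings on a 1-5 scale.
R = np.array([[5, 4, 1, 1],
              [4, 5, 1, 2],
              [1, 1, 5, 4],
              [2, 1, 4, 5]], dtype=float)

U, s, Vt = np.linalg.svd(R, full_matrices=False)

k = 2                                    # keep two latent factors
user_factors = U[:, :k] * s[:k]          # row i: user i's latent vector
movie_factors = Vt[:k, :].T              # row j: movie j's latent vector

# Predicted rating for user i, movie j: dot product of their latent vectors.
i, j = 0, 2
print(user_factors[i] @ movie_factors[j])  # low score, close to the true rating of 1
```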
SVD in Image Compression
A grayscale image is a matrix of pixel intensities. Applying SVD and keeping only the top k singular values produces a compressed approximation:
| Singular Values Kept | Storage | Image Quality |
|---|---|---|
| k = 5 | ~2% of original | Blurry, major shapes only |
| k = 20 | ~8% of original | Recognizable, some detail lost |
| k = 50 | ~20% of original | Good quality, minor artifacts |
| k = 200 | ~80% of original | Nearly indistinguishable from original |
The largest singular values capture broad shapes and contrast. Smaller ones capture fine detail and noise. By discarding the smallest, you keep the image's essence at a fraction of the storage cost.
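The sketch below shows the mechanics and the storage arithmetic on a synthetic matrix; with a real photo loaded as a 2-D array of pixel values, the same code reproduces the quality levels in the table above:

```python
import numpy as np

img = np.random.rand(512, 512)       # stand-in for real grayscale pixel values

U, s, Vt = np.linalg.svd(img, full_matrices=False)

k = 20
compressed = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Storage for the truncated factors: k * (rows + columns + 1) numbers.
stored = k * (img.shape[0] + img.shape[1] + 1)
print(stored / img.size)             # ~0.08, i.e. about 8% of the original
```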
Latent Semantic Analysis (LSA)
In natural language processing, Latent Semantic Analysis applies SVD to a document-term matrix where rows are documents and columns are words. The entry at position (i, j) is how often word j appears in document i.
SVD reveals hidden semantic relationships:
- Words that appear in similar contexts (like "car" and "automobile") end up with similar vectors
- Documents about similar topics cluster together in the reduced space
- The latent dimensions correspond to abstract topics or concepts
LSA was one of the earliest successful techniques for making computers understand that words can be related even when they never appear together in the same document.
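A toy sketch of the idea, with a hand-made 4-document, 4-word count matrix (vocabulary and counts are illustrative): "car" and "automobile" never appear in the same document, yet both co-occur with "engine", so their reduced vectors end up nearly parallel:

```python
import numpy as np

# Rows = documents, columns = words: ["car", "automobile", "engine", "banana"]
X = np.array([[2, 0, 1, 0],
              [0, 2, 1, 0],
              [1, 0, 2, 0],
              [0, 0, 0, 3]], dtype=float)

U, s, Vt = np.linalg.svd(X, full_matrices=False)

k = 2
word_vectors = Vt[:k, :].T * s[:k]      # row j: word j in 2 latent dimensions

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine(word_vectors[0], word_vectors[1]))  # "car" vs "automobile": near 1
print(cosine(word_vectors[0], word_vectors[3]))  # "car" vs "banana": near 0
```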
Spectral Clustering
Traditional clustering algorithms like K-means struggle with data that forms non-spherical clusters (rings, crescents, intertwined spirals). Spectral clustering solves this by:
- Building a similarity graph where edges connect nearby data points
- Computing the graph Laplacian matrix from this graph
- Finding the eigenvectors of the Laplacian with the smallest eigenvalues
- Clustering the data in this eigenvector space using standard K-means
The eigenvectors of the graph Laplacian reveal the connected components and natural groupings in the data, even when those groups have complex shapes that defeat simpler algorithms.
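The sketch below walks through those four steps on two concentric rings. It assumes NumPy plus scikit-learn's KMeans for the final step (any k-means implementation would do), and the similarity-graph parameters are illustrative choices:

```python
import numpy as np
from sklearn.cluster import KMeans   # assumption: scikit-learn is available

rng = np.random.default_rng(0)

# Two concentric rings -- a shape plain K-means cannot separate.
def ring(radius, n):
    angles = rng.uniform(0, 2 * np.pi, n)
    points = np.c_[radius * np.cos(angles), radius * np.sin(angles)]
    return points + rng.normal(0, 0.05, (n, 2))

X = np.vstack([ring(1.0, 100), ring(3.0, 100)])

# 1. Similarity graph: Gaussian (RBF) similarity between every pair of points.
dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
W = np.exp(-dists**2 / (2 * 0.5**2))
np.fill_diagonal(W, 0)

# 2. Graph Laplacian L = D - W, where D holds each node's degree.
D = np.diag(W.sum(axis=1))
L = D - W

# 3. Eigenvectors of L with the smallest eigenvalues (eigh sorts ascending).
eigenvalues, eigenvectors = np.linalg.eigh(L)
embedding = eigenvectors[:, :2]

# 4. Standard K-means in the eigenvector space.
labels = KMeans(n_clusters=2, n_init=10).fit_predict(embedding)
print(np.unique(labels[:100]), np.unique(labels[100:]))  # one label per ring
```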
Google PageRank
Google's original search algorithm, PageRank, is fundamentally an eigenvector problem. The web is modeled as a massive matrix where entry (i, j) represents the probability of following a link from page j to page i.
PageRank computes the dominant eigenvector of this matrix -- the eigenvector with the largest eigenvalue. Each component of this eigenvector is a page's importance score. Pages that are linked to by many important pages receive high scores.
PageRank vector = dominant eigenvector of the web link matrix
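A sketch of that computation on a hypothetical four-page web, using power iteration (the standard way to approximate the dominant eigenvector at web scale) and the commonly cited damping factor of 0.85:

```python
import numpy as np

# Column j spreads page j's importance over the pages it links to
# (each column sums to 1). This four-page web is made up for illustration.
M = np.array([[0.0, 0.5, 0.0, 0.0],
              [1/3, 0.0, 0.0, 0.5],
              [1/3, 0.0, 0.0, 0.5],
              [1/3, 0.5, 1.0, 0.0]])

# Damping: with probability 0.15 the surfer jumps to a random page.
n = M.shape[0]
G = 0.85 * M + 0.15 / n * np.ones((n, n))

# Power iteration: repeated multiplication converges to the dominant
# eigenvector (its eigenvalue is 1 for this stochastic matrix).
rank = np.ones(n) / n
for _ in range(100):
    rank = G @ rank

print(rank)   # importance score for each page, summing to 1
```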
This single insight -- that the most important information about a network is encoded in its dominant eigenvector -- made Google the most successful search engine in history.
Connecting It All Together
Throughout this course, we have built a complete picture:
- Vectors represent data (Module 1)
- Matrices transform and organize data (Module 2)
- Matrix multiplication processes data through neural network layers (Module 3)
- Dot products measure similarity between data points (Module 4)
- Eigenvectors and eigenvalues reveal the most important patterns and compress data (Module 5)
These concepts do not exist in isolation. A recommendation system uses vectors (user preferences), matrices (ratings data), matrix multiplication (predictions), similarity (finding related users), and eigendecomposition (extracting latent factors) -- all working together in a single pipeline.
Summary
- Eigendecomposition factors a square matrix as A = PDP⁻¹, separating it into eigenvector directions and eigenvalue scales
- SVD generalizes this to any matrix: A = U Σ V^T, making it applicable to real-world datasets
- Truncated SVD compresses data by keeping only the top k components
- Recommendation systems use SVD to discover latent user preferences and item characteristics
- Image compression uses SVD to store images at a fraction of their original size
- Latent Semantic Analysis applies SVD to document-term matrices to uncover hidden word relationships
- Spectral clustering uses eigenvectors of graph Laplacians to find complex-shaped clusters
- Google PageRank computes the dominant eigenvector of the web link matrix to rank pages
In Module 6, we step beyond matrices into tensors -- the multi-dimensional arrays that power modern deep learning frameworks like PyTorch and TensorFlow -- and see how all the linear algebra concepts from this course come together throughout the AI pipeline.

