What Are Matrices?
In Module 1, we learned that vectors are the building blocks of data in AI. Now it is time to level up. A matrix is how AI systems organize and process many vectors at once. Every neural network you have ever heard of stores its learned knowledge in matrices. Understanding them is essential to understanding how AI thinks.
From Vectors to Matrices
A matrix is a rectangular grid of numbers arranged in rows and columns. While a vector is a single list of numbers, a matrix is a collection of vectors stacked together.
We describe the size of a matrix as rows x columns. A matrix with 3 rows and 4 columns is called a 3x4 matrix (read "three by four").
A = | 1 2 3 4 |
| 5 6 7 8 |
| 9 10 11 12 |
This is a 3x4 matrix. It has 3 rows and 4 columns, containing 12 numbers total.
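The matrix above can be built directly in code. This sketch uses NumPy purely for illustration (the course has not introduced a specific library):

```python
import numpy as np

# The 3x4 matrix A from above: 3 rows, 4 columns
A = np.array([
    [1,  2,  3,  4],
    [5,  6,  7,  8],
    [9, 10, 11, 12],
])

print(A.shape)  # (3, 4) -- rows x columns
print(A.size)   # 12 -- total number of elements
```

The `shape` attribute always reports (rows, columns), matching the "rows x columns" convention.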
Matrix Notation
We typically use uppercase bold letters like A, B, and W for matrices. To refer to a specific element, we write A_ij, which means the element in row i and column j.
For the matrix above:
- A_11 = 1 (row 1, column 1)
- A_23 = 7 (row 2, column 3)
- A_34 = 12 (row 3, column 4)
This notation is used everywhere in AI research and documentation, so getting comfortable with it now will pay off throughout the course.
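One practical caveat worth seeing early: math notation is 1-based (A_11 is the top-left element), but most programming languages index from 0. A short NumPy sketch, assuming the matrix A defined earlier:

```python
import numpy as np

A = np.array([
    [1,  2,  3,  4],
    [5,  6,  7,  8],
    [9, 10, 11, 12],
])

# Math notation A_ij is 1-based; NumPy indexing is 0-based,
# so A_ij becomes A[i - 1, j - 1] in code.
assert A[0, 0] == 1   # A_11
assert A[1, 2] == 7   # A_23
assert A[2, 3] == 12  # A_34
```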
Matrices as Collections of Vectors
You can think of a matrix in two ways:
- Row vectors: Each row is a separate vector. A 3x4 matrix contains 3 row vectors, each with 4 elements.
- Column vectors: Each column is a separate vector. A 3x4 matrix contains 4 column vectors, each with 3 elements.
This dual perspective is fundamental. In AI, a dataset is often a matrix where each row is a data sample and each column is a feature.
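Both views are a single slicing operation in code. A NumPy sketch, again using the matrix A from above:

```python
import numpy as np

A = np.array([
    [1,  2,  3,  4],
    [5,  6,  7,  8],
    [9, 10, 11, 12],
])

row_1 = A[0, :]   # first row vector: [1 2 3 4] -- 4 elements
col_1 = A[:, 0]   # first column vector: [1 5 9] -- 3 elements
```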
AI Context: A Dataset as a Matrix
Imagine you have data for 3 students with 4 features: study hours, sleep hours, previous grade, and attendance percentage.
Student Data = | 5.0 7.5 82 90 |
| 3.0 6.0 71 75 |
| 7.0 8.0 95 98 |
This is a 3x4 matrix where:
| Row | Student | Study Hours | Sleep Hours | Prev Grade | Attendance |
|---|---|---|---|---|---|
| 1 | Alice | 5.0 | 7.5 | 82 | 90 |
| 2 | Bob | 3.0 | 6.0 | 71 | 75 |
| 3 | Carol | 7.0 | 8.0 | 95 | 98 |
When you feed a batch of data into a neural network, you are feeding it a matrix exactly like this. Each row is one sample, and the network processes all of them simultaneously.
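The student table translates directly into a matrix in code. This NumPy sketch shows how the row and column views map onto the data (variable names like `student_data` are illustrative):

```python
import numpy as np

# One row per student, one column per feature:
# [study hours, sleep hours, previous grade, attendance %]
student_data = np.array([
    [5.0, 7.5, 82, 90],   # Alice
    [3.0, 6.0, 71, 75],   # Bob
    [7.0, 8.0, 95, 98],   # Carol
])

study_hours = student_data[:, 0]   # one feature across all students (a column)
bob = student_data[1, :]           # all features for one student (a row)
```

Slicing a column gives you one feature for every sample; slicing a row gives you every feature for one sample. That is exactly the dual perspective from the previous section.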
Types of Matrices
Several special types of matrices appear constantly in AI and linear algebra.
Square matrix -- same number of rows and columns (e.g., 3x3). These show up in transformations and attention mechanisms.
Identity matrix (I) -- a square matrix with 1s on the diagonal and 0s everywhere else. Multiplying any matrix by the identity matrix returns the original matrix, just like multiplying a number by 1.
I = | 1 0 0 |
| 0 1 0 |
| 0 0 1 |
Zero matrix -- every element is 0. Used to initialize values before training.
Diagonal matrix -- only the diagonal entries are non-zero. Used in scaling operations and eigenvalue decomposition.
D = | 3 0 0 |
| 0 7 0 |
| 0 0 2 |
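Each special matrix above has a one-line constructor in NumPy (shown here as an illustrative sketch), and the identity property can be verified directly:

```python
import numpy as np

I = np.eye(3)            # 3x3 identity: 1s on the diagonal, 0s elsewhere
Z = np.zeros((3, 3))     # 3x3 zero matrix
D = np.diag([3, 7, 2])   # diagonal matrix from the example above

A = np.array([[1., 2., 3.],
              [4., 5., 6.],
              [7., 8., 9.]])

# Multiplying by the identity returns the original matrix, like multiplying by 1
assert np.array_equal(A @ I, A)
```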
The Transpose Operation
The transpose of a matrix flips it along its diagonal, turning rows into columns and columns into rows. The transpose of A is written as A^T.
A = | 1 2 3 |
    | 4 5 6 |

A^T = | 1 4 |
      | 2 5 |
      | 3 6 |
A was 2x3, and A^T is 3x2. The transpose is used extensively in neural network calculations, especially during backpropagation.
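In NumPy (used here for illustration), the transpose is the `.T` attribute, and the row/column swap is easy to check:

```python
import numpy as np

A = np.array([
    [1, 2, 3],
    [4, 5, 6],
])

At = A.T
print(A.shape)   # (2, 3)
print(At.shape)  # (3, 2)

# Element (i, j) of A becomes element (j, i) of A^T
assert At[2, 1] == A[1, 2]
```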
AI Context: Weight Matrices
In a neural network, the connections between layers are stored as weight matrices. A weight matrix W with shape 4x3 means:
- 4 output neurons
- 3 input neurons
- 12 total learnable parameters (connections)
Every time a neural network learns, it adjusts the numbers in these weight matrices. A modern large language model may have billions of these individual weights organized across thousands of matrices.
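A minimal sketch of the shapes involved, assuming a 4x3 weight matrix W and a 3-feature input (the random values and variable names are illustrative; matrix-vector multiplication itself is covered later in the course):

```python
import numpy as np

rng = np.random.default_rng(0)

W = rng.normal(size=(4, 3))      # 4 output neurons x 3 input neurons
x = np.array([1.0, 0.5, -0.2])   # one input sample with 3 features

output = W @ x       # each output neuron combines all 3 inputs
print(W.size)        # 12 -- total learnable parameters (connections)
print(output.shape)  # (4,) -- one value per output neuron
```

Training adjusts the 12 numbers inside W; the shapes stay fixed.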
Why Matrices Are Everywhere in AI
- Data batches: Training data is organized as matrices for efficient processing
- Weight storage: Neural networks store all learned parameters as matrices
- Transformations: Moving data from one representation to another is a matrix operation
- GPU acceleration: GPUs are specifically designed to process matrix operations in parallel, which is why they are essential for AI
Summary
- A matrix is a 2D grid of numbers with a defined number of rows and columns
- A_ij refers to the element at row i, column j
- Matrices can be viewed as collections of row vectors or column vectors
- Special types include identity, zero, diagonal, and square matrices
- The transpose flips rows and columns
- In AI, matrices represent datasets, weight parameters, and transformations
Next, we will learn how to perform operations on matrices -- adding, scaling, and combining them -- which is how neural networks update their knowledge during training.