Matrix Multiplication Explained
Every time an AI model processes your input, it performs matrix multiplication -- thousands or even millions of times. This single operation is the computational backbone of modern artificial intelligence. In this lesson, you will learn exactly how it works and why it was designed this way.
Why Not Just Multiply Element by Element?
You might wonder: why not multiply two matrices the same way we add them, element by element? The answer is that element-wise multiplication does not capture the relationships between rows and columns. Matrix multiplication is specifically designed to compose transformations -- to chain operations together so that one transformation feeds into the next.
In AI, this means data flows through layers of a neural network, with each layer transforming the output of the previous one. Element-wise multiplication cannot do this. Matrix multiplication can.
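To see the difference concretely, here is a minimal NumPy sketch (using NumPy here is an assumption of this lesson's examples, not a requirement) contrasting the two operations:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])

# Element-wise: multiplies matching entries, no row-column interaction
print(A * B)   # [[ 5 12]
               #  [21 32]]

# Matrix multiplication: each entry is a dot product of a row and a column
print(A @ B)   # [[19 22]
               #  [43 50]]
```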
The Dimension Rule
Before you can multiply two matrices, their dimensions must be compatible:
Matrix A has shape (m x n) and Matrix B has shape (n x p). The result is shape (m x p).
The critical rule: the inner dimensions must match. The number of columns in A must equal the number of rows in B.
| A shape | B shape | Compatible? | Result shape |
|---|---|---|---|
| 2 x 3 | 3 x 2 | Yes (3 = 3) | 2 x 2 |
| 3 x 4 | 4 x 1 | Yes (4 = 4) | 3 x 1 |
| 2 x 3 | 2 x 3 | No (3 != 2) | Undefined |
| 1 x 5 | 5 x 5 | Yes (5 = 5) | 1 x 5 |
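As a quick sanity check of the rule, here is a small helper (hypothetical, written for this lesson) that encodes the table above:

```python
def matmul_shape(a_shape, b_shape):
    """Return the result shape (m, p) if the inner dimensions match, else None."""
    m, n = a_shape
    n2, p = b_shape
    return (m, p) if n == n2 else None

print(matmul_shape((2, 3), (3, 2)))  # (2, 2)
print(matmul_shape((2, 3), (2, 3)))  # None -- inner dimensions 3 and 2 differ
```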
Step-by-Step Example: (2x3) Times (3x2)
Let us multiply matrix A (2x3) by matrix B (3x2):
A = | 1 2 3 |    B = |  7  8 |
    | 4 5 6 |        |  9 10 |
                     | 11 12 |
Each element of the result is the dot product of a row from A and a column from B.
Result[0][0] = Row 0 of A dot Column 0 of B:
(1 x 7) + (2 x 9) + (3 x 11) = 7 + 18 + 33 = 58
Result[0][1] = Row 0 of A dot Column 1 of B:
(1 x 8) + (2 x 10) + (3 x 12) = 8 + 20 + 36 = 64
Result[1][0] = Row 1 of A dot Column 0 of B:
(4 x 7) + (5 x 9) + (6 x 11) = 28 + 45 + 66 = 139
Result[1][1] = Row 1 of A dot Column 1 of B:
(4 x 8) + (5 x 10) + (6 x 12) = 32 + 50 + 72 = 154
The final result is a 2x2 matrix:
A x B = |  58  64 |
        | 139 154 |
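You can verify the whole computation in one line; assuming NumPy, the `@` operator performs exactly these row-by-column dot products:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])
B = np.array([[ 7,  8],
              [ 9, 10],
              [11, 12]])

print(A @ B)  # [[ 58  64]
              #  [139 154]]
```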
Properties of Matrix Multiplication
Matrix multiplication has some important properties -- and one critical difference from regular number multiplication:
- Associative: (AB)C = A(BC) -- you can regroup, which lets AI frameworks optimize computation order
- Distributive: A(B + C) = AB + AC -- distributing works as expected
- NOT commutative: AB does not generally equal BA -- order matters
The non-commutativity is not a limitation; it reflects reality. Rotating an image and then scaling it produces a different result than scaling and then rotating. The order of transformations matters, and matrix multiplication captures this.
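A two-line experiment (again a NumPy sketch) makes the order-dependence visible:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 1],
              [1, 0]])  # a permutation matrix

print(A @ B)  # [[2 1]   -- multiplying by B on the right swaps A's columns
              #  [4 3]]
print(B @ A)  # [[3 4]   -- multiplying by B on the left swaps A's rows
              #  [1 2]]
```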
Matrix-Vector Multiplication: A Special Case
When one of the matrices is a column vector (shape n x 1), matrix multiplication becomes matrix-vector multiplication. This is the fundamental operation in a neural network layer:
W = | 0.5 0.3 |    x = | 2 |
    | 0.8 0.1 |        | 4 |

Wx = | (0.5)(2) + (0.3)(4) | = | 2.2 |
     | (0.8)(2) + (0.1)(4) |   | 2.0 |
Here, W represents learned weights and x represents input data. Each output value is a weighted combination of all input values. This is exactly what happens inside every dense layer of a neural network.
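A dense-layer forward pass is therefore one line of code; the sketch below reuses the numbers from the example above (the variable names are illustrative):

```python
import numpy as np

W = np.array([[0.5, 0.3],
              [0.8, 0.1]])   # learned weights
x = np.array([2.0, 4.0])     # input data

print(W @ x)  # [2.2 2.0] -- each output mixes every input
```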
Why This Operation Powers AI
Matrix multiplication is the engine of AI for a fundamental reason: it represents the composition of linear transformations. Each layer of a neural network applies a transformation (matrix multiplication) followed by a non-linear activation function. Stacking many of these layers allows the network to learn extraordinarily complex patterns.
When you hear that a model has "175 billion parameters," those parameters are organized into matrices. Processing a single input means multiplying through dozens or hundreds of these matrices in sequence.
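Here is a rough sketch of that stacking; the layer sizes and the ReLU activation are assumptions chosen for illustration, not taken from any particular model:

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.standard_normal((4, 3))   # layer 1: 3 inputs -> 4 hidden units
W2 = rng.standard_normal((2, 4))   # layer 2: 4 hidden units -> 2 outputs
x = rng.standard_normal(3)         # a single input vector

h = np.maximum(W1 @ x, 0.0)        # matrix-vector multiply + ReLU activation
y = W2 @ h                         # the next layer transforms the previous output
print(y.shape)                     # (2,)
```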
Summary
- Matrix multiplication is not element-wise -- each result element is a dot product of a row and a column
- The inner dimensions must match: (m x n) times (n x p) gives (m x p)
- The operation is associative and distributive but not commutative -- order matters
- Matrix-vector multiplication is the core operation in neural network layers
- Matrix multiplication represents the composition of transformations, which is why it is the fundamental operation in AI
Now that you understand how matrix multiplication works mechanically, let us see it in action inside one of the most important AI architectures ever created: the transformer.

