Matrix Operations
Now that you know what matrices are, it is time to learn how to work with them. Matrix operations are the fundamental actions that neural networks perform millions of times during training. Every time a model learns from data, it uses matrix subtraction to update its weights and scalar multiplication to control how large each update is via the learning rate.
Matrix Addition and Subtraction
To add two matrices, simply add the corresponding elements. Both matrices must have the same shape -- you cannot add a 2x3 matrix to a 3x2 matrix.
```
A = | 1  2  3 |      B = | 10  20  30 |
    | 4  5  6 |          | 40  50  60 |

A + B = | 1+10  2+20  3+30 | = | 11  22  33 |
        | 4+40  5+50  6+60 |   | 44  55  66 |
```
Subtraction works the same way, element by element:
```
A - B = | 1-10  2-20  3-30 | = |  -9  -18  -27 |
        | 4-40  5-50  6-60 |   | -36  -45  -54 |
```
Key rule: Both matrices must have identical dimensions. A 2x3 matrix can only be added to another 2x3 matrix.
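If you want to verify these results yourself, here is a minimal sketch in Python using NumPy (NumPy is not required by this lesson -- it is simply one common way to represent matrices in code):

```python
import numpy as np

# The matrices from the example above
A = np.array([[1, 2, 3],
              [4, 5, 6]])
B = np.array([[10, 20, 30],
              [40, 50, 60]])

print(A + B)   # [[11 22 33]
               #  [44 55 66]]
print(A - B)   # [[ -9 -18 -27]
               #  [-36 -45 -54]]

# Shapes must match: adding a 2x3 matrix to a 3x2 matrix raises an error
C = np.zeros((3, 2))
# A + C  # ValueError: operands could not be broadcast together
```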
Scalar Multiplication
A scalar is just a single number. Scalar multiplication means multiplying every element in a matrix by that number.
```
A = | 1  2  3 |
    | 4  5  6 |

3 * A = |  3   6   9 |
        | 12  15  18 |
```
This operation scales every value up or down uniformly. It is one of the simplest but most important operations in AI.
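As a quick sanity check, here is the same example in NumPy (again just one possible way to try this, not part of the lesson itself):

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])

# Every element is multiplied by the scalar
print(3 * A)     # [[ 3  6  9]
                 #  [12 15 18]]
print(0.1 * A)   # scaling down works the same way
```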
AI Context: Gradient Updates
During training, a neural network updates its weights using this formula:
W_new = W_old - learning_rate * gradients
This single line uses both scalar multiplication and matrix subtraction:
- Scalar multiplication: learning_rate * gradients -- the learning rate (a small number like 0.01) scales the gradient matrix, controlling how big each update step is
- Matrix subtraction: W_old - (scaled gradients) -- the scaled gradients are subtracted from the current weights
Here is a concrete example with a 2x2 weight matrix:
```
W_old = |  0.5  0.3 |    gradients = | 0.1  -0.2 |    learning_rate = 0.1
        | -0.1  0.8 |                | 0.3   0.0 |

Step 1: learning_rate * gradients = | 0.01  -0.02 |
                                    | 0.03   0.00 |

Step 2: W_new = |  0.5 - 0.01    0.3 - (-0.02) | = |  0.49  0.32 |
                | -0.1 - 0.03    0.8 - 0.00    |   | -0.13  0.80 |
```
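The same update takes only a few lines in NumPy; the variable names below simply mirror the formula W_new = W_old - learning_rate * gradients:

```python
import numpy as np

W_old = np.array([[ 0.5, 0.3],
                  [-0.1, 0.8]])
gradients = np.array([[0.1, -0.2],
                      [0.3,  0.0]])
learning_rate = 0.1

scaled = learning_rate * gradients   # scalar multiplication: [[0.01 -0.02], [0.03 0.00]]
W_new = W_old - scaled               # matrix subtraction:    [[0.49  0.32], [-0.13 0.80]]
print(W_new)
```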
This is, at its core, how neural networks learn: the numbers change from model to model, but the operation stays the same.
Element-wise Multiplication (Hadamard Product)
The Hadamard product multiplies corresponding elements of two matrices, element by element, just as addition does. It is usually written A ⊙ B, although in code you will often see it expressed as A * B.
```
A = | 1  2  3 |      B = | 2  0  1 |
    | 4  5  6 |          | 3  1  2 |

A ⊙ B = | 1*2  2*0  3*1 | = |  2  0   3 |
        | 4*3  5*1  6*2 |   | 12  5  12 |
```
Like addition, both matrices must have the same shape. The Hadamard product is used in gating mechanisms inside LSTMs and transformers, where the network decides element by element how much information to keep or discard.
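Here is a small NumPy sketch of the Hadamard product, plus a toy "gate" to illustrate the keep-or-discard idea (this is only an illustration, not an actual LSTM gate):

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])
B = np.array([[2, 0, 1],
              [3, 1, 2]])

# For NumPy arrays, * is already element-wise, so it computes the Hadamard product
print(A * B)   # [[ 2  0  3]
               #  [12  5 12]]

# Toy "gate": values between 0 and 1 decide how much of each element to keep
gate = np.array([[1.0, 0.5, 0.0],
                 [0.0, 1.0, 0.5]])
print(gate * A)   # [[1. 1. 0.]
                  #  [0. 5. 3.]]
```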
Matrix Transpose Revisited
We introduced the transpose in the previous lesson. Here we highlight the properties that matter most when combining it with other operations:
```
A = | 1  2  3 |      A^T = | 1  4 |
    | 4  5  6 |            | 2  5 |
                           | 3  6 |
```
Important properties of the transpose:
- (A^T)^T = A -- transposing twice returns the original matrix
- (A + B)^T = A^T + B^T -- the transpose distributes over addition
- (cA)^T = cA^T -- scalars pass through the transpose
The transpose is used heavily during backpropagation, where gradients flow backwards through the network by multiplying with transposed weight matrices.
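These properties are easy to confirm numerically; here is one way to check them in NumPy:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])
B = np.array([[10, 20, 30],
              [40, 50, 60]])
c = 3

print(A.T)                                    # [[1 4]
                                              #  [2 5]
                                              #  [3 6]]
print(np.array_equal(A.T.T, A))               # True: (A^T)^T = A
print(np.array_equal((A + B).T, A.T + B.T))   # True: (A + B)^T = A^T + B^T
print(np.array_equal((c * A).T, c * A.T))     # True: (cA)^T = cA^T
```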
Properties of Matrix Operations
Understanding these properties helps you reason about how AI computations work:
| Property | Rule | Meaning |
|---|---|---|
| Commutative addition | A + B = B + A | Order does not matter for addition |
| Associative addition | (A + B) + C = A + (B + C) | Grouping does not matter for addition |
| Distributive scalar | c(A + B) = cA + cB | Scalar multiplication distributes over addition |
| Additive identity | A + 0 = A | Adding the zero matrix changes nothing |
Important preview: Matrix multiplication (covered in Module 3) is not commutative. That means A x B does not generally equal B x A. This is a critical difference from regular number multiplication and has deep implications for how neural networks process data.
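If you are curious, you can already verify both claims numerically. The snippet below uses NumPy's @ operator for matrix multiplication purely as a preview; the operation itself is explained in Module 3:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 1],
              [1, 0]])
C = np.array([[5, 5],
              [5, 5]])

# Addition: order and grouping do not matter
print(np.array_equal(A + B, B + A))               # True
print(np.array_equal((A + B) + C, A + (B + C)))   # True

# Matrix multiplication is NOT commutative in general
print(np.array_equal(A @ B, B @ A))               # False: A @ B = [[2 1],[4 3]], B @ A = [[3 4],[1 2]]
```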
Putting It All Together
Here is a sequence of operations you might see in a single training step:
Given:
```
W  = | 0.5  0.3 |    (current weights)
G  = | 0.2 -0.1 |    (gradients)
lr = 0.01            (learning rate)
```
1. Scale gradients: lr * G = | 0.002 -0.001 |
2. Update weights: W - lr * G = | 0.498 0.301 |
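Expressed as code, the whole training step is two operations (a sketch, using the same names W, G, and lr as above):

```python
import numpy as np

W = np.array([[0.5, 0.3]])    # current weights
G = np.array([[0.2, -0.1]])   # gradients
lr = 0.01                     # learning rate

W = W - lr * G                # scale gradients, then subtract from the weights
print(W)                      # [[0.498 0.301]]
```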
These tiny adjustments, repeated billions of times across millions of matrix elements, are how AI models go from random guesses to useful predictions.
Summary
- Matrix addition/subtraction works element by element and requires matching dimensions
- Scalar multiplication multiplies every element by a single number
- The Hadamard product multiplies corresponding elements of two same-shaped matrices
- The transpose swaps rows and columns and is essential in backpropagation
- Matrix addition is commutative, but matrix multiplication is not (more in Module 3)
- Gradient updates combine scalar multiplication and matrix subtraction to train neural networks
In the next lesson, we will see how these operations come together inside a neural network layer, where matrices transform input data into predictions.

