Matrix Operations
Now that you know what matrices are, it is time to learn how to work with them. Matrix operations are the fundamental actions that neural networks perform millions of times during training. Every time a model learns from data, it uses matrix subtraction to update its weights and scalar multiplication to control how large each update is via the learning rate.
Matrix Addition and Subtraction
To add two matrices, simply add the corresponding elements. Both matrices must have the same shape -- you cannot add a 2x3 matrix to a 3x2 matrix.
```
A = | 1  2  3 |      B = | 10  20  30 |
    | 4  5  6 |          | 40  50  60 |

A + B = | 1+10  2+20  3+30 | = | 11  22  33 |
        | 4+40  5+50  6+60 |   | 44  55  66 |
```
Subtraction works the same way, element by element:
```
A - B = | 1-10  2-20  3-30 | = |  -9  -18  -27 |
        | 4-40  5-50  6-60 |   | -36  -45  -54 |
```
Key rule: Both matrices must have identical dimensions. A 2x3 matrix can only be added to another 2x3 matrix.
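If you want to verify these results yourself, here is a minimal sketch in Python using NumPy (NumPy is not required by this lesson -- it is simply one common way to represent matrices in code):

```python
import numpy as np

# The matrices from the example above
A = np.array([[1, 2, 3],
              [4, 5, 6]])
B = np.array([[10, 20, 30],
              [40, 50, 60]])

print(A + B)   # [[11 22 33]
               #  [44 55 66]]
print(A - B)   # [[ -9 -18 -27]
               #  [-36 -45 -54]]

# Shapes must match: adding a 2x3 matrix to a 3x2 matrix raises an error
C = np.zeros((3, 2))
# A + C  # ValueError: operands could not be broadcast together
```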
Scalar Multiplication
A scalar is just a single number. Scalar multiplication means multiplying every element in a matrix by that number.
```
A = | 1  2  3 |
    | 4  5  6 |

3 * A = |  3   6   9 |
        | 12  15  18 |
```
This operation scales every value up or down uniformly. It is one of the simplest but most important operations in AI.
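As a quick sanity check, here is the same example in NumPy (again just one possible way to try this, not part of the lesson itself):

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])

# Every element is multiplied by the scalar
print(3 * A)     # [[ 3  6  9]
                 #  [12 15 18]]
print(0.1 * A)   # scaling down works the same way
```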
AI Context: Gradient Updates
During training, a neural network updates its weights using this formula:
W_new = W_old - learning_rate * gradients
This single line uses both scalar multiplication and matrix subtraction:
- Scalar multiplication: learning_rate * gradients -- the learning rate (a small number like 0.01) scales the gradient matrix, controlling how big each update step is
- Matrix subtraction: W_old - (scaled gradients) -- the scaled gradients are subtracted from the current weights
Here is a concrete example with a 2x2 weight matrix:
```
W_old = |  0.5  0.3 |    gradients = | 0.1  -0.2 |    learning_rate = 0.1
        | -0.1  0.8 |                | 0.3   0.0 |

Step 1: learning_rate * gradients = | 0.01  -0.02 |
                                    | 0.03   0.00 |

Step 2: W_new = |  0.5 - 0.01    0.3 - (-0.02) | = |  0.49  0.32 |
                | -0.1 - 0.03    0.8 - 0.00    |   | -0.13  0.80 |
```
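The same update takes only a few lines in NumPy; the variable names below simply mirror the formula W_new = W_old - learning_rate * gradients:

```python
import numpy as np

W_old = np.array([[ 0.5, 0.3],
                  [-0.1, 0.8]])
gradients = np.array([[0.1, -0.2],
                      [0.3,  0.0]])
learning_rate = 0.1

scaled = learning_rate * gradients   # scalar multiplication: [[0.01 -0.02], [0.03 0.00]]
W_new = W_old - scaled               # matrix subtraction:    [[0.49  0.32], [-0.13 0.80]]
print(W_new)
```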
This is, at its core, how neural networks learn: the numbers change from model to model, but the operation stays the same.
Element-wise Multiplication (Hadamard Product)
The Hadamard product multiplies corresponding elements of two matrices, element by element, just as addition does. It is usually written A ⊙ B, although in code you will often see it expressed as A * B.
```
A = | 1  2  3 |      B = | 2  0  1 |
    | 4  5  6 |          | 3  1  2 |

A ⊙ B = | 1*2  2*0  3*1 | = |  2  0   3 |
        | 4*3  5*1  6*2 |   | 12  5  12 |
```
Like addition, both matrices must have the same shape. The Hadamard product is used in gating mechanisms inside LSTMs and transformers, where the network decides element by element how much information to keep or discard.
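Here is a small NumPy sketch of the Hadamard product, plus a toy "gate" to illustrate the keep-or-discard idea (this is only an illustration, not an actual LSTM gate):

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])
B = np.array([[2, 0, 1],
              [3, 1, 2]])

# For NumPy arrays, * is already element-wise, so it computes the Hadamard product
print(A * B)   # [[ 2  0  3]
               #  [12  5 12]]

# Toy "gate": values between 0 and 1 decide how much of each element to keep
gate = np.array([[1.0, 0.5, 0.0],
                 [0.0, 1.0, 0.5]])
print(gate * A)   # [[1. 1. 0.]
                  #  [0. 5. 3.]]
```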
Matrix Transpose Revisited
We introduced the transpose in the previous lesson. Here we highlight the properties that matter most when combining it with other operations:
```
A = | 1  2  3 |      A^T = | 1  4 |
    | 4  5  6 |            | 2  5 |
                           | 3  6 |
```
Important properties of the transpose:
- (A^T)^T = A -- transposing twice returns the original matrix
- (A + B)^T = A^T + B^T -- the transpose distributes over addition
- (cA)^T = cA^T -- scalars pass through the transpose
The transpose is used heavily during backpropagation, where gradients flow backwards through the network by multiplying with transposed weight matrices.
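These properties are easy to confirm numerically; here is one way to check them in NumPy:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])
B = np.array([[10, 20, 30],
              [40, 50, 60]])
c = 3

print(A.T)                                    # [[1 4]
                                              #  [2 5]
                                              #  [3 6]]
print(np.array_equal(A.T.T, A))               # True: (A^T)^T = A
print(np.array_equal((A + B).T, A.T + B.T))   # True: (A + B)^T = A^T + B^T
print(np.array_equal((c * A).T, c * A.T))     # True: (cA)^T = cA^T
```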
Properties of Matrix Operations
Understanding these properties helps you reason about how AI computations work:
| Property | Rule | Meaning |
|---|---|---|
| Commutative addition | A + B = B + A | Order does not matter for addition |
| Associative addition | (A + B) + C = A + (B + C) | Grouping does not matter for addition |
| Distributive scalar | c(A + B) = cA + cB | Scalar multiplication distributes over addition |
| Additive identity | A + 0 = A | Adding the zero matrix changes nothing |
Important preview: Matrix multiplication (covered in Module 3) is not commutative. That means A x B does not generally equal B x A. This is a critical difference from regular number multiplication and has deep implications for how neural networks process data.
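If you are curious, you can already verify both claims numerically. The snippet below uses NumPy's @ operator for matrix multiplication purely as a preview; the operation itself is explained in Module 3:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 1],
              [1, 0]])
C = np.array([[5, 5],
              [5, 5]])

# Addition: order and grouping do not matter
print(np.array_equal(A + B, B + A))               # True
print(np.array_equal((A + B) + C, A + (B + C)))   # True

# Matrix multiplication is NOT commutative in general
print(np.array_equal(A @ B, B @ A))               # False: A @ B = [[2 1],[4 3]], B @ A = [[3 4],[1 2]]
```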
Putting It All Together
Here is a sequence of operations you might see in a single training step:
Given:
```
W  = | 0.5  0.3 |    (current weights)
G  = | 0.2 -0.1 |    (gradients)
lr = 0.01            (learning rate)
```
1. Scale gradients: lr * G = | 0.002 -0.001 |
2. Update weights: W - lr * G = | 0.498 0.301 |
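Expressed as code, the whole training step is two operations (a sketch, using the same names W, G, and lr as above):

```python
import numpy as np

W = np.array([[0.5, 0.3]])    # current weights
G = np.array([[0.2, -0.1]])   # gradients
lr = 0.01                     # learning rate

W = W - lr * G                # scale gradients, then subtract from the weights
print(W)                      # [[0.498 0.301]]
```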
These tiny adjustments, repeated billions of times across millions of matrix elements, are how AI models go from random guesses to useful predictions.
Summary
- Matrix addition/subtraction works element by element and requires matching dimensions
- Scalar multiplication multiplies every element by a single number
- The Hadamard product multiplies corresponding elements of two same-shaped matrices
- The transpose swaps rows and columns and is essential in backpropagation
- Matrix addition is commutative, but matrix multiplication is not (more in Module 3)
- Gradient updates combine scalar multiplication and matrix subtraction to train neural networks
In the next lesson, we will see how these operations come together inside a neural network layer, where matrices transform input data into predictions.

