Vector Operations
Vectors only become useful when you can manipulate them. In this lesson, you will learn the fundamental operations that AI systems perform on vectors millions of times per second: addition, scaling, subtraction, and measuring magnitude. These operations are the building blocks of neural network training and feature engineering.
Vector Addition: Combining Information
Vector addition combines two vectors by adding their corresponding components.
a = [1, 3, 5]
b = [2, 4, 6]
a + b = [1+2, 3+4, 5+6] = [3, 7, 11]
Geometrically, this is "tip-to-tail" placement: walk along a, then along b. The result is where you finish. In AI, addition appears in residual connections (adding a layer's input to its output), bias terms, and feature fusion.
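Here is the same addition in NumPy; the arrays are just the example vectors above:

```python
import numpy as np

a = np.array([1, 3, 5])
b = np.array([2, 4, 6])

# NumPy adds vectors component-wise with the + operator
print(a + b)  # [ 3  7 11]
```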
Scalar Multiplication: Scaling Vectors
A scalar is a single number. Scalar multiplication multiplies every component by that number.
v = [2, 4, 6]
3 × v = [6, 12, 18]
0.5 × v = [1, 2, 3]
-1 × v = [-2, -4, -6]
Scaling by a value greater than 1 stretches the vector. A value between 0 and 1 shrinks it. A negative value reverses its direction. In AI, the learning rate scales the gradient (weights = weights - 0.01 * gradient), and attention scores scale value vectors by importance.
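In NumPy, each scaling is a one-line expression. The weight and gradient values in the update step below are made up for illustration:

```python
import numpy as np

v = np.array([2, 4, 6])
print(3 * v)    # [ 6 12 18]
print(0.5 * v)  # [1. 2. 3.]
print(-1 * v)   # [-2 -4 -6]

# A hypothetical gradient-descent step: the learning rate (0.01)
# scales the gradient vector before it adjusts the weights.
weights = np.array([0.5, -0.3, 0.8])
gradient = np.array([0.2, -0.1, 0.4])
weights = weights - 0.01 * gradient
```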
Vector Subtraction: Measuring Differences
Vector subtraction subtracts corresponding components, producing a vector that points from b to a.
a = [5, 8, 3]
b = [2, 3, 1]
a - b = [5-2, 8-3, 3-1] = [3, 5, 2]
In AI, subtraction is essential for error calculation (error = predicted - actual), gradient computation, and vector analogies like "king - man + woman ≈ queen" (explored in the next lesson).
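A quick sketch of both uses; the predicted and actual values here are hypothetical:

```python
import numpy as np

a = np.array([5, 8, 3])
b = np.array([2, 3, 1])
print(a - b)  # [3 5 2]

# Error vector for a hypothetical model: how far each
# prediction landed from its target, per component.
predicted = np.array([2.5, 0.0, 2.1])
actual = np.array([3.0, -0.5, 2.0])
print(predicted - actual)  # [-0.5  0.5  0.1]
```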
Vector Norms: Measuring Magnitude
The norm of a vector measures its length. Two norms are especially important.
L2 Norm (Euclidean) -- straight-line distance from the origin:
v = [3, 4]
||v||₂ = √(3² + 4²) = √25 = 5
L1 Norm (Manhattan) -- sum of absolute values:
v = [3, -4]
||v||₁ = |3| + |-4| = 7
| Norm | Formula | Value for [3, -4] | Intuition |
|---|---|---|---|
| L1 | sum of absolute values | 7 | Grid-walking distance |
| L2 | square root of sum of squares | 5 | Straight-line distance |
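Both norms are available through NumPy's `np.linalg.norm`, which computes the L2 norm by default and the L1 norm with `ord=1`:

```python
import numpy as np

v = np.array([3, -4])
print(np.linalg.norm(v))         # 5.0 (L2, straight-line)
print(np.linalg.norm(v, ord=1))  # 7.0 (L1, grid-walking)
```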
Unit Vectors and Normalization
A unit vector has a norm of exactly 1. Normalization divides a vector by its norm, preserving direction while setting magnitude to 1.
v = [3, 4], ||v||₂ = 5
unit = v / 5 = [0.6, 0.8]
Verify: √(0.6² + 0.8²) = √1.0 = 1.
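The same normalization in NumPy:

```python
import numpy as np

v = np.array([3.0, 4.0])
unit = v / np.linalg.norm(v)  # divide by the L2 norm
print(unit)                   # [0.6 0.8]
print(np.linalg.norm(unit))   # 1.0
```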
Why Normalization Matters in ML
Without normalization, features on different scales cause serious problems. Consider age (0-100) vs. salary (0-1,000,000). Salary dominates every calculation simply because its numbers are larger.
| Technique | Formula | Result Range |
|---|---|---|
| Min-Max | (x - min) / (max - min) | [0, 1] |
| L2 Normalization | x / ||x||₂ | Unit vector |
| Z-Score | (x - mean) / std_dev | Centered at 0 |
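All three techniques from the table are short NumPy expressions. The salary column below is a made-up example:

```python
import numpy as np

# Hypothetical salary feature on a large scale
x = np.array([30_000.0, 45_000.0, 60_000.0, 90_000.0, 150_000.0])

min_max = (x - x.min()) / (x.max() - x.min())  # squeezed into [0, 1]
l2 = x / np.linalg.norm(x)                     # unit-length vector
z = (x - x.mean()) / x.std()                   # centered at 0
```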
Normalization helps in three key ways:
- Gradient descent converges faster because the loss surface becomes more uniform
- Distance metrics work properly when vectors are on comparable scales
- Training stays stable because large values cannot cause exploding gradients
Modern transformers include Layer Normalization as a built-in operation, normalizing vectors at every layer.
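As a rough sketch of the core computation (omitting the learned gain and bias parameters that real layer-norm implementations add):

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Center the vector, then scale to unit variance; eps guards
    # against division by zero for near-constant inputs.
    return (x - x.mean()) / np.sqrt(x.var() + eps)

h = np.array([2.0, -1.0, 0.5, 3.5])  # hypothetical activations
print(layer_norm(h))
```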
Summary
- Vector addition combines vectors component-wise and powers residual connections and bias terms
- Scalar multiplication scales every component and controls learning rates and attention weights
- Vector subtraction measures differences for error computation and gradients
- The L2 norm measures straight-line length; the L1 norm measures grid distance
- Normalization converts a vector to unit length, preserving direction
- Normalizing features is critical for gradient descent, distance metrics, and training stability
In the next lesson, you will see vectors in action through one of AI's most powerful applications: word embeddings, where every word in a language becomes a vector that captures its meaning.

