Vector Operations
Vectors only become useful when you can manipulate them. In this lesson, you will learn the fundamental operations that AI systems perform on vectors millions of times per second: addition, scaling, subtraction, and measuring magnitude. These operations are the building blocks of neural network training and feature engineering.
Vector Addition: Combining Information
Vector addition combines two vectors by adding their corresponding components.
a = [1, 3, 5]
b = [2, 4, 6]
a + b = [1+2, 3+4, 5+6] = [3, 7, 11]
Geometrically, this is "tip-to-tail" placement: walk along a, then along b. The result is where you finish. In AI, addition appears in residual connections (adding a layer's input to its output), bias terms, and feature fusion.
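Here is the same addition in NumPy; the arrays are just the example vectors above:

```python
import numpy as np

a = np.array([1, 3, 5])
b = np.array([2, 4, 6])

# NumPy adds vectors component-wise with the + operator
print(a + b)  # [ 3  7 11]
```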
Scalar Multiplication: Scaling Vectors
A scalar is a single number. Scalar multiplication multiplies every component by that number.
v = [2, 4, 6]
3 × v = [6, 12, 18]
0.5 × v = [1, 2, 3]
-1 × v = [-2, -4, -6]
Scaling by a value greater than 1 stretches the vector. A value between 0 and 1 shrinks it. A negative value reverses its direction. In AI, the learning rate scales the gradient (weights = weights - 0.01 * gradient), and attention scores scale value vectors by importance.
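In NumPy, each scaling is a one-line expression. The weight and gradient values in the update step below are made up for illustration:

```python
import numpy as np

v = np.array([2, 4, 6])
print(3 * v)    # [ 6 12 18]
print(0.5 * v)  # [1. 2. 3.]
print(-1 * v)   # [-2 -4 -6]

# A hypothetical gradient-descent step: the learning rate (0.01)
# scales the gradient vector before it adjusts the weights.
weights = np.array([0.5, -0.3, 0.8])
gradient = np.array([0.2, -0.1, 0.4])
weights = weights - 0.01 * gradient
```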
Vector Subtraction: Measuring Differences
Vector subtraction subtracts corresponding components, producing a vector that points from b to a.
a = [5, 8, 3]
b = [2, 3, 1]
a - b = [5-2, 8-3, 3-1] = [3, 5, 2]
In AI, subtraction is essential for error calculation (error = predicted - actual), gradient computation, and vector analogies like "king - man + woman ≈ queen" (explored in the next lesson).
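A quick sketch of both uses; the predicted and actual values here are hypothetical:

```python
import numpy as np

a = np.array([5, 8, 3])
b = np.array([2, 3, 1])
print(a - b)  # [3 5 2]

# Error vector for a hypothetical model: how far each
# prediction landed from its target, per component.
predicted = np.array([2.5, 0.0, 2.1])
actual = np.array([3.0, -0.5, 2.0])
print(predicted - actual)  # [-0.5  0.5  0.1]
```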
Vector Norms: Measuring Magnitude
The norm of a vector measures its length. Two norms are especially important.
L2 Norm (Euclidean) -- straight-line distance from the origin:
v = [3, 4]
||v||₂ = √(3² + 4²) = √25 = 5
L1 Norm (Manhattan) -- sum of absolute values:
v = [3, -4]
||v||₁ = |3| + |-4| = 7
| Norm | Formula | Value for [3, -4] | Intuition |
|---|---|---|---|
| L1 | sum of absolute values | 7 | Grid-walking distance |
| L2 | square root of sum of squares | 5 | Straight-line distance |
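Both norms are available through NumPy's `np.linalg.norm`, which computes the L2 norm by default and the L1 norm with `ord=1`:

```python
import numpy as np

v = np.array([3, -4])
print(np.linalg.norm(v))         # 5.0 (L2, straight-line)
print(np.linalg.norm(v, ord=1))  # 7.0 (L1, grid-walking)
```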
Unit Vectors and Normalization
A unit vector has a norm of exactly 1. Normalization divides a vector by its norm, preserving direction while setting magnitude to 1.
v = [3, 4], ||v||₂ = 5
unit = v / 5 = [0.6, 0.8]
Verify: √(0.6² + 0.8²) = √1.0 = 1.
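The same normalization in NumPy:

```python
import numpy as np

v = np.array([3.0, 4.0])
unit = v / np.linalg.norm(v)  # divide by the L2 norm
print(unit)                   # [0.6 0.8]
print(np.linalg.norm(unit))   # 1.0
```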
Why Normalization Matters in ML
Without normalization, features on different scales cause serious problems. Consider age (0-100) vs. salary (0-1,000,000). Salary dominates every calculation simply because its numbers are larger.
| Technique | Formula | Result Range |
|---|---|---|
| Min-Max | (x - min) / (max - min) | [0, 1] |
| L2 Normalization | x / ||x||₂ | Unit vector |
| Z-Score | (x - mean) / std_dev | Centered at 0 |
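All three techniques from the table are short NumPy expressions. The salary column below is a made-up example:

```python
import numpy as np

# Hypothetical salary feature on a large scale
x = np.array([30_000.0, 45_000.0, 60_000.0, 90_000.0, 150_000.0])

min_max = (x - x.min()) / (x.max() - x.min())  # squeezed into [0, 1]
l2 = x / np.linalg.norm(x)                     # unit-length vector
z = (x - x.mean()) / x.std()                   # centered at 0
```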
Normalization helps in three key ways:
- Gradient descent converges faster because the loss surface becomes more uniform
- Distance metrics work properly when vectors are on comparable scales
- Training stays stable because large values cannot cause exploding gradients
Modern transformers include Layer Normalization as a built-in operation, normalizing vectors at every layer.
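As a rough sketch of the core computation (omitting the learned gain and bias parameters that real layer-norm implementations add):

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Center the vector, then scale to unit variance; eps guards
    # against division by zero for near-constant inputs.
    return (x - x.mean()) / np.sqrt(x.var() + eps)

h = np.array([2.0, -1.0, 0.5, 3.5])  # hypothetical activations
print(layer_norm(h))
```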
Summary
- Vector addition combines vectors component-wise and powers residual connections and bias terms
- Scalar multiplication scales every component and controls learning rates and attention weights
- Vector subtraction measures differences for error computation and gradients
- The L2 norm measures straight-line length; the L1 norm measures grid distance
- Normalization converts a vector to unit length, preserving direction
- Normalizing features is critical for gradient descent, distance metrics, and training stability
In the next lesson, you will see vectors in action through one of AI's most powerful applications: word embeddings, where every word in a language becomes a vector that captures its meaning.

