Cosine Similarity
In the previous lesson, we learned that the dot product measures alignment between vectors. But there is a problem: the dot product is affected by the magnitude of the vectors, not just their direction. Two very long vectors pointing in the same direction will have a huge dot product, while two short vectors pointing in the same direction will have a small one. In AI, we often care about direction only. Cosine similarity solves this problem, and it is the standard way AI systems compare embeddings, documents, and data points.
The Problem with Raw Dot Products
Consider two document vectors representing word frequencies:
Short article: a = [1, 2, 1]
Long article: b = [10, 20, 10]
These vectors point in exactly the same direction (b is just 10 times longer), meaning both documents use words in the same proportions. But their dot product is:
a . b = (1*10) + (2*20) + (1*10) = 60
Compare that to two short documents with different content:
a = [1, 2, 1]
c = [2, 1, 2]
a . c = (1*2) + (2*1) + (1*2) = 6
The dot product with b is 10 times larger, but only because b is longer, not because it is more similar in meaning. We need a metric that ignores magnitude.
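To see the problem in runnable form, here is a minimal Python sketch (standard library only, using the vectors above) that computes both dot products:

```python
# Dot product: sum of element-wise products.
def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

a = [1, 2, 1]     # short article
b = [10, 20, 10]  # long article, same direction as a
c = [2, 1, 2]     # short article, different content

print(dot(a, b))  # 60 -- big, but only because b is 10 times longer
print(dot(a, c))  # 6  -- small, even though the real difference here is content
```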
Cosine Similarity Defined
Cosine similarity normalizes the dot product by dividing by the magnitudes of both vectors:
cosine_similarity(a, b) = (a . b) / (|a| * |b|)
Where |a| = sqrt(a1^2 + a2^2 + ... + an^2) is the magnitude (length) of vector a.
This is equivalent to computing the cosine of the angle between the two vectors. The result tells you purely about direction, completely removing the effect of magnitude.
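Translated directly into code, the formula is only a few lines. This is a minimal sketch using just the standard library:

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between vectors u and v."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    return dot / (norm_u * norm_v)

# The short and long articles from above now score as pointing the same way.
print(cosine_similarity([1, 2, 1], [10, 20, 10]))  # ~1.0
```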
The Range of Cosine Similarity
Because we are computing a cosine, the result always falls in a predictable range:
| Value | Meaning | Angle Between Vectors |
|---|---|---|
| 1 | Identical direction | 0 degrees |
| 0 | Completely unrelated | 90 degrees |
| -1 | Opposite direction | 180 degrees |
For vectors with only non-negative values (such as word counts or TF-IDF weights), cosine similarity falls between 0 and 1. Learned embeddings can contain negative components, so negative similarities are possible for them as well.
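A quick check (assuming NumPy is installed) confirms the three boundary cases:

```python
import numpy as np

def cos_sim(u, v):
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

print(cos_sim([1, 0], [3, 0]))   #  1.0 -- same direction (0 degrees)
print(cos_sim([1, 0], [0, 2]))   #  0.0 -- perpendicular (90 degrees)
print(cos_sim([1, 0], [-5, 0]))  # -1.0 -- opposite direction (180 degrees)
```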
Worked Example: Comparing Document Vectors
Suppose we represent three documents by how often they mention key topics: [math, science, sports].
Doc A (math textbook): [8, 6, 0]
Doc B (science journal): [3, 9, 1]
Doc C (sports article): [0, 1, 7]
Similarity between Doc A and Doc B:
a . b = (8*3) + (6*9) + (0*1) = 24 + 54 + 0 = 78
|a| = sqrt(64 + 36 + 0) = sqrt(100) = 10
|b| = sqrt(9 + 81 + 1) = sqrt(91) = 9.54
cosine_similarity = 78 / (10 * 9.54) = 78 / 95.4 = 0.818
Similarity between Doc A and Doc C:
a . c = (8*0) + (6*1) + (0*7) = 0 + 6 + 0 = 6
|a| = 10
|c| = sqrt(0 + 1 + 49) = sqrt(50) = 7.07
cosine_similarity = 6 / (10 * 7.07) = 6 / 70.7 = 0.085
Doc A is very similar to Doc B (0.818) but nearly unrelated to Doc C (0.085). This matches our intuition: a math textbook is related to a science journal but not to a sports article.
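The same numbers fall out of a short NumPy check (assuming NumPy is installed):

```python
import numpy as np

doc_a = np.array([8, 6, 0])  # math textbook: [math, science, sports]
doc_b = np.array([3, 9, 1])  # science journal
doc_c = np.array([0, 1, 7])  # sports article

def cos_sim(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

print(round(cos_sim(doc_a, doc_b), 3))  # 0.818
print(round(cos_sim(doc_a, doc_c), 3))  # 0.085
```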
Why Normalization Matters
Dividing by the magnitudes is a form of normalization. It puts all vectors on equal footing regardless of their scale. This is critical in AI because:
- Document vectors vary in length based on document size, not content
- Embedding vectors can have different magnitudes depending on the model
- Feature vectors may have wildly different scales across dimensions
By normalizing, cosine similarity answers the question: "Do these vectors point in the same direction?" rather than "Are these vectors the same size?"
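One practical consequence: if every vector is scaled to unit length up front, cosine similarity reduces to a plain dot product, and many vector search systems rely on this by pre-normalizing embeddings. A small sketch (assuming NumPy):

```python
import numpy as np

def unit(v):
    """Scale v to length 1 so magnitude no longer matters."""
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)

a = [1, 2, 1]
b = [10, 20, 10]

# For unit vectors, the dot product IS the cosine similarity.
print(unit(a) @ unit(b))  # ~1.0, no matter how long b was
```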
Cosine Distance
While cosine similarity measures how alike two vectors are, cosine distance measures how different they are:
cosine_distance = 1 - cosine_similarity
| Cosine Similarity | Cosine Distance | Meaning |
|---|---|---|
| 1.0 | 0.0 | Identical |
| 0.8 | 0.2 | Very similar |
| 0.5 | 0.5 | Somewhat related |
| 0.0 | 1.0 | Unrelated |
Many vector databases and search systems use cosine distance because they are optimized to find the nearest (smallest distance) neighbors.
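If SciPy is available, it ships this quantity directly as scipy.spatial.distance.cosine, which returns 1 minus the cosine similarity:

```python
from scipy.spatial.distance import cosine  # cosine *distance*, not similarity

doc_a = [8, 6, 0]
doc_b = [3, 9, 1]

distance = cosine(doc_a, doc_b)
print(distance)      # ~0.182 -- cosine distance
print(1 - distance)  # ~0.818 -- back to cosine similarity
```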
Cosine Similarity vs Euclidean Distance
Two common ways to compare vectors are cosine similarity and Euclidean distance. They answer different questions:
| Property | Cosine Similarity | Euclidean Distance |
|---|---|---|
| Measures | Direction (angle) | Absolute position (straight-line gap) |
| Affected by magnitude | No | Yes |
| Range | -1 to 1 | 0 to infinity |
| Best when | Comparing meaning or content | Comparing absolute quantities |
| Used for | Embeddings, NLP, recommendations | Clustering, image pixels, physical data |
Rule of thumb: Use cosine similarity when you care about what kind of thing something is. Use Euclidean distance when you care about how much of something there is.
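The contrast shows up clearly in code: scaling a vector leaves its cosine similarity untouched but moves it far away in Euclidean terms (a sketch assuming NumPy):

```python
import numpy as np

a = np.array([1.0, 2.0, 1.0])
b = a * 10  # same direction, 10x the magnitude

cos_sim = (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
euclidean = np.linalg.norm(a - b)

print(cos_sim)    # ~1.0  -- direction is identical
print(euclidean)  # ~22.0 -- straight-line distance says they are far apart
```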
AI Context: The Standard for Comparing Embeddings
Cosine similarity is the default metric for comparing embeddings across nearly all of modern AI:
- Word embeddings: Word2Vec and GloVe use cosine similarity to measure word relatedness
- Sentence embeddings: Models such as Sentence-BERT produce vectors whose cosine similarity tracks how close two sentences are in meaning
- Search engines: Queries and documents are embedded as vectors, then ranked by cosine similarity
- Recommendation systems: User preferences and item features are compared via cosine similarity
- LLM retrieval: In retrieval-augmented generation, the passages handed to an LLM as context are chosen by cosine similarity between their embeddings and the query's embedding
When someone says "these two texts are 92% similar," they almost always mean the cosine similarity of their embedding vectors is 0.92.
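In practice, libraries compute these scores in bulk. For example, scikit-learn's cosine_similarity scores a query against every document in one call; the 4-dimensional vectors below are made-up stand-ins for real embeddings, which typically have hundreds of dimensions:

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical embeddings -- real models would produce these from text.
query = np.array([[0.9, 0.1, 0.4, 0.2]])
documents = np.array([
    [0.8, 0.2, 0.5, 0.1],  # doc 0
    [0.1, 0.9, 0.2, 0.7],  # doc 1
    [0.7, 0.0, 0.6, 0.3],  # doc 2
])

scores = cosine_similarity(query, documents)[0]
ranking = np.argsort(scores)[::-1]  # most similar first

print(scores)   # one similarity score per document
print(ranking)  # [0 2 1] -- doc 0 is the best match, doc 1 the worst
```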
Summary
- The raw dot product is affected by vector magnitude, making it unreliable for comparing direction alone
- Cosine similarity = (a . b) / (|a| * |b|) removes the effect of magnitude
- The result ranges from -1 (opposite) to 0 (unrelated) to 1 (identical direction)
- Cosine distance = 1 - cosine similarity, used when you need a distance metric
- Choose cosine similarity for comparing meaning and Euclidean distance for comparing absolute quantities
- Cosine similarity is the standard metric for comparing embeddings in NLP, search, and recommendations
With cosine similarity in our toolkit, we can now measure how similar any two pieces of data are once they have been converted to vectors. In the next lesson, we will see how this simple operation powers real AI applications like semantic search, recommendation systems, and retrieval-augmented generation.

