Similarity in AI Applications
We now have two powerful tools: the dot product and cosine similarity. These are not just mathematical abstractions. They are the engine behind some of the most impactful applications in modern AI. Every time you use a search engine, get a recommendation, or ask an LLM a question backed by your own documents, vector similarity is doing the heavy lifting behind the scenes.
The Similarity Pipeline
Almost every AI application that uses similarity follows the same four-step pipeline:
1. Embed -> Convert raw data (text, images, audio) into vectors
2. Store -> Save those vectors in a database
3. Query -> Convert the user's query into a vector
4. Rank -> Find stored vectors most similar to the query vector
This pattern is so common that entire companies and open-source projects exist to optimize each step. Let us now look at the specific applications built on this pipeline.
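As a concrete (if toy) illustration, here is a minimal Python sketch of the pipeline. The hard-coded three-dimensional vectors stand in for real embeddings, and a plain dictionary stands in for a database; both are placeholders, not a production setup.

```python
import numpy as np

# Toy "embeddings": in a real system these come from an embedding model.
documents = {
    "fix a flat tire":        np.array([0.9, 0.1, 0.0]),
    "bake sourdough bread":   np.array([0.0, 0.8, 0.3]),
    "replace a bicycle tube": np.array([0.8, 0.2, 0.1]),
}

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Dot product of the two vectors divided by the product of their lengths."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# 1. Embed the query (here: another hard-coded toy vector)
query = np.array([0.85, 0.15, 0.05])

# 2-4. The dict above is the "store"; compare the query against it and rank by score
ranked = sorted(
    ((cosine_similarity(query, vec), name) for name, vec in documents.items()),
    reverse=True,
)
for score, name in ranked:
    print(f"{score:.3f}  {name}")
```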
Semantic Search
Traditional keyword search looks for exact word matches. Semantic search finds results based on meaning, even when no words overlap.
How it works:
- Each document is converted into an embedding vector using a model such as one of OpenAI's text-embedding models or Sentence-BERT
- These vectors are stored in a database
- When a user searches for "how to fix a flat tire," the query is also embedded
- Cosine similarity is computed between the query vector and every document vector
- The most similar documents are returned as results
The key insight is that "how to fix a flat tire" and "changing a punctured wheel" have very different words but similar embedding vectors. Cosine similarity captures this semantic relationship where keyword matching would fail entirely.
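To try this with real embeddings, the sketch below uses the sentence-transformers library; it assumes the package is installed and that the all-MiniLM-L6-v2 model can be downloaded on first use. The ranking step is exactly the cosine similarity described above.

```python
from sentence_transformers import SentenceTransformer, util

# Assumes `pip install sentence-transformers`; the model downloads on first use.
model = SentenceTransformer("all-MiniLM-L6-v2")

documents = [
    "Changing a punctured wheel on your car",
    "A beginner's guide to baking sourdough bread",
    "Troubleshooting a slow laptop",
]
query = "how to fix a flat tire"

# Embed the documents and the query into vectors
doc_vectors = model.encode(documents, convert_to_tensor=True)
query_vector = model.encode(query, convert_to_tensor=True)

# Cosine similarity between the query and every document
scores = util.cos_sim(query_vector, doc_vectors)[0]

# Rank documents by similarity, highest first
for score, doc in sorted(zip(scores.tolist(), documents), reverse=True):
    print(f"{score:.3f}  {doc}")
```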
Retrieval-Augmented Generation (RAG)
RAG is one of the most important patterns in modern AI. It lets large language models answer questions using specific, up-to-date documents rather than relying solely on their training data.
The RAG process:
User question -> Embed question -> Find similar document chunks -> Feed relevant chunks + question to LLM -> LLM generates answer
- A knowledge base (company docs, research papers, manuals) is split into chunks and embedded
- When a user asks a question, the question is embedded into a vector
- Cosine similarity finds the most relevant chunks from the knowledge base
- Those chunks are provided to the LLM as context alongside the question
- The LLM generates an answer grounded in the retrieved information
Without vector similarity, the LLM would have no way to quickly search through thousands of documents to find the relevant passages. The entire RAG pipeline depends on the cosine similarity operation we learned in the previous lesson.
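Here is a minimal sketch of the retrieval half of RAG, assuming the chunks have already been embedded; the embedding call and the LLM call are provider-specific, so they appear only as commented placeholders.

```python
import numpy as np

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve(question_vector, chunk_vectors, chunks, top_k=3):
    """Return the top_k chunks most similar to the question vector."""
    scored = sorted(
        zip((cosine_similarity(question_vector, v) for v in chunk_vectors), chunks),
        reverse=True,
    )
    return [chunk for _, chunk in scored[:top_k]]

def build_prompt(question, relevant_chunks):
    """Assemble the context + question prompt that is sent to the LLM."""
    context = "\n\n".join(relevant_chunks)
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"

# Toy demo with hand-made chunk embeddings (placeholders for real ones)
chunks = [
    "Reset procedure: hold the power button for ten seconds.",
    "Warranty terms and conditions.",
    "Battery replacement steps.",
]
chunk_vectors = [np.array([0.9, 0.1]), np.array([0.1, 0.9]), np.array([0.7, 0.3])]
question_vector = np.array([0.8, 0.2])   # stands in for embed(question)

prompt = build_prompt("How do I reset the device?",
                      retrieve(question_vector, chunk_vectors, chunks, top_k=2))
print(prompt)
# answer = llm.generate(prompt)           # placeholder LLM call, provider-specific
```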
Recommendation Systems
When Netflix suggests a show or Spotify recommends a song, vector similarity is often at work.
Content-based recommendations represent items as vectors of features:
Movie A (action, sci-fi): [0.9, 0.1, 0.8, 0.0, 0.2]
Movie B (action, thriller): [0.8, 0.0, 0.3, 0.7, 0.1]
Movie C (romance, comedy): [0.1, 0.9, 0.0, 0.0, 0.8]
If a user liked Movie A, the system computes cosine similarity between A and all other movies. Movie B (similar action profile) scores higher than Movie C (different genre entirely).
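A few lines of NumPy reproduce this comparison with the vectors above; the approximate scores in the final comment follow directly from those numbers.

```python
import numpy as np

movies = {
    "Movie A (action, sci-fi)":   np.array([0.9, 0.1, 0.8, 0.0, 0.2]),
    "Movie B (action, thriller)": np.array([0.8, 0.0, 0.3, 0.7, 0.1]),
    "Movie C (romance, comedy)":  np.array([0.1, 0.9, 0.0, 0.0, 0.8]),
}

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

liked = movies["Movie A (action, sci-fi)"]
for name, vec in movies.items():
    if name.startswith("Movie A"):
        continue
    print(f"{name}: {cosine_similarity(liked, vec):.2f}")
# Movie B scores roughly 0.72, Movie C roughly 0.23, so B is recommended first.
```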
Collaborative filtering represents users as vectors of preferences. Users with similar preference vectors are recommended each other's highly rated items. The phrase "users who liked X also liked Y" is, at its core, a vector similarity computation.
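A minimal collaborative-filtering sketch with a made-up rating matrix; the data and the "rated 4 or above" cutoff are illustrative assumptions.

```python
import numpy as np

# Rows are users, columns are items; 0 means "not rated" in this toy example.
ratings = np.array([
    [5, 4, 0, 1],   # user 0
    [4, 5, 5, 0],   # user 1
    [1, 0, 5, 4],   # user 2
])

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

target = 0  # recommend for user 0
similarities = [
    (cosine_similarity(ratings[target], ratings[u]), u)
    for u in range(len(ratings)) if u != target
]
most_similar_user = max(similarities)[1]

# Recommend items the similar user rated highly (4 or above) that the target has not rated
recommendations = np.where((ratings[target] == 0) & (ratings[most_similar_user] >= 4))[0]
print(f"Most similar user: {most_similar_user}, recommended item indices: {recommendations}")
```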
Image Similarity
Images can also be converted to vectors using models like CLIP or ResNet. Once embedded, the same similarity techniques apply.
Applications include:
- Reverse image search: Upload a photo, find visually similar images
- Duplicate detection: Find near-identical images in large databases
- Visual recommendations: "Show me products that look like this"
- Content moderation: Detect images similar to known harmful content
An image embedding might have hundreds or thousands of dimensions, but the comparison still comes down to computing cosine similarity between two vectors.
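For instance, the sketch below embeds two images with the openai/clip-vit-base-patch32 checkpoint via the Hugging Face transformers library; the library, the checkpoint, and the local image paths are all assumptions about your environment.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Assumes `pip install transformers torch pillow` and two local image files.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

images = [Image.open("photo_a.jpg"), Image.open("photo_b.jpg")]  # placeholder paths
inputs = processor(images=images, return_tensors="pt")

with torch.no_grad():
    embeddings = model.get_image_features(**inputs)   # one vector per image

# Normalize each vector, then the dot product is the cosine similarity
embeddings = embeddings / embeddings.norm(dim=-1, keepdim=True)
similarity = float(embeddings[0] @ embeddings[1])
print(f"Cosine similarity between the two images: {similarity:.3f}")
```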
Clustering
Clustering groups similar vectors together without predefined labels. The k-means algorithm, one of the most widely used clustering methods, works directly with distances between vectors.
How k-means works:
1. Choose k starting center points (centroids)
2. Assign each data vector to its nearest centroid (standard k-means uses Euclidean distance; the cosine-based variant is known as spherical k-means)
3. Recalculate each centroid as the average of its assigned vectors
4. Repeat steps 2-3 until assignments stabilize
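These steps translate almost line for line into NumPy. The sketch below uses Euclidean distance and random starting centroids drawn from the data; in practice you would usually reach for a library implementation such as scikit-learn's KMeans.

```python
import numpy as np

def kmeans(X, k, iterations=100, seed=0):
    """Minimal k-means: X is an (n_points, n_dims) array, k the number of clusters."""
    rng = np.random.default_rng(seed)
    # 1. Choose k starting centroids (here: k random data points)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iterations):
        # 2. Assign each vector to its nearest centroid (Euclidean distance)
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # 3. Recalculate each centroid as the mean of its assigned vectors
        #    (keep the old centroid if a cluster ends up empty)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # 4. Stop when the centroids (and hence the assignments) no longer change
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Two obvious blobs in 2-D; the algorithm should separate them cleanly.
X = np.array([[0.1, 0.2], [0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]])
labels, centroids = kmeans(X, k=2)
print(labels, centroids, sep="\n")
```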
Applications of clustering in AI:
- Topic modeling: Grouping documents by subject matter
- Customer segmentation: Identifying distinct customer types
- Image organization: Automatically sorting photos into categories
- Data exploration: Discovering natural groupings in unfamiliar data
Anomaly Detection
If you know what "normal" looks like as a cluster of vectors, anything far from that cluster is potentially anomalous.
- Normal transactions: clustered tightly, high mutual similarity
- Fraudulent transaction: a vector far from the normal cluster, low similarity
Applications include:
- Fraud detection: Transactions with unusual feature vectors
- Network security: Network traffic patterns that differ from baselines
- Manufacturing: Sensor readings that deviate from normal operating conditions
- Health monitoring: Patient metrics that diverge from healthy patterns
Anomaly detection is essentially similarity in reverse: instead of finding the most similar items, you find the least similar ones.
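A minimal sketch of that idea: treat the centroid of known-normal vectors as "normal" and flag anything whose cosine similarity to it falls below a threshold. The transaction vectors and the 0.9 cutoff are made up for illustration and would be tuned on real data.

```python
import numpy as np

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Made-up feature vectors for known-normal transactions
normal = np.array([
    [0.90, 0.10, 0.20],
    [0.80, 0.20, 0.10],
    [0.85, 0.15, 0.15],
])
centroid = normal.mean(axis=0)

THRESHOLD = 0.9  # arbitrary cutoff for this toy example

new_transactions = {
    "typical purchase": np.array([0.88, 0.12, 0.18]),
    "unusual transfer": np.array([0.05, 0.10, 0.95]),
}
for name, vec in new_transactions.items():
    score = cosine_similarity(vec, centroid)
    flag = "ANOMALY" if score < THRESHOLD else "ok"
    print(f"{name}: similarity {score:.2f} -> {flag}")
```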
Vector Databases
When you have millions or billions of vectors to search through, you need specialized infrastructure. Vector databases are purpose-built to store, index, and search high-dimensional vectors efficiently.
| Vector Database | Key Feature |
|---|---|
| Pinecone | Fully managed, serverless scaling |
| Weaviate | Open-source, hybrid search |
| Milvus | Open-source, high performance |
| Qdrant | Open-source, Rust-based speed |
| Chroma | Lightweight, developer-friendly |
| pgvector | PostgreSQL extension, familiar SQL interface |
These databases use approximate nearest neighbor (ANN) algorithms that trade a small amount of accuracy for massive speed improvements. Instead of comparing a query against every single stored vector, ANN algorithms use clever indexing structures to narrow down candidates quickly.
Without vector databases, computing cosine similarity against millions of vectors for every query would be far too slow for real-time applications.
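As a small hands-on sketch, here is the pipeline against Chroma's in-process Python client (chosen from the table above because it needs no server). This assumes chromadb is installed and that the API matches recent versions; the hand-made three-dimensional embeddings keep the example self-contained, whereas a real system would store model-generated vectors.

```python
import chromadb

# In-memory client; persistent and server-backed clients also exist.
client = chromadb.Client()
collection = client.create_collection(
    name="docs",
    metadata={"hnsw:space": "cosine"},  # index by cosine similarity
)

# Store a few documents with hand-made 3-dimensional "embeddings"
collection.add(
    ids=["doc1", "doc2", "doc3"],
    documents=["fix a flat tire", "bake sourdough bread", "replace a bicycle tube"],
    embeddings=[[0.9, 0.1, 0.0], [0.0, 0.8, 0.3], [0.8, 0.2, 0.1]],
)

# Query with an embedded query vector; the database handles indexing and ranking
results = collection.query(query_embeddings=[[0.85, 0.15, 0.05]], n_results=2)
print(results["documents"])
```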
Putting It All Together
Every application in this lesson follows the same fundamental pattern built on the linear algebra we have learned:
| Step | Linear Algebra Concept | What Happens |
|---|---|---|
| Embed | Vectors (Module 1) | Raw data becomes a point in high-dimensional space |
| Store | Vector spaces | Millions of vectors are indexed for fast retrieval |
| Query | Dot product and cosine similarity (this module) | A query vector is compared against stored vectors |
| Rank | Sorting by similarity score | The most similar results are returned to the user |
The math is simple: multiply, sum, normalize, compare. But at the scale of modern AI, this simple math powers search engines, chatbots, recommendation systems, fraud detection, and much more.
Summary
- Semantic search uses embedding vectors and cosine similarity to find results by meaning, not keywords
- RAG retrieves relevant document chunks via vector similarity to give LLMs accurate, grounded context
- Recommendation systems compare user and item vectors to suggest "users who liked X also liked Y"
- Image similarity embeds images as vectors and compares them the same way as text
- Clustering groups similar vectors together using algorithms like k-means, which rely on distances between vectors
- Anomaly detection identifies vectors that are far from normal clusters
- Vector databases store and search millions of vectors efficiently using approximate nearest neighbor algorithms
- The core pipeline is always: embed, store, query, rank
With Module 4 complete, you now understand how AI measures similarity between any two pieces of data. In Module 5, we will explore eigenvalues and eigenvectors, which reveal the hidden structure within data and power techniques like dimensionality reduction and principal component analysis.

