Word Embeddings: Vectors in Action
Word embeddings are one of the most elegant applications of vectors in AI. They transform words from meaningless symbols into rich numerical representations where similar words sit close together in space. This lesson shows how the vector concepts you have learned come alive in real AI systems.
One-Hot Encoding: The Naive Approach
The simplest method assigns each word a vector with a single 1 and all other values 0.
"cat" = [1, 0, 0, 0, 0]
"dog" = [0, 1, 0, 0, 0]
"truck" = [0, 0, 0, 0, 1]
| Problem | Explanation |
|---|---|
| No meaning | "cat" and "dog" are just as different as "cat" and "truck" |
| No similarity | Every pair of distinct words is exactly the same distance apart |
| Huge dimensions | 50,000 words require 50,000-dimensional vectors |
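To see the problem concretely, here is a minimal NumPy sketch (the five-word vocabulary is just an illustration): every pair of distinct one-hot vectors is exactly the same distance apart, so the encoding can never say that "cat" is more like "dog" than "truck".

```python
import numpy as np

# Toy vocabulary; in practice this would be tens of thousands of words
vocab = ["cat", "dog", "mat", "sat", "truck"]

def one_hot(word):
    """Return a vector with a 1 at the word's index and 0 everywhere else."""
    vec = np.zeros(len(vocab))
    vec[vocab.index(word)] = 1.0
    return vec

# Every pair of distinct words is exactly the same distance apart (sqrt(2)),
# so one-hot vectors carry no notion of similarity.
print(np.linalg.norm(one_hot("cat") - one_hot("dog")))    # 1.414...
print(np.linalg.norm(one_hot("cat") - one_hot("truck")))  # 1.414...
```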
Dense Embeddings: Meaning as Numbers
A word embedding maps each word to a dense vector of typically 100-1,000 dimensions, where each dimension captures some aspect of meaning.
"cat" = [0.23, -0.41, 0.82, 0.15, ...] (300 dims)
"dog" = [0.25, -0.38, 0.79, 0.18, ...] (similar to cat!)
"truck" = [-0.71, 0.56, -0.12, 0.44, ...] (very different)
Dense embeddings are compact (300 numbers, not 50,000), meaningful (similar words have similar vectors), and learnable (trained from data, not hand-coded).
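A short sketch of how similarity falls out of dense vectors. The 4-dimensional numbers below are made up for illustration (real embeddings use hundreds of dimensions), but the cosine-similarity comparison is exactly the kind of computation real systems perform.

```python
import numpy as np

# Illustrative 4-dimensional "embeddings"; the exact numbers are invented.
cat   = np.array([0.23, -0.41, 0.82, 0.15])
dog   = np.array([0.25, -0.38, 0.79, 0.18])
truck = np.array([-0.71, 0.56, -0.12, 0.44])

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: near 1 = similar direction."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine_similarity(cat, dog))    # close to 1.0 -> similar meanings
print(cosine_similarity(cat, truck))  # much lower   -> unrelated meanings
```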
Word2Vec: Learning from Context
Word2Vec (2013) showed that vectors could capture word meaning by exploiting a simple insight: words appearing in similar contexts have similar meanings.
- "The cat sat on the mat"
- "The dog sat on the mat"
- "The kitten sat on the mat"
Word2Vec trains a shallow neural network to predict surrounding words from a target word (or vice versa). After seeing billions of word-context pairs, the rows of the network's weight matrix become the word embeddings, and semantically similar words naturally cluster together.
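A minimal training sketch using the gensim library (assuming it is installed). The three-sentence corpus is far too small to learn meaningful vectors, but it shows the workflow of turning tokenized sentences into word vectors.

```python
from gensim.models import Word2Vec

# A toy corpus of pre-tokenized sentences; real training uses billions of words.
sentences = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "mat"],
    ["the", "kitten", "sat", "on", "the", "mat"],
]

# Train a skip-gram model: predict surrounding words from each target word.
model = Word2Vec(sentences, vector_size=100, window=2, min_count=1, sg=1)

# Each word now maps to a learned 100-dimensional vector.
print(model.wv["cat"].shape)              # (100,)
print(model.wv.similarity("cat", "dog"))  # similarity score (not meaningful on a toy corpus)
```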
The Famous "King - Man + Woman = Queen"
The most celebrated result of word embeddings is that vector arithmetic captures semantic relationships.
vector("king") - vector("man") + vector("woman") ≈ vector("queen")
The subtraction "king - man" isolates the concept of royalty. Adding "woman" to that royalty direction lands near "queen."
| Relationship | Equation | Result |
|---|---|---|
| Gender | king - man + woman | queen |
| Country-Capital | Paris - France + Italy | Rome |
| Tense | walked - walk + swim | swam |
| Comparative | bigger - big + small | smaller |
These analogies emerge automatically from training data. Nobody programmed them.
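One way to try this yourself is with gensim's pretrained Google News vectors. The snippet below is a sketch assuming the gensim downloader is available; the exact neighbors and similarity scores depend on the pretrained model.

```python
import gensim.downloader as api

# Download pretrained 300-dimensional Word2Vec vectors (a large download on first run).
vectors = api.load("word2vec-google-news-300")

# king - man + woman: added words go in `positive`, subtracted words in `negative`.
result = vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1)
print(result)  # usually [('queen', ...)] with a similarity around 0.7

# The same pattern works for other relationships, e.g. country -> capital.
print(vectors.most_similar(positive=["Paris", "Italy"], negative=["France"], topn=1))
```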
Embedding Dimensions
The number of dimensions determines how much information an embedding can encode.
| Model | Dimensions | Context |
|---|---|---|
| Word2Vec | 100-300 | Static word embeddings |
| GloVe | 50-300 | Static word embeddings |
| BERT (base) | 768 | Contextual embeddings |
| GPT-3 | 12,288 | Large language model |
| GPT-4 | Not publicly disclosed | Multimodal language model |
How Transformers Use Embeddings
Modern AI systems process text through a multi-step embedding pipeline:
- Tokenization: Split text into tokens -- "Understanding vectors" becomes ["Under", "standing", "vectors"]
- Token embedding: Each token maps to a learned vector (768+ dimensions)
- Positional encoding: A position vector is added so the model knows word order
- Transformer processing: Combined vectors flow through attention layers, accumulating context at each step
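A sketch of the same pipeline using the Hugging Face transformers library, with bert-base-uncased as an assumed example model; the actual token split depends on the model's vocabulary and may differ from the example above.

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Load a pretrained tokenizer and model (bert-base-uncased used here as an example).
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

# Step 1: tokenization -- the exact split depends on the model's vocabulary.
inputs = tokenizer("Understanding vectors", return_tensors="pt")
print(tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist()))

# Steps 2-4: token embeddings plus positional encodings flow through the attention layers.
with torch.no_grad():
    outputs = model(**inputs)

# One 768-dimensional contextual vector per token.
print(outputs.last_hidden_state.shape)  # (1, num_tokens, 768)
```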
Sentence and Document Embeddings
Entire sentences and documents can be compressed into a single vector:
"The cat sat on the mat" → [0.21, -0.45, 0.33, ...]
"A kitten was lying on the rug" → [0.19, -0.42, 0.31, ...] (similar!)
"Stock prices rose sharply" → [-0.67, 0.28, 0.81, ...] (very different)
Models like Sentence-BERT process full sentences through a transformer and output a fixed-size vector, enabling comparison of entire texts.
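A minimal sketch with the sentence-transformers library, using all-MiniLM-L6-v2 as an example model (it outputs 384-dimensional vectors); any sentence-embedding model follows the same pattern.

```python
from sentence_transformers import SentenceTransformer, util

# Load a pretrained sentence-embedding model.
model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = [
    "The cat sat on the mat",
    "A kitten was lying on the rug",
    "Stock prices rose sharply",
]

# Each sentence becomes a single fixed-size vector.
embeddings = model.encode(sentences)

# Compare entire sentences with cosine similarity.
print(util.cos_sim(embeddings[0], embeddings[1]))  # high -> similar meaning
print(util.cos_sim(embeddings[0], embeddings[2]))  # low  -> different topic
```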
Why This Matters: Real Applications
| Application | How Embeddings Help |
|---|---|
| Semantic search | Find documents by meaning, not keywords |
| RAG | Retrieve relevant context for LLMs by comparing query and document vectors |
| Recommendations | Find similar products by comparing item embeddings |
| Clustering | Group similar documents by clustering their vectors |
| Duplicate detection | Identify paraphrased content by vector similarity |
In a RAG system, your question becomes a vector, gets compared against a database of document vectors, and the closest matches are fed to the language model as context.
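Stripped to its core, that retrieval step is a nearest-neighbor search over vectors. The NumPy sketch below uses random stand-in embeddings; a real system would produce query_vec and doc_vecs with an embedding model and pass the top-ranked documents to the LLM as context.

```python
import numpy as np

def top_k_documents(query_vec, doc_vecs, k=3):
    """Return indices of the k document vectors closest to the query (cosine similarity)."""
    query_vec = query_vec / np.linalg.norm(query_vec)
    doc_vecs = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = doc_vecs @ query_vec            # cosine similarity against every document
    return np.argsort(scores)[::-1][:k]      # highest-scoring documents first

# Random stand-in vectors: 1,000 documents with 384-dimensional embeddings.
rng = np.random.default_rng(0)
doc_vecs = rng.normal(size=(1000, 384))
query_vec = rng.normal(size=384)
print(top_k_documents(query_vec, doc_vecs))  # indices of the closest documents
```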
Summary
- One-hot encoding creates sparse, meaningless vectors; dense embeddings capture meaning in compact form
- Word2Vec learns embeddings by predicting words from context in large text corpora
- Vector arithmetic captures semantic relationships: "king - man + woman = queen"
- Embedding dimensions range from 100 (Word2Vec) to 12,288 (GPT-3)
- Transformers combine token embeddings with positional embeddings before processing through attention layers
- Sentence and document embeddings compress entire texts into single vectors for comparison
- Embeddings enable semantic search, RAG, recommendations, and many other AI applications
With vectors and their operations now in your toolkit, the next module explores matrices -- the structures that transform vectors and form the core of every neural network layer.