Word Embeddings: Vectors in Action
Word embeddings are one of the most elegant applications of vectors in AI. They transform words from meaningless symbols into rich numerical representations where similar words sit close together in space. This lesson shows how the vector concepts you have learned come alive in real AI systems.
One-Hot Encoding: The Naive Approach
The simplest method assigns each word a vector with a single 1 and all other values 0.
"cat" = [1, 0, 0, 0, 0]
"dog" = [0, 1, 0, 0, 0]
"truck" = [0, 0, 0, 0, 1]
| Problem | Explanation |
|---|---|
| No meaning | "cat" and "dog" are just as different as "cat" and "truck" |
| No similarity | Every pair of distinct words is exactly the same distance apart |
| Huge dimensions | 50,000 words require 50,000-dimensional vectors |
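To see the problem concretely, here is a minimal NumPy sketch (the five-word vocabulary is just an illustration): every pair of distinct one-hot vectors is exactly the same distance apart, so the encoding can never say that "cat" is more like "dog" than "truck".

```python
import numpy as np

# Toy vocabulary; in practice this would be tens of thousands of words
vocab = ["cat", "dog", "mat", "sat", "truck"]

def one_hot(word):
    """Return a vector with a 1 at the word's index and 0 everywhere else."""
    vec = np.zeros(len(vocab))
    vec[vocab.index(word)] = 1.0
    return vec

# Every pair of distinct words is exactly the same distance apart (sqrt(2)),
# so one-hot vectors carry no notion of similarity.
print(np.linalg.norm(one_hot("cat") - one_hot("dog")))    # 1.414...
print(np.linalg.norm(one_hot("cat") - one_hot("truck")))  # 1.414...
```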
Dense Embeddings: Meaning as Numbers
A word embedding maps each word to a dense vector of typically 100-1,000 dimensions, where each dimension captures some aspect of meaning.
"cat" = [0.23, -0.41, 0.82, 0.15, ...] (300 dims)
"dog" = [0.25, -0.38, 0.79, 0.18, ...] (similar to cat!)
"truck" = [-0.71, 0.56, -0.12, 0.44, ...] (very different)
Dense embeddings are compact (300 numbers, not 50,000), meaningful (similar words have similar vectors), and learnable (trained from data, not hand-coded).
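A short sketch of how similarity falls out of dense vectors. The 4-dimensional numbers below are made up for illustration (real embeddings use hundreds of dimensions), but the cosine-similarity comparison is exactly the kind of computation real systems perform.

```python
import numpy as np

# Illustrative 4-dimensional "embeddings"; the exact numbers are invented.
cat   = np.array([0.23, -0.41, 0.82, 0.15])
dog   = np.array([0.25, -0.38, 0.79, 0.18])
truck = np.array([-0.71, 0.56, -0.12, 0.44])

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: near 1 = similar direction."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine_similarity(cat, dog))    # close to 1.0 -> similar meanings
print(cosine_similarity(cat, truck))  # much lower   -> unrelated meanings
```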
Word2Vec: Learning from Context
Word2Vec (2013) showed that vectors could capture word meaning by exploiting a simple insight: words appearing in similar contexts have similar meanings.
- "The cat sat on the mat"
- "The dog sat on the mat"
- "The kitten sat on the mat"
Word2Vec trains a shallow neural network to predict surrounding words from a target word (or vice versa). After seeing billions of word-context pairs, the rows of the network's weight matrix become the word embeddings, and semantically similar words naturally cluster together.
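A minimal training sketch using the gensim library (assuming it is installed). The three-sentence corpus is far too small to learn meaningful vectors, but it shows the workflow of turning tokenized sentences into word vectors.

```python
from gensim.models import Word2Vec

# A toy corpus of pre-tokenized sentences; real training uses billions of words.
sentences = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "mat"],
    ["the", "kitten", "sat", "on", "the", "mat"],
]

# Train a skip-gram model: predict surrounding words from each target word.
model = Word2Vec(sentences, vector_size=100, window=2, min_count=1, sg=1)

# Each word now maps to a learned 100-dimensional vector.
print(model.wv["cat"].shape)              # (100,)
print(model.wv.similarity("cat", "dog"))  # similarity score (not meaningful on a toy corpus)
```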
The Famous "King - Man + Woman = Queen"
The most celebrated result of word embeddings is that vector arithmetic captures semantic relationships.
vector("king") - vector("man") + vector("woman") ≈ vector("queen")
The subtraction "king - man" isolates the concept of royalty. Adding "woman" to that royalty direction lands near "queen."
| Relationship | Equation | Result |
|---|---|---|
| Gender | king - man + woman | queen |
| Country-Capital | Paris - France + Italy | Rome |
| Tense | walked - walk + swim | swam |
| Comparative | bigger - big + small | smaller |
These analogies emerge automatically from training data. Nobody programmed them.
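One way to try this yourself is with gensim's pretrained Google News vectors. The snippet below is a sketch assuming the gensim downloader is available; the exact neighbors and similarity scores depend on the pretrained model.

```python
import gensim.downloader as api

# Download pretrained 300-dimensional Word2Vec vectors (a large download on first run).
vectors = api.load("word2vec-google-news-300")

# king - man + woman: added words go in `positive`, subtracted words in `negative`.
result = vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1)
print(result)  # usually [('queen', ...)] with a similarity around 0.7

# The same pattern works for other relationships, e.g. country -> capital.
print(vectors.most_similar(positive=["Paris", "Italy"], negative=["France"], topn=1))
```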
Embedding Dimensions
The number of dimensions determines how much information an embedding can encode.
| Model | Dimensions | Context |
|---|---|---|
| Word2Vec | 100-300 | Static word embeddings |
| GloVe | 50-300 | Static word embeddings |
| BERT (base) | 768 | Contextual embeddings |
| GPT-3 | 12,288 | Large language model |
| GPT-4 | Not publicly disclosed | Multimodal language model |
How Transformers Use Embeddings
Modern AI systems process text through a multi-step embedding pipeline:
- Tokenization: Split text into tokens -- "Understanding vectors" becomes ["Under", "standing", "vectors"]
- Token embedding: Each token maps to a learned vector (768+ dimensions)
- Positional encoding: A position vector is added so the model knows word order
- Transformer processing: Combined vectors flow through attention layers, accumulating context at each step
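A sketch of the same pipeline using the Hugging Face transformers library, with bert-base-uncased as an assumed example model; the actual token split depends on the model's vocabulary and may differ from the example above.

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Load a pretrained tokenizer and model (bert-base-uncased used here as an example).
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

# Step 1: tokenization -- the exact split depends on the model's vocabulary.
inputs = tokenizer("Understanding vectors", return_tensors="pt")
print(tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist()))

# Steps 2-4: token embeddings plus positional encodings flow through the attention layers.
with torch.no_grad():
    outputs = model(**inputs)

# One 768-dimensional contextual vector per token.
print(outputs.last_hidden_state.shape)  # (1, num_tokens, 768)
```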
Sentence and Document Embeddings
Entire sentences and documents can be compressed into a single vector:
"The cat sat on the mat" → [0.21, -0.45, 0.33, ...]
"A kitten was lying on the rug" → [0.19, -0.42, 0.31, ...] (similar!)
"Stock prices rose sharply" → [-0.67, 0.28, 0.81, ...] (very different)
Models like Sentence-BERT process full sentences through a transformer and output a fixed-size vector, enabling comparison of entire texts.
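A minimal sketch with the sentence-transformers library, using all-MiniLM-L6-v2 as an example model (it outputs 384-dimensional vectors); any sentence-embedding model follows the same pattern.

```python
from sentence_transformers import SentenceTransformer, util

# Load a pretrained sentence-embedding model.
model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = [
    "The cat sat on the mat",
    "A kitten was lying on the rug",
    "Stock prices rose sharply",
]

# Each sentence becomes a single fixed-size vector.
embeddings = model.encode(sentences)

# Compare entire sentences with cosine similarity.
print(util.cos_sim(embeddings[0], embeddings[1]))  # high -> similar meaning
print(util.cos_sim(embeddings[0], embeddings[2]))  # low  -> different topic
```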
Why This Matters: Real Applications
| Application | How Embeddings Help |
|---|---|
| Semantic search | Find documents by meaning, not keywords |
| RAG | Retrieve relevant context for LLMs by comparing query and document vectors |
| Recommendations | Find similar products by comparing item embeddings |
| Clustering | Group similar documents by clustering their vectors |
| Duplicate detection | Identify paraphrased content by vector similarity |
In a RAG system, your question becomes a vector, gets compared against a database of document vectors, and the closest matches are fed to the language model as context.
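Stripped to its core, that retrieval step is a nearest-neighbor search over vectors. The NumPy sketch below uses random stand-in embeddings; a real system would produce query_vec and doc_vecs with an embedding model and pass the top-ranked documents to the LLM as context.

```python
import numpy as np

def top_k_documents(query_vec, doc_vecs, k=3):
    """Return indices of the k document vectors closest to the query (cosine similarity)."""
    query_vec = query_vec / np.linalg.norm(query_vec)
    doc_vecs = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = doc_vecs @ query_vec            # cosine similarity against every document
    return np.argsort(scores)[::-1][:k]      # highest-scoring documents first

# Random stand-in vectors: 1,000 documents with 384-dimensional embeddings.
rng = np.random.default_rng(0)
doc_vecs = rng.normal(size=(1000, 384))
query_vec = rng.normal(size=384)
print(top_k_documents(query_vec, doc_vecs))  # indices of the closest documents
```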
Summary
- One-hot encoding creates sparse, meaningless vectors; dense embeddings capture meaning in compact form
- Word2Vec learns embeddings by predicting words from context in large text corpora
- Vector arithmetic captures semantic relationships: "king - man + woman = queen"
- Embedding dimensions range from 100 (Word2Vec) to 12,288 (GPT-3)
- Transformers combine token embeddings with positional embeddings before processing through attention layers
- Sentence and document embeddings compress entire texts into single vectors for comparison
- Embeddings enable semantic search, RAG, recommendations, and many other AI applications
With vectors and their operations now in your toolkit, the next module explores matrices -- the structures that transform vectors and form the core of every neural network layer.