Embeddings Explained Simply

In the last lesson you downloaded an "embedding model" without really knowing what it does. This lesson fixes that. Embeddings are the single idea that makes retrieval possible, and once it clicks, the rest of the course feels obvious. We will keep it intuitive and skip the heavy math.

What You'll Learn

What an embedding is, in plain language
Why embeddings let a computer find text by meaning, not just keywords
How "closeness" between embeddings powers search
A tiny illustrative example you can run in your browser

The Core Idea: Turning Meaning Into Numbers

Computers cannot compare ideas directly, but they are very good at comparing numbers. An embedding is a way of turning a piece of text into a list of numbers that captures its meaning. Text with similar meaning gets similar numbers.

That is the whole trick. The embedding model (our nomic-embed-text) reads a sentence and outputs a long list of numbers, often hundreds of them. You never read those numbers yourself; they are the computer's internal "fingerprint" for that text's meaning.

Here is the part that matters: sentences that mean similar things end up with number-fingerprints that are close together, even if they share no words at all.

"How much does it cost?" and "What is the price?" use different words but mean nearly the same thing, so their embeddings land close together.
"How much does it cost?" and "The weather is nice today" mean very different things, so their embeddings land far apart.
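To make "close together" concrete, here is a minimal sketch that compares three hand-written number lists using cosine similarity, one common way to measure closeness. The three-number vectors are made up for illustration (a real model such as nomic-embed-text outputs far longer lists), and the lesson does not say which closeness measure is used later, so treat the choice of cosine similarity as an assumption.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine similarity of two equal-length lists of numbers."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up "fingerprints" for illustration only, not real model output.
cost = [0.9, 0.1, 0.0]     # "How much does it cost?"
price = [0.8, 0.2, 0.1]    # "What is the price?"
weather = [0.0, 0.1, 0.9]  # "The weather is nice today"

sim_price = cosine_similarity(cost, price)
sim_weather = cosine_similarity(cost, weather)

# The two pricing questions score much higher than the unrelated pair.
print(sim_price > sim_weather)  # prints True
```

A score near 1 means the two lists point in almost the same direction; a score near 0 means they have little in common.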

Why This Beats Keyword Search

Old-fashioned search matches exact words. If your notes say "automobile" but you search for "car," a plain keyword search finds nothing, because the two texts share no word. An embedding model places "car" and "automobile" close together because they mean the same thing, so a search by meaning finds the right note anyway.

This is exactly what we want for a study helper. When you ask "What causes prices to rise?" your notes might phrase it as "the drivers of inflation." No shared keywords, but the meanings match, so embedding-based search finds the right passage.
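The keyword-search failure described above can be sketched in a few lines. This is a deliberately naive exact-word matcher written for illustration; the function name and the sample notes are hypothetical, and real keyword engines are more sophisticated.

```python
def keyword_search(query, notes):
    """Return the notes that contain every query word as an exact word."""
    query_words = query.lower().split()
    return [
        note for note in notes
        if all(word in note.lower().split() for word in query_words)
    ]

# Hypothetical study notes.
notes = [
    "My automobile needs an oil change",
    "The drivers of inflation include rising demand",
]

car_hits = keyword_search("car", notes)
automobile_hits = keyword_search("automobile", notes)

print(car_hits)         # prints []
print(automobile_hits)  # prints ['My automobile needs an oil change']
```

The search for "car" comes back empty even though the first note is clearly about one. That gap is what a search by meaning closes.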

Embeddings find the right passage even when the words are different.
Criteria	Keyword search	Embedding (meaning) search
Matches on	Exact words	Meaning
Finds synonyms?	No	Yes
"car" finds "automobile"?	No	Yes
Good for RAG?	Limited	Ideal

Closeness = Relevance

Picture every sentence as a dot in space. Sentences with similar meaning sit near each other; unrelated ones sit far apart. (In reality this "space" has hundreds of dimensions, but the intuition is the same as dots on a map.)

When you ask a question, your RAG system embeds your question into the same space and then looks for the document chunks whose dots are closest to it. Those nearest chunks are the most relevant, so they become the context handed to the model. This nearest-match step is called a similarity search, and Chroma does it for you automatically in a later lesson.

Your question becomes a dot
Compare distances to every chunk
Closest chunks = most relevant
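Those three steps can be sketched with dots on a flat map. The two-number coordinates below are invented stand-ins for real embeddings, the chunk texts are hypothetical, and straight-line distance is assumed as the closeness measure. Chroma's actual search is not shown here.

```python
import math

# Made-up 2D "dots" standing in for real embeddings of document chunks.
chunks = {
    "the drivers of inflation": (0.9, 0.8),
    "how photosynthesis works": (0.1, 0.9),
    "the causes of World War I": (0.2, 0.1),
}

def closest_chunks(question_dot, chunk_dots, k=2):
    """Return the k chunk texts whose dots are nearest to the question's dot."""
    ranked = sorted(
        chunk_dots,
        key=lambda text: math.dist(question_dot, chunk_dots[text]),
    )
    return ranked[:k]

# Pretend embedding of "What causes prices to rise?"
question = (0.85, 0.75)

top = closest_chunks(question, chunks, k=1)
print(top)  # prints ['the drivers of inflation']
```

Raising k returns more chunks, ordered from nearest to farthest, which is how a RAG system gathers several passages of context instead of just one.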

A Tiny Illustration You Can Run

We cannot run a real embedding model in the browser (it needs Ollama on your machine), but we can illustrate the idea of "similar things get similar scores." The snippet below uses a crude, made-up scoring rule, not a real embedding, just to show how a number can represent overlap in meaning so the closest match wins. Run it and read the comments.

[Interactive Python playground]
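One way such a toy could look, assuming the crude rule is simply a count of shared words. The function name, the question, and the sample passages are all hypothetical, and this is not a real embedding.

```python
def overlap_score(question, passage):
    """Count the distinct words shared by question and passage (a toy score)."""
    q_words = set(question.lower().split())
    p_words = set(passage.lower().split())
    return len(q_words & p_words)

# Hypothetical passages from a set of study notes.
passages = [
    "prices rise when demand grows",
    "plants make food from sunlight",
]
question = "why do prices rise"

# Turn each passage into a comparable number, then pick the highest.
scores = {p: overlap_score(question, p) for p in passages}
best = max(scores, key=scores.get)

print(scores[best], best)  # prints 2 prices rise when demand grows
```

The first passage wins because it shares two words with the question, while the second shares none.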

A real embedding model is far smarter than this toy: it would also match "prices rise" with "inflation" even with zero shared words. But the principle you just saw, turn text into comparable numbers, then pick the closest, is exactly what powers retrieval in the rest of this course.

You Already Have What You Need

The good news: you will never write embedding math yourself. In a couple of lessons, Chroma will call nomic-embed-text through Ollama, store the resulting numbers, and run the closeness comparison for you. Your job is simply to understand why it works, which you now do.

Want the deep, optimized version of vector storage and indexing? Our Vector Databases course covers similarity search, indexing strategies, and tuning in depth. This lesson is the beginner-friendly intuition you need to keep building.

Key Takeaways

An embedding turns a piece of text into a list of numbers that captures its meaning.
Text with similar meaning gets similar numbers, so embeddings find matches by meaning, not exact words.
Search works by embedding your question and finding the document chunks whose numbers are closest; this is a similarity search.
This beats keyword search because it handles synonyms and paraphrases automatically.
You never compute embeddings by hand; the embedding model and Chroma handle it for you.
