Run a Local AI Model with Ollama

Before we can ask questions about your documents, we need an AI model running on your own machine. The easiest, most beginner-friendly way to do that is a free tool called Ollama. In this lesson you will install it, download a small model, and have your first conversation with an AI that runs entirely offline.

What You'll Learn

What Ollama is and why it makes local AI simple
How to install Ollama on your computer
How to download (pull) a model and chat with it
How to download an embedding model you will need later
How to check that Ollama is running in the background

What Is Ollama?

Ollama is a free, open-source program that downloads AI models and runs them on your computer. Think of it as a simple manager for local models: you tell it which model you want, it downloads the file, and then you can chat with that model from your terminal or from your own code. There is no account to create and nothing is sent to the cloud.

Once Ollama is running, it quietly listens in the background at the address http://localhost:11434. The name localhost always points back to your own machine, so by default that address cannot be reached from the internet. Later in the course, our Python code will send questions to that local address instead of to any external service. That is the heart of what makes this "local."
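To make the idea of a "local address" concrete, here is a minimal Python sketch that checks whether a URL points at your own machine. The helper name is_loopback is ours, invented for illustration; it is not part of Ollama. It only inspects the URL text and sends nothing over the network.

```python
import ipaddress
from urllib.parse import urlparse

OLLAMA_URL = "http://localhost:11434"  # the address given in this lesson


def is_loopback(url: str) -> bool:
    """Return True if the URL's host refers to this machine's loopback interface."""
    host = urlparse(url).hostname
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        # Covers IP literals such as 127.0.0.1 or ::1.
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal (for example a public domain name).
        return False


if __name__ == "__main__":
    print(is_loopback(OLLAMA_URL))
```

A check like this can act as a guard in your own scripts: if someone edits the address to point at a remote server, the function returns False and you can stop before any document text is sent.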

Step 1: Install Ollama

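Go to ollama.com and download the installer for your operating system (Windows, macOS, or Linux). On Windows and macOS, run it like any normal application. On Linux, the download page gives you a one-line install command to paste into your terminal instead. Either way, the installer sets everything up for you.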

To confirm it worked, open your terminal (Terminal on macOS/Linux, or PowerShell/Command Prompt on Windows) and type:

ollama --version

If you see a version number printed back, Ollama is installed and ready.
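If you would rather run the same check from Python (useful later, when our scripts depend on Ollama), a small sketch could look like this. The function name ollama_version is hypothetical; it simply runs the command shown above and returns None when the program is missing or fails.

```python
import shutil
import subprocess
from typing import Optional


def ollama_version() -> Optional[str]:
    """Return the output of `ollama --version`, or None if it cannot be run."""
    if shutil.which("ollama") is None:
        return None  # the executable is not on PATH
    try:
        result = subprocess.run(
            ["ollama", "--version"],
            capture_output=True,
            text=True,
            timeout=10,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    if result.returncode != 0:
        return None
    return result.stdout.strip()


if __name__ == "__main__":
    version = ollama_version()
    print(version if version else "Ollama was not found; check the installation.")
```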

Step 2: Pull and Run Your First Model

A "model" is the actual AI brain. We will use Llama 3.2, a small, capable model from Meta that runs comfortably on a normal laptop. The default version is about 3 billion parameters and downloads as roughly a 2 GB file, so the first download may take a few minutes.

Run this single command:

ollama run llama3.2

The first time, Ollama downloads the model. After that, it opens an interactive chat prompt right in your terminal. Try typing a question:

>>> Explain photosynthesis in two sentences.

The model replies, generated entirely on your computer. To leave the chat, type /bye and press Enter.

That is a complete, private, offline AI assistant already. The rest of this course teaches it to answer using your documents.
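The same question can also be sent from Python instead of the terminal. The sketch below is one possible way to do it with only the standard library, assuming Ollama's /api/generate endpoint at the local address from earlier in this lesson. The helper names build_generate_request and ask are ours, and the code the course uses later may differ.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # local address from this lesson


def build_generate_request(prompt: str, model: str = "llama3.2") -> urllib.request.Request:
    """Build (but do not send) a POST request for Ollama's /api/generate endpoint."""
    # "stream": False asks for one complete JSON reply instead of a token stream.
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str, model: str = "llama3.2") -> str:
    """Send the prompt to the local model and return its reply text."""
    request = build_generate_request(prompt, model)
    with urllib.request.urlopen(request, timeout=120) as response:
        body = json.load(response)
    return body["response"]


# Example (needs the Ollama service running and llama3.2 downloaded):
# print(ask("Explain photosynthesis in two sentences."))
```

If you pulled the smaller variant from the tip below, passing model="llama3.2:1b" to ask should select it.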

Tip: If your laptop is older or has limited memory, you can pull an even smaller version with ollama pull llama3.2:1b. It is faster and lighter, with slightly simpler answers. The 3B default is a good balance for most machines.

Step 3: Pull the Embedding Model

Generating answers is only half of RAG. To find the right pieces of your documents, we need a second, special kind of model called an embedding model. We cover what embeddings are in the next lesson; for now, just download the one we will use:

ollama pull nomic-embed-text

This is a small download (a few hundred megabytes, much smaller than llama3.2). nomic-embed-text is built specifically to turn text into the numbers that make search possible. We will not chat with it; our code will call it behind the scenes.

After this command, you have both models you need for the whole course:

llama3.2 to generate answers
nomic-embed-text to embed your documents so they can be searched
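To give a feel for what "our code will call it behind the scenes" means, here is a hedged sketch of such a call. It assumes Ollama's /api/embed endpoint, which recent versions provide (older versions expose /api/embeddings with slightly different field names). The helper names build_embed_request and embed are hypothetical.

```python
import json
import urllib.request
from typing import List

OLLAMA_URL = "http://localhost:11434"  # local address from this lesson


def build_embed_request(text: str, model: str = "nomic-embed-text") -> urllib.request.Request:
    """Build (but do not send) a POST request for Ollama's /api/embed endpoint."""
    payload = {"model": model, "input": text}
    return urllib.request.Request(
        f"{OLLAMA_URL}/api/embed",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(text: str, model: str = "nomic-embed-text") -> List[float]:
    """Return the list of numbers (the embedding) the model produces for the text."""
    request = build_embed_request(text, model)
    with urllib.request.urlopen(request, timeout=60) as response:
        body = json.load(response)
    # The reply holds one vector per input; we sent a single input.
    return body["embeddings"][0]


# Example (needs the Ollama service running and nomic-embed-text downloaded):
# vector = embed("Invoices are due within 30 days.")
# print(len(vector))
```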

How the Pieces Fit Together

Here is the role each Ollama model plays in the system you are building.

Two models, two jobs: one finds the right context, the other writes the answer.

| Criteria          | llama3.2                | nomic-embed-text                     |
| ----------------- | ----------------------- | ------------------------------------ |
| Job               | Writes the final answer | Turns text into searchable numbers   |
| Used when         | You ask a question      | Storing docs and matching your query |
| You chat with it? | Yes, directly           | No, the code calls it                |
| Pulled with       | ollama run llama3.2     | ollama pull nomic-embed-text         |
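The division of labor in the table can be sketched in a few lines of Python. This is only an illustration under loud assumptions: embed and generate are passed in as plain functions (stand-ins for calls to nomic-embed-text and llama3.2), and a simple dot product is used to pick the best-matching chunk. The pipeline built later in the course may work differently.

```python
from typing import Callable, List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product, used here as a simple similarity score between two vectors."""
    return sum(x * y for x, y in zip(a, b))


def answer_with_context(
    question: str,
    chunks: List[str],
    embed: Callable[[str], Sequence[float]],
    generate: Callable[[str], str],
) -> str:
    """Pick the chunk most similar to the question, then ask the generator about it."""
    # Job 1 (embedding model): turn the question and each chunk into numbers.
    question_vector = embed(question)
    best_chunk = max(chunks, key=lambda chunk: dot(embed(chunk), question_vector))
    # Job 2 (chat model): write the final answer using the chosen context.
    prompt = f"Context:\n{best_chunk}\n\nQuestion: {question}"
    return generate(prompt)
```

Because the two models are passed in as functions, you can try the flow with simple stand-ins before connecting it to Ollama.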

Step 4: Confirm Ollama Is Listening

Once Ollama is installed, it runs a small background service. Later, our Python code will talk to that service at localhost:11434. You can confirm it is alive with this command:

ollama list

This prints the models you have downloaded. If the command works at all, the background service is running and ready for our code to connect. If you also see llama3.2 and nomic-embed-text in the list, you are fully set up.

If you ever restart your computer and the code cannot reach the model, just make sure Ollama is open. On most systems it starts automatically; you can also run ollama serve to start it manually.
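The same check can be done from Python, which is handy at the top of a script. The sketch below assumes Ollama's /api/tags endpoint, which returns the downloaded models as JSON. The helper names are ours, and the code stays quiet (returns None) when the service cannot be reached.

```python
import json
import urllib.error
import urllib.request
from typing import List, Optional, Sequence, Set

OLLAMA_URL = "http://localhost:11434"  # local address from this lesson
REQUIRED = ("llama3.2", "nomic-embed-text")  # the two models used in this course


def model_names(tags_response: dict) -> Set[str]:
    """Extract base model names (without the ':tag' suffix) from an /api/tags reply."""
    return {m["name"].split(":")[0] for m in tags_response.get("models", [])}


def missing_models(tags_response: dict, required: Sequence[str] = REQUIRED) -> List[str]:
    """Return the required models that are not in the reply, in the order given."""
    have = model_names(tags_response)
    return [name for name in required if name not in have]


def fetch_tags(timeout: float = 5.0) -> Optional[dict]:
    """Return the parsed /api/tags reply, or None if the service is unreachable."""
    try:
        with urllib.request.urlopen(f"{OLLAMA_URL}/api/tags", timeout=timeout) as response:
            return json.load(response)
    except (urllib.error.URLError, OSError):
        return None


# Example:
# tags = fetch_tags()
# if tags is None:
#     print("Ollama is not reachable; try opening it or running: ollama serve")
# else:
#     print("Missing models:", missing_models(tags))
```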

Why This Matters for Privacy

Every model you just downloaded now lives as a file on your disk. When you ask a question, the text goes from your code to localhost (your own machine) and back. It never travels over the internet. That single fact is what gives local RAG its privacy guarantee, and it is why this approach is so appealing for sensitive documents.

Key Takeaways

Ollama is a free tool that downloads and runs AI models locally; install it from ollama.com.
ollama run llama3.2 downloads a small, capable model (the first time) and opens a chat with it. Type /bye to exit.
ollama pull nomic-embed-text downloads the embedding model used to search your documents.
Ollama listens at http://localhost:11434, an address that only exists on your machine, so nothing is uploaded.
Use ollama list to confirm both models are downloaded and the service is running.
