Mini-Project, Troubleshooting, and Next Steps
You now have every piece of a working local RAG system. This final lesson ties it into one small project you complete yourself, gives you a troubleshooting checklist for the bumps beginners commonly hit, and points you toward where to go when you are ready for more.
What You'll Learn
- A step-by-step mini-project: a private study helper over your own PDF
- A troubleshooting checklist for the most common problems
- Simple ways to make your answers better
- When and where to scale beyond local
Your Mini-Project: A Private Study Helper
Let's put it all together with your own material. Follow these steps and you will have a personal assistant that answers from a document you choose.
- Pick a document. Choose a PDF or text file you actually want to study: lecture notes, a chapter, an article. Make sure it has selectable text (not a scanned image). Put it in your project folder.
- Build the knowledge base. Point build.py at your file (change the filename in load_pdf("...")) and run python build.py. Confirm it prints the number of chunks stored.
- Ask questions. Run the interactive version of ask.py and ask real questions about the material, the kind you might be tested on.
- Check the answers. Open your document and verify that the answers actually match what it says. This is how you learn to trust (and improve) the system.
That is a complete, private, offline study helper that you built yourself. Everything stays on your machine, costs nothing to run, and, once the models are downloaded, works without an internet connection.
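The first step asks for a PDF with selectable text. If you want to check that before building, here is a small sketch. It is a hypothetical helper, not part of the course's build.py: it takes the text already extracted from each page (however your load_pdf obtains it) and flags pages that came back empty or nearly empty, which is the usual sign of a scanned image. The 20-character cutoff is an arbitrary assumption you can tune.

```python
def pages_without_text(page_texts, min_chars=20):
    """Return 1-based page numbers whose extracted text is shorter than min_chars.

    page_texts: a list with one string per page (None is treated as empty).
    min_chars: an assumed cutoff, not a rule from the course.
    """
    flagged = []
    for number, text in enumerate(page_texts, start=1):
        # Strip whitespace so a page of blank lines still counts as empty.
        if len((text or "").strip()) < min_chars:
            flagged.append(number)
    return flagged


def looks_scanned(page_texts, min_chars=20):
    """Heuristic: True if every page is (nearly) empty, as with a scanned-image PDF."""
    if not page_texts:
        return False
    return len(pages_without_text(page_texts, min_chars)) == len(page_texts)


# Example with made-up page texts: page 2 is blank, as a scanned page would be.
pages = [
    "Chapter 1: Cells are the basic unit of life.",
    "   ",
    "Mitochondria produce ATP for the cell.",
]
print(pages_without_text(pages))  # [2]
print(looks_scanned(pages))       # False
```

If you happen to use the pypdf library, one way to get page_texts is from pypdf import PdfReader followed by [page.extract_text() or "" for page in PdfReader("your_file.pdf").pages]. Whether your own load_pdf uses pypdf is an assumption worth checking.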
Troubleshooting Checklist
Most beginner problems come from one of a few causes. Work down this list.
What is going wrong?
- Code can't connect, or you see "connection refused": Ollama isn't running. Open the Ollama app or run 'ollama serve', then retry. The code talks to localhost:11434, which only responds while Ollama is active.
- "Error: model not found": pull the models with 'ollama pull llama3.2' and 'ollama pull nomic-embed-text'. Both models must be downloaded before the code can use them.
- Answers are empty or "I don't know": the retrieved chunks may not contain the answer. Check that the PDF has real text (scanned-image PDFs have no extractable text), and try retrieving more chunks (n_results=5).
- Answers ignore your documents: strengthen the prompt instruction to "use ONLY the context", and confirm that retrieval returned relevant chunks. Print the retrieved chunks to see what the model actually received.
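The first two checks in that list can be run together from a script. This is a minimal sketch using only the Python standard library, assuming Ollama's default address http://localhost:11434 and its GET /api/tags endpoint, which lists the models you have pulled. The helper names (list_local_models, is_installed, missing_models, diagnose) are hypothetical, not part of the course code.

```python
import json
import urllib.request

# Assumptions: Ollama's default address and the two models this course uses.
OLLAMA_URL = "http://localhost:11434"
REQUIRED_MODELS = ["llama3.2", "nomic-embed-text"]


def list_local_models(base_url=OLLAMA_URL, timeout=3):
    """Return pulled model names from GET /api/tags, or None if Ollama can't be reached."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as response:
            data = json.load(response)
    except (OSError, ValueError):
        # URLError, connection refused, and timeouts are OSError subclasses;
        # a non-JSON reply raises ValueError. Either way, treat Ollama as unavailable.
        return None
    return [model["name"] for model in data.get("models", [])]


def is_installed(required, installed):
    """Match 'llama3.2' against names like 'llama3.2:latest'; an explicit tag must match exactly."""
    if ":" in required:
        return required in installed
    return any(name.split(":")[0] == required for name in installed)


def missing_models(installed, required=REQUIRED_MODELS):
    """Return the required models that are not in the installed list."""
    return [name for name in required if not is_installed(name, installed)]


def diagnose(base_url=OLLAMA_URL):
    """Return a one-line hint for the two most common setup problems."""
    installed = list_local_models(base_url)
    if installed is None:
        return f"Cannot reach Ollama at {base_url}. Open Ollama or run 'ollama serve'."
    missing = missing_models(installed)
    if missing:
        return "Missing models: " + ", ".join(missing) + ". Run 'ollama pull <name>' for each."
    return "Ollama is running and the required models are pulled."


if __name__ == "__main__":
    print(diagnose())
```

Running this before build.py or ask.py separates "Ollama is off" from "a model is missing" without waiting for a longer script to fail partway through.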
A quick habit that solves most mysteries: print the retrieved chunks before generating. Add print(results["documents"][0]) after the query. If the right text is not in there, the problem is retrieval (chunking or the document), not the model.
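A slightly more readable version of that print, as a sketch: it assumes results is what a Chroma collection.query(...) call returns for a single question, where each field ("documents", "distances", and so on) holds one list per query. The function name show_retrieved is hypothetical. With Chroma's distances, a smaller number means a closer match.

```python
def show_retrieved(results, preview_chars=200):
    """Print each retrieved chunk for the first query, with its distance if available."""
    documents = results["documents"][0]
    # Fall back to "n/a" when distances were not requested or returned.
    distances = (results.get("distances") or [[None] * len(documents)])[0]
    for rank, (doc, dist) in enumerate(zip(documents, distances), start=1):
        score = f"{dist:.3f}" if dist is not None else "n/a"
        preview = doc[:preview_chars].replace("\n", " ")
        print(f"[{rank}] distance={score}: {preview}")
    return len(documents)


# Example with a hand-made result shaped like a single-query Chroma response.
fake_results = {
    "documents": [["Mitochondria produce ATP.", "Cells divide by mitosis."]],
    "distances": [[0.21, 0.48]],
}
show_retrieved(fake_results)
```

If the chunk that should answer your question is missing from this list, or appears with a much larger distance than unrelated chunks, look at chunking or the source document before changing the prompt.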
Making Your Answers Better
Once the basics work, a few small tweaks noticeably improve quality:
- Retrieve more or fewer chunks. If answers miss details, raise n_results to 4 or 5. If they ramble or wander off topic, lower it.
- Adjust chunk size. For dense, technical material, smaller chunks (around 500 characters) can sharpen retrieval. For flowing prose, larger chunks keep more context together.
- Add more documents. You can run build.py on several files and store them all in the same collection; just give each chunk a unique id, for example by prefixing it with the filename. Your helper then answers across your whole library.
- Sharpen the instruction. Asking the model to answer "in three bullet points" or "in one sentence" shapes the output to what you need.
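Here is one way the chunk-size and multiple-documents tweaks could look in code. It is a sketch under assumptions: the course's own chunking function may split text differently, and chunk_text and make_ids are hypothetical names. It splits by character count with a small overlap and builds ids such as biology_notes.pdf-0, so chunks from two files never share an id in one collection.

```python
from pathlib import Path


def chunk_text(text, size=500, overlap=50):
    """Split text into chunks of at most `size` characters, each overlapping the previous by `overlap`."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    chunks = []
    step = size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + size]
        if chunk.strip():  # skip chunks that are only whitespace
            chunks.append(chunk)
        if start + size >= len(text):  # this chunk reached the end of the text
            break
    return chunks


def make_ids(filename, chunks):
    """Prefix each id with the file's name so ids from different files don't clash.

    Note: two files with the same name in different folders would still clash;
    use the full path instead if that can happen in your library.
    """
    stem = Path(filename).name
    return [f"{stem}-{i}" for i in range(len(chunks))]


# Example: smaller chunks for dense notes, with ids tied to the source file.
text = "Photosynthesis converts light energy into chemical energy. " * 20
chunks = chunk_text(text, size=500, overlap=50)
ids = make_ids("biology_notes.pdf", chunks)
print(len(chunks), ids[0])
```

In a Chroma-based build.py you would pass these as the ids argument of collection.add, alongside the documents and their embeddings.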
These are all small, safe experiments. Change one thing, re-run, and see if answers improve.
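For the last tweak, sharpening the instruction, it helps to keep the prompt in one place so you can change exactly one thing at a time. The function below is a hypothetical sketch, not the course's ask.py: it joins the retrieved chunks, repeats the "use ONLY the context" rule from the troubleshooting list, and takes the output format as a parameter so you can swap "in one sentence" for "in three bullet points" and compare.

```python
def build_prompt(question, chunks, answer_format="in one sentence"):
    """Assemble a prompt that restricts the model to the retrieved context."""
    # Number the chunks so you can see which one an answer came from.
    context = "\n\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return (
        "Answer the question using ONLY the context below. "
        "If the context does not contain the answer, say \"I don't know.\"\n"
        f"Answer {answer_format}.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


prompt = build_prompt(
    "What do mitochondria do?",
    ["Mitochondria produce ATP for the cell."],
    answer_format="in three bullet points",
)
print(prompt)
```

You would send the returned string to whatever generation call your ask.py already makes; only the prompt text changes between experiments.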
When to Scale Beyond Local
Local RAG is ideal for a private helper that only you use. There are a few signs that you have outgrown it.
Do you need to go beyond local?
- Just you, private notes, learning: stay local. You already have the right setup.
- You want others to use it on the web: move to a hosted, full-stack RAG app (see the Full-Stack RAG course below).
- Millions of documents, and you need speed and tuning: go deeper on vector database indexing and scaling (see the Vector Databases course below).
Here is where to go next, depending on your goal:
- Build a real web app others can use. Our Full-Stack RAG with Next.js, Supabase & Gemini course takes the same ideas and deploys them as a live website with a cloud model and a hosted database. It is the natural step up from this course.
- Master the storage layer. Our Vector Databases course goes deep on similarity search, indexing strategies, hybrid search, and performance tuning. It is the advanced version of the storage layer you just used.
- Run a private local agent. If you enjoyed running models locally, the Hermes Agent micro course shows how to self-host a private AI assistant that can take actions, not just answer questions.
- Solidify the concept. Revisit What Is RAG (Retrieval-Augmented Generation) now that you have built one; it will read very differently.
What You Accomplished
Take a moment to appreciate what you built. You installed a local AI model, learned what embeddings are, loaded and chunked your own documents, embedded and stored them in a local database, and wrote a query loop that answers questions from your private material. The entire system runs on your machine: private, free, and offline. That is a genuine, useful skill, and it is the same architecture that powers professional AI knowledge tools, just at a personal scale.
Key Takeaways
- The mini-project is a private study helper: build a knowledge base from your own PDF, then ask it questions and verify the answers.
- Most problems trace to Ollama not running, a model not pulled, or a document with no extractable text; printing the retrieved chunks reveals retrieval issues fast.
- Improve answers by tuning n_results, adjusting chunk size, adding more documents, and sharpening the prompt instruction.
- Stay local for private personal use; scale to a full-stack cloud app to share it, or go deeper on vector databases for large-scale tuning.
- You built a complete, private, offline RAG system, the same architecture behind professional knowledge tools.

