AI Reads Your Docs: Build a Local RAG System with Ollama

Your New Intern Knows Nothing

Remember that private AI intern we hired in the last lesson? The one that lives on your computer and analyzes data without talking to the internet? It’s great. It’s secure. But it has a problem.

It’s like a brilliant, freshly-graduated intern on their first day. They’re smart, they can reason, but they know absolutely *nothing* about your business. Ask them about your company’s onboarding policy, and they’ll give you a generic, philosophical answer about the importance of onboarding. Ask them what was decided in last quarter’s strategy meeting, and you’ll get a blank stare.

Your intern can think, but it can’t read. It has no access to your company’s knowledge base—the shared drive, the wikis, the meeting notes. It’s a brain without a library.

Today, we’re fixing that. We’re building the library. We’re going to teach your local AI to read your private documents so it can answer specific questions about your business, instantly.

Why This Matters

This is one of the most powerful automations you can build. It solves the universal business problem of “Where did I see that information?”

  • Time: By common estimates, employees spend nearly 20% of their workweek just looking for internal information. This system reduces that to seconds.
  • Knowledge Management: It turns your chaotic mess of documents (we all have one) into a single, searchable brain. Company knowledge no longer leaves when an employee does.
  • Consistency: Get the same, accurate answer every time, based directly on your source material. No more conflicting advice from different colleagues.
  • Replaces: This replaces the endless searching in Google Drive, the “quick question” interruptions on Slack, and the painful process of manually training new hires on company procedures.

You are building an omniscient, private company expert who has read everything and forgets nothing.

What This Tool / Workflow Actually Is

This system is called RAG, which stands for Retrieval-Augmented Generation. It sounds complicated, but the concept is beautifully simple. It’s a two-step dance:

  1. Retrieval (The Librarian): When you ask a question, the system first *retrieves* the most relevant snippets of text from your document library. It doesn’t scan every word in every document. It uses a clever indexing method (called a vector store) to find the right information instantly, like a librarian who knows exactly which page of which book has your answer.
  2. Generation (The Expert): The system then takes your question, bundles it with the relevant snippets it just found, and hands it all to the AI. It says, “Hey, using ONLY this information I’ve provided, answer this user’s question.”

That’s it. The AI isn’t answering from its generic, pre-trained knowledge. It’s answering from the custom, just-in-time cheat sheet you gave it. This is what keeps it accurate and grounded in your reality.
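
To make the two steps concrete before we build the real thing, here is a deliberately tiny toy version of that dance. A crude keyword match stands in for the real vector search we'll set up below, and it assumes the ollama Python package and llama3 model from the previous lesson:

import ollama

# A toy "library": in the real system, these snippets come from your own documents.
library = [
    "The work-from-home policy allows remote work on Mondays and Fridays.",
    "Expense reports are due by the 25th of each month.",
    "The Project Alpha deadline is Q3, and Maria is the Project Lead.",
]

def toy_rag(question: str) -> str:
    # Step 1: Retrieval -- a naive keyword overlap stands in for the vector store.
    relevant = [s for s in library if any(word.lower() in s.lower() for word in question.split())]
    context = "\n".join(relevant)

    # Step 2: Generation -- hand the question plus ONLY those snippets to the local model.
    prompt = f"Using ONLY this context:\n{context}\n\nQuestion: {question}"
    response = ollama.chat(model="llama3", messages=[{"role": "user", "content": prompt}])
    return response["message"]["content"]

print(toy_rag("When are expense reports due?"))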

Prerequisites

This lesson builds directly on the last one. Don’t skip ahead!

  1. Our Previous Setup: You must have Ollama installed with a model like llama3 pulled and running, and Python set up with the ollama package from the previous lesson. If you haven’t done this, go back to the previous lesson now.
  2. New Python Libraries: We need some new tools for our librarian. We’ll use the popular LangChain framework to orchestrate the process. Open your terminal and run this command:
    pip install langchain langchain_community faiss-cpu pypdf

    (faiss-cpu is our super-fast indexer, and pypdf lets us read PDFs).

  3. A Folder of Documents: Create a folder named private_docs. This is where you’ll put the PDFs, TXTs, or other files you want the AI to read. For this tutorial, we’ll create them ourselves.
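
Before moving on, it's worth a ten-second check that Ollama is actually running and reachable. This uses the ollama Python package from the previous lesson; if it raises a connection error, start Ollama and try again:

import ollama

# Lists the models your local Ollama server has available.
# Seeing llama3 (or whichever model you pulled) in the output means you're ready.
print(ollama.list())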

Step-by-Step Tutorial

Let’s build your AI’s personal library. We’ll break this into two scripts: one to process and ‘learn’ your documents (ingestion), and another to ask questions (querying).

Step 1: Create Your Project and Sample Documents

Create a main folder for this project. Inside it, create the private_docs subfolder. Now, let’s create two sample text files inside private_docs.

File 1: company_policy.txt

Company Onboarding Policy

All new employees must complete their HR paperwork within 3 days of their start date. The official work-from-home policy allows for remote work on Mondays and Fridays. All expense reports must be submitted via the 'Expensify' portal by the 25th of each month. The company's official messaging platform is Slack.

File 2: project_alpha_notes.txt

Meeting Notes: Project Alpha Kick-off

Date: January 15th
Attendees: Sarah, David, Maria

Key Decisions:
- The project deadline is set for Q3.
- The primary technology stack will be Python with a React frontend.
- Maria is assigned as the Project Lead. David will handle the backend development.
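
If you'd rather not create those two files by hand, this small helper writes the exact text above into private_docs for you (purely a convenience; copying and pasting works just as well):

import os

# Create the folder and write the two sample documents used in this tutorial.
os.makedirs("private_docs", exist_ok=True)

samples = {
    "company_policy.txt": (
        "Company Onboarding Policy\n\n"
        "All new employees must complete their HR paperwork within 3 days of their start date. "
        "The official work-from-home policy allows for remote work on Mondays and Fridays. "
        "All expense reports must be submitted via the 'Expensify' portal by the 25th of each month. "
        "The company's official messaging platform is Slack.\n"
    ),
    "project_alpha_notes.txt": (
        "Meeting Notes: Project Alpha Kick-off\n\n"
        "Date: January 15th\n"
        "Attendees: Sarah, David, Maria\n\n"
        "Key Decisions:\n"
        "- The project deadline is set for Q3.\n"
        "- The primary technology stack will be Python with a React frontend.\n"
        "- Maria is assigned as the Project Lead. David will handle the backend development.\n"
    ),
}

for filename, text in samples.items():
    with open(os.path.join("private_docs", filename), "w") as f:
        f.write(text)

print("Sample documents created in private_docs/")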

Step 2: The Ingestion Script (Building the Library Index)

This script reads the documents, splits them into bite-sized chunks, and creates a searchable vector store—our librarian’s brain. Create a file named ingest.py.

from langchain_community.document_loaders import DirectoryLoader, TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import FAISS

# 1. Define the path to your documents and the vector store
DATA_PATH = "private_docs"
DB_PATH = "faiss_index"

# 2. Function to create the vector store
def create_vector_store():
    print("Loading documents...")
    loader = DirectoryLoader(DATA_PATH, glob='*.txt', loader_cls=TextLoader) # Plain .txt files; TextLoader avoids extra dependencies
    documents = loader.load()
    
    print("Splitting documents into chunks...")
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
    chunks = text_splitter.split_documents(documents)
    
    print("Creating embeddings and building the vector store...")
    # This uses the llama3 model you already have running in Ollama to create the embeddings
    embeddings = OllamaEmbeddings(model="llama3")
    
    # This creates the FAISS index from the chunks and embeddings
    db = FAISS.from_documents(chunks, embeddings)
    
    # Save the vector store locally
    db.save_local(DB_PATH)
    print(f"Vector store created and saved to {DB_PATH}")

# 3. Main execution block
if __name__ == "__main__":
    create_vector_store()

Now, run this from your terminal:

python ingest.py

It will take a moment, and you’ll see a new folder named faiss_index appear. That’s it! You’ve indexed your knowledge base. You only need to run this script again when you add or change your documents.
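
Curious what actually got stored? A quick retrieval-only check loads the index back and shows which chunks the librarian would hand over for a given question — no AI generation involved yet:

from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import FAISS

# Load the index we just built and run a test similarity search.
# The deserialization flag is safe here because we created this index ourselves.
embeddings = OllamaEmbeddings(model="llama3")
db = FAISS.load_local("faiss_index", embeddings, allow_dangerous_deserialization=True)

for doc in db.similarity_search("When are expense reports due?", k=2):
    print("---")
    print(doc.page_content)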

Step 3: The Query Script (Asking Questions)

This is the script you’ll run anytime you want to ask a question. It loads the index, finds the relevant info, and gets an answer from your local AI. Create a file named query.py.

import ollama
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import FAISS

# 1. Define paths and the model
DB_PATH = "faiss_index"
MODEL = "llama3"

# 2. Create the prompt template
# This tells the AI how to behave and formats the question and context.
prompt_template = """
Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.

{context}

Question: {question}
Helpful Answer:
"""

# 3. The main query function
def query_rag(question: str):
    print("Loading the vector store...")
    embeddings = OllamaEmbeddings(model=MODEL)
    db = FAISS.load_local(DB_PATH, embeddings, allow_dangerous_deserialization=True) # Security flag for local loading

    print("Performing similarity search...")
    # Get the most relevant documents (we'll ask for the top 3)
    results = db.similarity_search(question, k=3)
    context = "\
".join([doc.page_content for doc in results])

    print("Formatting the prompt...")
    formatted_prompt = prompt_template.format(context=context, question=question)

    print("--- Sending prompt to the AI ---")
    response = ollama.chat(
        model=MODEL,
        messages=[
            {
                'role': 'user',
                'content': formatted_prompt,
            },
        ]
    )
    return response['message']['content']

# 4. Main execution block
if __name__ == "__main__":
    # Example question
    # Try changing this to other questions!
    question = "Who is the project lead for Project Alpha?"
    
    print(f"--- QUERY: {question} ---")
    answer = query_rag(question)
    print("--- ANSWER ---")
    print(answer)

Complete Automation Example: Running Your Q&A Bot

With both scripts and your documents in place, let’s test it.

  1. Make sure you’ve run ingest.py at least once to create the index.
  2. Run the query script from your terminal:
    python query.py

The output should be something clear and accurate like:

Maria is assigned as the Project Lead for Project Alpha.

Now, go into query.py and change the question. Try these:

  • “What is the policy for working from home?”
  • “What is the deadline for Project Alpha?”
  • “When are expense reports due?”
  • “What’s the capital of France?” (It should say it doesn’t know, because that’s not in our documents!)

You now have a fully functional, private, document-aware question-and-answer system.
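
And if you'd rather not edit query.py every time a new question comes up, a small interactive loop does the trick. This minimal sketch imports query_rag from the script above, so save it in the same folder (the filename, say ask.py, is up to you):

from query import query_rag

# A simple question-and-answer loop: keep asking until you type 'quit'.
while True:
    question = input("\nAsk a question (or 'quit' to exit): ").strip()
    if question.lower() in ("quit", "exit", ""):
        break
    print(query_rag(question))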

Real Business Use Cases

This RAG pattern is a business superpower. You can apply it everywhere:

  1. Customer Support: Ingest your entire knowledge base of product manuals. The RAG bot can power a chatbot on your website that answers customer questions grounded directly in your official docs.
  2. Sales Enablement: Feed it all your case studies, product one-pagers, and competitor analyses. A salesperson can ask, “Give me three case studies for a client in the healthcare industry” and get instant results.
  3. HR & Onboarding: Ingest the entire employee handbook, benefits documents, and company policies. New hires can ask “How do I request vacation time?” and get the correct answer without bothering HR.
  4. Legal & Compliance: Create a secure RAG system for a legal team to query tens of thousands of contracts. They can ask, “Which of our contracts have a ‘force majeure’ clause?”
  5. Software Development: Ingest your entire codebase. A developer can ask, “Where in the code is the user authentication logic handled?” to quickly find relevant files.

Common Mistakes & Gotchas

  • Forgetting to Re-Ingest: If you add a new document to your private_docs folder, the system won’t know about it until you run ingest.py again.
  • Poor Document Quality (GIGO): Garbage In, Garbage Out. If your source documents are poorly written, out-of-date, or contradictory, your AI’s answers will be too. This system forces good knowledge hygiene.
  • Sensitive Data in Chunks: Be aware that the text is split into chunks. A sentence containing sensitive info could be separated from its context. For most use cases this is fine, but for highly secure legal/financial data, it’s something to consider.
  • The “allow_dangerous_deserialization” Flag: This flag in the query script is necessary for FAISS to load local files. It’s safe here because *we* created the index file. Never load an index file from an untrusted source with this flag enabled.

How This Fits Into a Bigger Automation System

Your RAG system is now the ‘Long-Term Memory’ for any AI automation you build. It’s the knowledge core.

  • Connect to Slack/Teams: You can easily wrap this query script in a simple bot framework so employees can ask questions directly in your company chat (see the sketch after this list).
  • Power AI Agents: An AI agent tasked with writing a weekly report can first use this RAG system to gather all the relevant project updates from the week’s meeting notes.
  • Automate Customer Emails: An automation could read an incoming support email, use the RAG system to find the answer in the technical manuals, and then pass that answer to another AI to draft a reply.
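
For instance, the Slack/Teams idea mostly boils down to putting query_rag behind a small web endpoint that a bot can call. Here's a minimal sketch using Flask — note that Flask isn't part of this lesson's setup (you'd need pip install flask first), and a real chat integration involves more plumbing than this:

from flask import Flask, jsonify, request

from query import query_rag

app = Flask(__name__)

# POST a JSON body like {"question": "When are expense reports due?"} to /ask
@app.route("/ask", methods=["POST"])
def ask():
    data = request.get_json(force=True)
    return jsonify({"answer": query_rag(data.get("question", ""))})

if __name__ == "__main__":
    app.run(port=5000)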

You’ve moved from a generic AI to one that is a true expert in *your* specific domain.

What to Learn Next

Okay, this is huge. You’ve built an AI brain that runs locally, and you’ve given it a library of your own documents to read. It can now reason and recall information with perfect clarity.

But it’s still stuck inside its box. It can’t *do* anything in the real world. It can’t browse a website to get live information, it can’t check your calendar, it can’t call an API to fetch real-time stock data.

In the next lesson of the course, we’re giving our AI a toolkit. We will transform our knowledgeable assistant into a functional agent that can interact with the outside world by using tools, like browsing the web or executing code. The brain is smart, the library is full. Next, we give it hands.
