Wednesday, August 7, 2024

Interact with Your Own Data Using Ollama-Hosted Phi3

Phi3 is a small language model with roughly 3.8 billion parameters, compared to the 175 billion of GPT-3, the model behind the original ChatGPT. Despite its size, Phi3 offers a practical advantage: because it runs locally, it can interact with your own data without going through an API. ChatGPT, in contrast, must be accessed through an API, and free trial access expires after three months unless you upgrade to a paid plan (ChatGPT Plus costs $20 per month).

One of the main benefits of Ollama-hosted Phi3 is that it is completely free and, once the model has been downloaded (ollama pull phi3), it doesn't require internet access. However, running Phi3 locally does have its drawbacks. For optimal performance it requires a high-spec PC or Mac, particularly one with a powerful GPU. While it can run on lower-spec machines, the performance will be significantly slower.

To demonstrate how Phi3 works, I have prepared a simple program that loads a text file, processes it, and lets you ask questions about the data. For this example, I used data about the national anthem of the Philippines. When I asked the untrained Phi3 who wrote the national anthem, it gave an incorrect answer. This is due to the limited amount of training data behind such a small model, as mentioned earlier. The following picture shows the reply of Phi3 (without embeddings) at the command prompt:


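For reference, the same baseline question can be asked directly from Python. Here is a minimal sketch using LangChain's ChatOllama wrapper, assuming Ollama is running locally and the model has already been pulled with ollama pull phi3:

from langchain_community.chat_models import ChatOllama

llm = ChatOllama(model="phi3")
reply = llm.invoke("Who wrote the national anthem of the Philippines?")
print(reply.content)  # without embeddings, the answer is likely to be wrong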
I obtained the data from Wikipedia by copying a few paragraphs into Notepad and saving them as a .txt file.
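
As a quick sanity check, you can confirm that the pasted text loads as a single LangChain Document (the file name matches the one used in the program below):

from langchain_community.document_loaders import TextLoader

# Quick check that the pasted Wikipedia text loads cleanly
docs = TextLoader("lupanghinirang.txt", encoding="utf-8").load()
print(len(docs), "document(s),", len(docs[0].page_content), "characters")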

Here is the program and some explanations:

from langchain_community.chat_models import ChatOllama
from langchain_community.document_loaders import TextLoader
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.embeddings.sentence_transformer import (
    SentenceTransformerEmbeddings,
)
from langchain_community.vectorstores import Chroma
from langchain_text_splitters import CharacterTextSplitter, RecursiveCharacterTextSplitter

from langchain.prompts import ChatPromptTemplate, PromptTemplate
from langchain.retrievers.multi_query import MultiQueryRetriever
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

loader = TextLoader("lupanghinirang.txt")
documents = loader.load()

# Split the document into chunks
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)

# Create the open-source embedding function
embedding_function = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")

# Load it into Chroma
db = Chroma.from_documents(docs, embedding_function)

# Query the database
query = "What is the National Anthem of the Philippines?"
docs = db.similarity_search(query)

# Split and chunk the documents
text_splitter = RecursiveCharacterTextSplitter(chunk_size=7500, chunk_overlap=100)
chunks = text_splitter.split_documents(docs)

# Add to vector database
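# Note: this assumes the embedding model has been pulled with: ollama pull nomic-embed-text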
vector_db = Chroma.from_documents(
    documents=chunks, 
    embedding=OllamaEmbeddings(model="nomic-embed-text", show_progress=True),
    collection_name="local-rag"
)

# LLM from Ollama
local_model = "phi3"
llm = ChatOllama(model=local_model)

QUERY_PROMPT = PromptTemplate(
    input_variables=["question"],
    template="""You are an AI language model assistant. Your task is to generate five
    different versions of the given user question to retrieve relevant documents from
    a vector database. By generating multiple perspectives on the user question, your
    goal is to help the user overcome some of the limitations of the distance-based
    similarity search. Provide these alternative questions separated by newlines.
    Original question: {question}""",
)

retriever = MultiQueryRetriever.from_llm(
    vector_db.as_retriever(), 
    llm,
    prompt=QUERY_PROMPT
)

# RAG prompt
template = """Answer the question based ONLY on the following context:
{context}
Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)

chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

# Run the chain and print the final answer
print(chain.invoke("who composed the national anthem of the Philippines?"))

Explanation

  1. Loading and Splitting Documents:

    • The program starts by loading a text file (lupanghinirang.txt) using TextLoader.
    • The text is then split into chunks using CharacterTextSplitter.
  2. Embedding and Storing Data:

    • An embedding function is created with SentenceTransformerEmbeddings using the all-MiniLM-L6-v2 model.
    • The split documents are embedded and stored in a Chroma database.
  3. Querying the Database:

    • A query is made to the database to find relevant documents.
    • The retrieved documents are further split into chunks using RecursiveCharacterTextSplitter.
  4. Adding to Vector Database:

    • The chunks are embedded using OllamaEmbeddings and stored in another Chroma database.
  5. Setting Up the Language Model:

    • The ChatOllama model (Phi3) is initialized.
    • A prompt template is created to generate multiple versions of a user query.
  6. Retrieving Relevant Documents:

    • MultiQueryRetriever uses the LLM to generate several rephrasings of the question and retrieves relevant documents from the vector database (see the inspection sketch after this list).
  7. Answering the Query:

    • A final prompt template is set up to answer the question based on the retrieved context.
    • The chain is executed to generate the answer to the question.
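
Before looking at the output, it can help to inspect the intermediate steps. The following snippet is a hypothetical addition to the end of the program above; it reuses the embedding_function and retriever variables to show the size of an embedding vector and the chunks the retriever hands to the prompt:

# Hypothetical inspection code, reusing variables from the program above

# The sentence-transformer embedding is a plain list of floats;
# all-MiniLM-L6-v2 produces 384-dimensional vectors
vector = embedding_function.embed_query("Lupang Hinirang")
print(len(vector))

# Run the multi-query retriever on its own to see which chunks
# are passed as context before the LLM answers
retrieved = retriever.invoke("who composed the national anthem of the Philippines?")
for doc in retrieved:
    print(doc.page_content[:200])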

Here is the result when running the program:



This program demonstrates Phi3's ability to process and interact with local data. Despite being much smaller than ChatGPT, Phi3 can be a powerful tool for applications where internet access is limited or data privacy is a concern.
