Phi3 is a small language model, with only around 4 billion parameters compared to ChatGPT's 175 billion. Despite its smaller size, Phi3 offers a unique advantage: it runs locally and can interact directly with your data without needing an API. ChatGPT, in contrast, is accessed through an API, and the free trial credits expire after three months, after which you have to pay for usage.
One of the main benefits of Ollama-hosted Phi3 is that it is completely free and, once the model has been downloaded, requires no internet access. However, running Phi3 locally does have its drawbacks. For optimal performance, it requires a high-spec PC or Mac, particularly one with a powerful GPU. While it can run on lower-spec machines, the performance will be significantly slower.
To demonstrate how Phi3 works, I have prepared a simple program that loads a text file, processes it, and allows you to ask questions about the data. For this example, I used data about the national anthem of the Philippines. When I asked the plain, untrained Phi3 who wrote the national anthem, it provided an incorrect answer. This is due to the limited amount of training data available to Phi3, as mentioned earlier. The following picture shows the reply of Phi3 (without embeddings) from the command prompt:
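If you want to reproduce that baseline test in code rather than from the command prompt, here is a minimal sketch, assuming Ollama is running locally and the phi3 model has already been pulled with ollama pull phi3:

from langchain_community.chat_models import ChatOllama

# Ask the plain Phi3 model, with no embeddings or local data attached
llm = ChatOllama(model="phi3")
reply = llm.invoke("Who wrote the national anthem of the Philippines?")
print(reply.content)  # without the local data, the model is likely to answer incorrectly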
I obtained the data from Wikipedia by copying a few paragraphs, pasting them into Notepad, and saving the file as a .txt file.
Here is the program and some explanations:
from langchain_chroma import Chroma
from langchain_community.document_loaders import TextLoader
from langchain_community.embeddings.sentence_transformer import (
    SentenceTransformerEmbeddings,
)
from langchain_community.embeddings import OllamaEmbeddings
from langchain_text_splitters import CharacterTextSplitter, RecursiveCharacterTextSplitter
from langchain.prompts import ChatPromptTemplate, PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_community.chat_models import ChatOllama
from langchain_core.runnables import RunnablePassthrough
from langchain.retrievers.multi_query import MultiQueryRetriever

# Load the text file
loader = TextLoader("lupanghinirang.txt")
documents = loader.load()

# Split the document into chunks
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)

# Create the open-source embedding function
embedding_function = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")

# Load it into Chroma
db = Chroma.from_documents(docs, embedding_function)

# Query the database for relevant documents
query = "What is the National Anthem of the Philippines?"
docs = db.similarity_search(query)

# Split and chunk the retrieved documents
text_splitter = RecursiveCharacterTextSplitter(chunk_size=7500, chunk_overlap=100)
chunks = text_splitter.split_documents(docs)

# Add to vector database
vector_db = Chroma.from_documents(
    documents=chunks,
    embedding=OllamaEmbeddings(model="nomic-embed-text", show_progress=True),
    collection_name="local-rag",
)

# LLM from Ollama
local_model = "phi3"
llm = ChatOllama(model=local_model)

QUERY_PROMPT = PromptTemplate(
    input_variables=["question"],
    template="""You are an AI language model assistant. Your task is to generate five
different versions of the given user question to retrieve relevant documents from
a vector database. By generating multiple perspectives on the user question, your
goal is to help the user overcome some of the limitations of the distance-based
similarity search. Provide these alternative questions separated by newlines.
Original question: {question}""",
)

retriever = MultiQueryRetriever.from_llm(
    vector_db.as_retriever(),
    llm,
    prompt=QUERY_PROMPT,
)

# RAG prompt
template = """Answer the question based ONLY on the following context:
{context}
Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)

chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

print(chain.invoke("who composed the national anthem of the Philippines?"))
Explanation
Loading and Splitting Documents:
- The program starts by loading the text file (lupanghinirang.txt) using TextLoader.
- The text is then split into chunks using CharacterTextSplitter.
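To sanity-check this step, you can print the chunk count and a preview of the first chunk. This is illustrative only and reuses the docs variable from the program above:

print(len(docs), "chunks loaded")
print(docs[0].page_content[:200])  # first 200 characters of the first chunk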
Embedding and Storing Data:
- An embedding function is created using the SentenceTransformerEmbeddings model (all-MiniLM-L6-v2).
- The split documents are embedded and stored in a Chroma database.
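As a quick illustration of what the embedding function does: it maps any text to a fixed-length vector, and all-MiniLM-L6-v2 produces 384-dimensional vectors. This sketch reuses embedding_function from the program above:

vector = embedding_function.embed_query("Lupang Hinirang")
print(len(vector))  # 384 numbers representing the meaning of the text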
Querying the Database:
- A query is made to the database to find relevant documents.
- The retrieved documents are further split into chunks using RecursiveCharacterTextSplitter.
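If you want to see how close each match is, Chroma also exposes a scored variant of the similarity search. This sketch reuses db and query from the program above; with the default distance metric, lower scores mean closer matches:

results = db.similarity_search_with_score(query, k=2)
for doc, score in results:
    print(round(score, 3), doc.page_content[:80])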
Adding to Vector Database:
- The chunks are embedded using OllamaEmbeddings (nomic-embed-text) and stored in another Chroma database.
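Note that the nomic-embed-text model must also be downloaded once with ollama pull nomic-embed-text before this step works. A minimal sketch of the Ollama embedder on its own:

from langchain_community.embeddings import OllamaEmbeddings

emb = OllamaEmbeddings(model="nomic-embed-text")
vec = emb.embed_query("Lupang Hinirang")
print(len(vec))  # nomic-embed-text returns 768-dimensional vectors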
Setting Up the Language Model:
- The ChatOllama model (Phi3) is initialized.
- A prompt template is created to generate multiple versions of a user query.
Retrieving Relevant Documents:
- MultiQueryRetriever is used to retrieve relevant documents from the vector database.
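The retriever can also be invoked on its own to inspect which chunks the five generated question variants pull back. A small sketch reusing the retriever from the program above (retrievers in recent LangChain versions support invoke directly):

retrieved = retriever.invoke("who composed the national anthem of the Philippines?")
for doc in retrieved:
    print(doc.page_content[:80])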
Answering the Query:
- A final prompt template is set up to answer the question based on the retrieved context.
- The chain is executed to generate the answer to the question.
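Once built, the same chain can be reused for other questions about the uploaded file. The two questions below are only examples; whether they are answerable depends on which paragraphs you pasted from Wikipedia:

print(chain.invoke("Who wrote the lyrics of the national anthem?"))
print(chain.invoke("When was the national anthem first performed?"))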
Here is the result when running the program:
This program demonstrates the capabilities of Phi3 in processing and interacting with local data. Despite its smaller size compared to ChatGPT, Phi3 can be a powerful tool for specific applications where internet access is limited or data privacy is a concern.