QnA Bots Anywhere with Llama 2 – Private GPT

Introduction

In the ever-evolving landscape of artificial intelligence, access to powerful language models has become crucial for a wide range of applications. Thanks to Meta’s groundbreaking decision to make Llama 2 open source and available for commercial use, developers now have the freedom to harness this advanced language model without any external dependencies. In this blog post, we will explore how Llama 2 paves the way for on-premise usage and its potential to revolutionize the field of private AI.

What is Llama 2?

Llama 2, developed by Meta, is an advanced language model that has been made open source for developers and businesses alike. With this move, Meta has enabled the community to download the model and its code and to run large language model (LLM) use cases on-premise. Developers can also use the same code base in the public cloud, giving them a range of flexible deployment options.

Independence from External Services

One of the primary advantages of adopting Llama 2 is its independence from platforms such as OpenAI or Google’s PaLM 2. With Llama 2 available offline, developers no longer need to rely on external services for their language processing needs. This autonomy not only streamlines the development process but also removes the need to send sensitive data to third-party cloud endpoints. By keeping data processing in-house, users gain more control over their data, enhancing privacy and security.

Comparing Llama 2 to OpenAI Offerings

A crucial aspect that makes Llama 2 even more attractive is its competitiveness with leading language models, including those from OpenAI. For readers wondering how Llama 2 compares to OpenAI’s GPT-4, a detailed comparison is available at “openaimaster.com/llama-2-vs-gpt-4.” The insights in that comparison highlight Llama 2’s potential and make it evident that it can stand toe-to-toe with some of the best language models on the market.

Real-World Testing and Accuracy

To further understand the capabilities of Llama 2, a real-world test was conducted using a 200-page PDF document. The results were highly accurate, demonstrating the effectiveness and precision of Llama 2 for complex language processing tasks. Because the model can be downloaded and run locally, developers can achieve these results without any reliance on external servers.

Conclusion

With the availability of Llama 2 as an open-source and commercially usable language model, developers now have the freedom to innovate and experiment with advanced natural language processing. This breakthrough from Meta empowers users to run Llama 2 on-premise or in the public cloud without relying on external AI services. The model’s remarkable accuracy, combined with its ease of implementation and data-privacy benefits, positions Llama 2 for widespread adoption across industries. As the journey of private AI progresses, Llama 2 serves as a beacon, unlocking the potential for developers to create their own private GPT solutions.

Code snippet

You can use the code below to try out Llama 2 on your own documents. It uses FAISS as the vector database, and I have extended it to support multiple use cases: each deployment gets its own FAISS index file, so you can run multiple QnA bots behind the same APIs. The quantized Llama model below is used so that CPU-only systems can run the proof of concept: https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGML/blob/main/llama-2-7b-chat.ggmlv3.q8_0.bin
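
If you don’t want to download the GGML file manually, one quick way to fetch it is with the huggingface_hub package. This is a small helper sketch; the "models" target directory is an assumption chosen to match the path used in build_llm() further down.

from huggingface_hub import hf_hub_download

# Download the quantized GGML chat model into a local "models" folder.
# The directory name is an assumption; adjust it to wherever build_llm() looks.
model_path = hf_hub_download(
    repo_id="TheBloke/Llama-2-7B-Chat-GGML",
    filename="llama-2-7b-chat.ggmlv3.q8_0.bin",
    local_dir="models",
)
print(model_path)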

Build your vector database

from langchain.vectorstores import FAISS
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.document_loaders import DirectoryLoader
from langchain.embeddings import HuggingFaceEmbeddings


# Build vector database
def build_faiss_db(data_path, faiss_path):
    loader = DirectoryLoader(data_path)
    documents = loader.load()
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
    texts = text_splitter.split_documents(documents)

    embeddings = HuggingFaceEmbeddings(
        model_name="sentence-transformers/all-MiniLM-L6-v2",
        model_kwargs={"device": "cpu"},
    )

    vectorstore = FAISS.from_documents(texts, embeddings)
    vectorstore.save_local(faiss_path)


if __name__ == "__main__":
    pass
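
The __main__ block above is left as a stub; a minimal invocation, assuming the same per-user, per-deployment folder layout that the Flask app below uses, could look like this (the IDs are placeholders):

# Example only: user and deployment IDs are placeholders for your own values.
build_faiss_db(
    "uploaded_files/demo_user/demo_deployment",
    "faiss_index_files/demo_user/demo_deployment",
)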

Load the model

from langchain.llms import CTransformers


def build_llm():
    # Local CTransformers model
    llm = CTransformers(
        model="models/llama-2-7b-chat.ggmlv3.q8_0.bin",
        model_type="llama",
        config={"max_new_tokens": 256, "temperature": 0.01},
    )

    return llm
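
As a quick sanity check, you can call the loaded model directly; this bypasses retrieval entirely and just runs the raw LLM on an arbitrary example prompt:

# Plain completion call on the LangChain LLM wrapper, no retrieval involved.
llm = build_llm()
print(llm("What is Llama 2?"))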

Utilities

from langchain import PromptTemplate
from langchain.chains import RetrievalQA
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS
from src.llm import build_llm


def set_qa_prompt(qa_template=None):
    """
    Prompt template for QA retrieval for each vectorstore
    """
    if not qa_template:
        qa_template = """Use the following pieces of information to answer the user's question.
        If you don't know the answer, just say that you don't know, don't try to make up an answer.
        Context: {context}
        Question: {question}
        Only return the helpful answer below and nothing else.
        Helpful answer:
        """
    prompt = PromptTemplate(
        template=qa_template, input_variables=["context", "question"]
    )
    return prompt


def build_retrieval_qa(llm, prompt, vectordb):
    dbqa = RetrievalQA.from_chain_type(
        llm=llm,
        chain_type="stuff",
        retriever=vectordb.as_retriever(search_kwargs={"k": 2}),
        return_source_documents=False,
        chain_type_kwargs={"prompt": prompt},
    )
    return dbqa


def setup_dbqa(faiss_path):
    embeddings = HuggingFaceEmbeddings(
        model_name="sentence-transformers/all-MiniLM-L6-v2",
        model_kwargs={"device": "cpu"},
    )
    vectordb = FAISS.load_local(faiss_path, embeddings)
    llm = build_llm()
    qa_prompt = set_qa_prompt()
    dbqa = build_retrieval_qa(llm, qa_prompt, vectordb)

    return dbqa
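
Tying these utilities together outside of Flask, a minimal end-to-end query against an index built earlier (the index path is a placeholder) would be:

# Example only: point this at an index directory created by build_faiss_db.
dbqa = setup_dbqa("faiss_index_files/demo_user/demo_deployment")
response = dbqa({"query": "What is this document about?"})
print(response["result"])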


Main code

# Import necessary libraries
from flask import Flask, request
import os
import shutil
from build_vector_db import build_faiss_db
from src.utils import setup_dbqa
from werkzeug.utils import secure_filename

# Create the Flask app
app = Flask(__name__)


# Define a route and corresponding view function
@app.route("/train", methods=["POST"])
def train():
    user_id = request.form["user_id"]
    deployment_id = request.form["deployment_id"]
    USER_UPLOAD_FILE_DIR = f"uploaded_files/{user_id}/{deployment_id}"
    USER_PROCESSED_FILE_DIR = f"processed_files/{user_id}/{deployment_id}"
    FAISS_INDEX_DIR = f"faiss_index_files/{user_id}/{deployment_id}"

    if not os.path.exists(FAISS_INDEX_DIR):
        os.makedirs(FAISS_INDEX_DIR)

    build_faiss_db(USER_UPLOAD_FILE_DIR, FAISS_INDEX_DIR)

    for filename in os.listdir(USER_UPLOAD_FILE_DIR):
        # Move the file to USER_PROCESSED_DIR
        file_path = os.path.join(USER_UPLOAD_FILE_DIR, filename)
        if not os.path.exists(USER_PROCESSED_FILE_DIR):
            os.makedirs(USER_PROCESSED_FILE_DIR)
        new_file_path = os.path.join(USER_PROCESSED_FILE_DIR, filename)
        shutil.move(file_path, new_file_path)

    return "Training Successfully Done"


# Define a route and corresponding view function
@app.route("/upload", methods=["POST"])
def upload():
    user_id = request.form["user_id"]
    deployment_id = request.form["deployment_id"]
    files = request.files.getlist("files")
    USER_DIR = f"uploaded_files/{user_id}"
    DEPLOYMENT_DIR = f"{USER_DIR}/{deployment_id}"
    if not files or files[0].filename == "":
        return "Please select a PDF file."
    else:
        if not os.path.exists(DEPLOYMENT_DIR):
            os.makedirs(DEPLOYMENT_DIR)

        for file in files:
            filename = secure_filename(file.filename)
            file.save(f"{DEPLOYMENT_DIR}/{filename}")
        return "files saved successfully"


# Define a route and corresponding view function
@app.route("/query", methods=["POST"])
def query():
    user_id = request.form["user_id"]
    deployment_id = request.form["deployment_id"]
    question = request.form["question"]
    FAISS_INDEX_DIR = f"faiss_index_files/{user_id}/{deployment_id}"
    dbqa = setup_dbqa(FAISS_INDEX_DIR)
    response = dbqa({"query": question})
    return response["result"]


# Define a route and corresponding view function
@app.route("/remove", methods=["POST"])
def delete():
    user_id, deployment_id = request.form["user_id"], request.form["deployment_id"]
    FAISS_INDEX_DIR = f"faiss_index_files/{user_id}/{deployment_id}"
    USER_PROCESSED_FILE_DIR = f"processed_files/{user_id}/{deployment_id}"
    # os.remove only works on files; use shutil.rmtree to delete the directories.
    shutil.rmtree(FAISS_INDEX_DIR, ignore_errors=True)
    shutil.rmtree(USER_PROCESSED_FILE_DIR, ignore_errors=True)
    return "Successfully Deleted"


# Run the app if this script is executed directly
if __name__ == "__main__":
    app.run(debug=True)
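
Once the app is running, you can exercise the endpoints with a small client script. This is just a usage sketch; the file name, user ID, and deployment ID are placeholders for your own values.

import requests

BASE_URL = "http://127.0.0.1:5000"
form = {"user_id": "demo_user", "deployment_id": "demo_deployment"}

# 1. Upload one or more PDFs for this deployment.
with open("sample.pdf", "rb") as f:
    print(requests.post(f"{BASE_URL}/upload", data=form, files={"files": f}).text)

# 2. Build the FAISS index from the uploaded files.
print(requests.post(f"{BASE_URL}/train", data=form).text)

# 3. Ask a question against the indexed documents.
question = {"question": "What is this document about?", **form}
print(requests.post(f"{BASE_URL}/query", data=question).text)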
