
Have you used OpenAI’s ChatGPT, Google’s Gemini, or Microsoft’s Copilot?
Are you concerned about sending your data to these cloud-based LLMs? Are you worried about data privacy and security, but still want to leverage the power of LLMs?
Wait, have you thought of hosting a copy of an LLM on your own server, so you stay in charge of your data?
Yes, a privately hosted LLM on your server is your answer.
Having a private chatbot offers increased data security and customization compared to cloud-based solutions. However, it also requires some technical expertise.
This guide will walk you through implementing and deploying your own private conversational chatbot, powered by Llama3, an open-source LLM from Meta, on your own server.
Pre-requisites
Hardware Requirements
CPU: A modern CPU with at least 8 cores is recommended for efficient backend operations and data preprocessing.
GPU: One or more powerful GPUs, preferably Nvidia GPUs with CUDA support, are recommended for model training and inference. An RTX 3000 series card or higher is ideal.
RAM: Minimum 16 GB for the 8B model and 32 GB or more for the 70B model. As a rule of thumb, float16 weights take about 2 bytes per parameter, so the 8B model's weights alone occupy roughly 16 GB.
Software Requirements
Operating System: Compatible with Linux and Windows operating systems. For this guide, we will be using Ubuntu 20.04.
Software Dependencies:
Python: A recent version, typically Python 3.7 or higher, to ensure compatibility with all the necessary libraries. For this guide, we will be using Python 3.11.
Setup environment
Create a virtual environment in your project directory:
python3.11 -m venv llama3_env
Activate the virtual environment:
source llama3_env/bin/activate
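If you want to confirm that the newly activated environment is being used, a quick sanity check from the Python interpreter can help (a minimal sketch; the expected paths simply follow the llama3_env name used above):
# Quick sanity check for the active virtual environment
import sys
print(sys.executable)  # should point inside llama3_env/bin/
print(sys.version)     # should report 3.11.x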
Download Llama3 models
You can download the Llama3 model either from Meta or from the HuggingFace website.
There are two formats of the model available:
native llama3 format
transformers format
There are several Llama3 models to choose from:

Model | Description
--- | ---
Llama3-8b | The base 8 billion parameter model.
Llama3-8b-instruct | The 8 billion parameter model fine-tuned for instruction-following and dialogue use cases.
Llama3-70b | The base 70 billion parameter model.
Llama3-70b-instruct | The 70 billion parameter model fine-tuned for instruction-following and dialogue use cases.
Llama Guard 2 | A separate model, fine-tuned on Llama3-8b, that classifies inputs and outputs for safety and content moderation.
For this guide, we will be using the Llama3-8B-Instruct model in the transformers format, and we will download it from the HuggingFace website.
Install the HuggingFace CLI
pip install huggingface-hub
Download from HuggingFace
Visit this URL: https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct
Submit an access request on the model page. Once the request is approved, you can start the download.
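Note that the Llama3 repositories on HuggingFace are gated, so you also need to authenticate with your HuggingFace access token before downloading. One option is the small Python snippet below using the huggingface_hub library (calling login() with no arguments prompts for the token interactively); the huggingface-cli login command works equally well:
# Authenticate with HuggingFace so the gated Llama3 files can be downloaded
# login() with no arguments prompts for your access token interactively
from huggingface_hub import login
login()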
Download the transformers format of the Llama3 model using the following command:
huggingface-cli download meta-llama/Meta-Llama-3-8B-Instruct --exclude "original/*" --local-dir meta-llama/Meta-Llama-3-8B-Instruct
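If you prefer to script the download instead of using the CLI, the huggingface_hub library exposes snapshot_download, which does the same job. Below is a rough equivalent of the command above; the local directory name is simply kept identical to the CLI example:
# Programmatic alternative to the huggingface-cli download command above
from huggingface_hub import snapshot_download
snapshot_download(
    repo_id="meta-llama/Meta-Llama-3-8B-Instruct",
    local_dir="meta-llama/Meta-Llama-3-8B-Instruct",
    ignore_patterns=["original/*"],  # skip the native Llama3 checkpoint files
)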
Implement chatbot powered by Llama3
Install Python libraries
Install the necessary libraries using the following command:
pip install torch
pip install transformers
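Before moving on, it is worth verifying that PyTorch was installed with CUDA support and can see your GPU. A minimal check could look like this:
# Verify that PyTorch can use the GPU
import torch
print(torch.cuda.is_available())           # should print True
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))   # name of your GPU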
Code
Create a new file named chatbot.py. Copy the following code and paste it into the newly created file.
import transformers
import torch


class Chatbot:
    def __init__(self, model_path):
        self.model_id = model_path
        # Download and cache the weights of the model
        self.pipeline = transformers.pipeline(
            "text-generation",
            model=self.model_id,
            model_kwargs={"torch_dtype": torch.float16},
            device="cuda",
        )
        # Stop generation at the end-of-sequence token or Llama3's end-of-turn token
        self.terminators = [
            self.pipeline.tokenizer.eos_token_id,
            self.pipeline.tokenizer.convert_tokens_to_ids("<|eot_id|>"),
        ]

    def get_response(
        self, query, message_history=[], max_tokens=256, temperature=0.1, top_p=0.9
    ):
        # Append the new user message to the conversation history
        user_prompt = message_history + [{"role": "user", "content": query}]
        # Format the conversation with the model's chat template
        prompt = self.pipeline.tokenizer.apply_chat_template(
            user_prompt, tokenize=False, add_generation_prompt=True
        )
        outputs = self.pipeline(
            prompt,
            max_new_tokens=max_tokens,
            eos_token_id=self.terminators,
            do_sample=True,
            temperature=temperature,
            top_p=top_p,
        )
        # Keep only the newly generated text, not the echoed prompt
        response = outputs[0]["generated_text"][len(prompt):]
        return response, user_prompt + [{"role": "assistant", "content": response}]

    def chatbot(self, system_instructions=""):
        # Seed the conversation with the system instructions
        conversation = [{"role": "system", "content": system_instructions}]
        while True:
            user_input = input("User: ")
            if user_input.lower() in ["exit", "quit"]:
                print("Goodbye for now!")
                break
            response, conversation = self.get_response(user_input, conversation)
            print(f"Assistant: {response}")


if __name__ == "__main__":
    # Path to the locally downloaded model (matches the --local-dir used above)
    model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
    bot = Chatbot(model_id)
    system_instructions = "You are a legal research assistant chatbot who always responds in legal language!"
    bot.chatbot(system_instructions)
The code above creates a conversational chatbot, and we pass it a system instruction to behave like a legal research assistant.
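If you would rather call the chatbot from other Python code instead of the interactive loop, you can use get_response directly. A minimal sketch, assuming chatbot.py is in the same directory and the model path matches the one used above:
# Using the Chatbot class programmatically instead of the interactive loop
from chatbot import Chatbot

bot = Chatbot("meta-llama/Meta-Llama-3-8B-Instruct")
history = [{"role": "system", "content": "You are a legal research assistant chatbot."}]
answer, history = bot.get_response("What is a non-disclosure agreement?", history)
print(answer)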
Run the chatbot
Use the following command to run the conversational chatbot on your server.
python chatbot.py
Now you can chat with your legal research assistant chatbot that is running on your server.
Chatbot-UI
Implementing a chatbot-UI is out of the scope of this blog. I will create a separate blog post on how to create a chatbot-UI.
In this blog, you learned how to download the transformers version of the Llama3 model from HuggingFace and implement a conversational chatbot that acts as a legal research assistant on your private server.
Subscribe now to stay tuned and get notified when a new blog post is published.