Llama-2 is a family of open-source LLMs released by Meta. Llama-2 7B is the smallest model in this family in terms of parameter count. The "chat" variant of Llama-2 7B is optimized for chatbot-like dialogue use cases: it is tuned to generate responses in a conversational context, making it particularly useful for applications like chatbots or virtual assistants. The Llama-2 7B chat model is also smaller and faster than its counterparts in the Llama-2 family, making it a good choice when speed and cost-efficiency matter more than a little accuracy.
Fine-tuning an LLM essentially means taking a pre-trained model like Llama-2 that has already been trained on a massive dataset and making small adjustments to the weights of its trainable parameters to optimize its performance on a new, specific task or dataset. During fine-tuning, the overall architecture of the pre-trained Llama-2 model stays unchanged, since only a small set of parameter weights is modified to learn the important features of the training dataset (the short sketch after the list below makes this concrete).
Fine-tuning offers several advantages:
- Cost-effective and efficient: Training an LLM from scratch can be extremely time-consuming and computationally expensive. Fine-tuning is a great alternative because it builds on a pre-trained model, significantly reducing the time and compute resources needed while still achieving good results.
- Improved performance: Since pre-trained LLMs are already trained on massive amounts of data (~2 trillion tokens for Llama-2), fine-tuning lets us take advantage of this knowledge to improve performance on our new, specific task or dataset.
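To make the "only a small set of parameters is modified" idea concrete, here is a minimal sketch, separate from the tutorial's pipeline, that uses the small GPT-2 model purely as a stand-in (an assumption for illustration only) to show how a parameter-efficient setup leaves the vast majority of weights frozen:

# Minimal sketch: parameter-efficient fine-tuning makes only a tiny fraction of weights trainable.
# "gpt2" is used here only as a small stand-in model for illustration.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

demo_model = AutoModelForCausalLM.from_pretrained("gpt2")
demo_model = get_peft_model(demo_model, LoraConfig(task_type="CAUSAL_LM"))

# prints the number of trainable vs. total parameters; for GPT-2 this is well under 1%
demo_model.print_trainable_parameters()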
This tutorial is based on this Google Colab notebook found here, where you can run all the cells sequentially and get your personal fine-tuned Llama-2 chatbot!
In this tutorial, we'll be using the NVIDIA T4 GPU with 16 GB of VRAM that's provided in the free version of Google Colab. If you're running the notebook on your own GPU, that works too! The code below automatically connects to the T4 GPU if running on Colab, or to the first GPU (in case you have multiple GPUs) if you're running it elsewhere.
!pip install GPUtil

import torch
import GPUtil
import os

GPUtil.showUtilization()

if torch.cuda.is_available():
    print("GPU is available!")
else:
    print("GPU not available.")

os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # Set to the GPU ID (0 for T4)
Now that you've established your GPU connection, it's time to install (and import) the required libraries for fine-tuning.
!pip install git+https://github.com/huggingface/peft.git
!pip install accelerate
!pip install -i https://pypi.org/simple/ bitsandbytes
!pip install transformers==4.30
!pip install datasets
import torch
import transformers
from transformers import AutoTokenizer, AutoModelForCausalLM
from transformers import BitsAndBytesConfig,LlamaTokenizer
from huggingface_hub import notebook_login
from datasets import load_dataset
from peft import prepare_model_for_kbit_training
from peft import LoraConfig, get_peft_model
from datetime import datetime

if 'COLAB_GPU' in os.environ:
    from google.colab import output
    output.enable_custom_widget_manager()
Since Llama-2 is governed by the Meta license, to download the model weights and tokenizer, please visit Meta's website to accept their license and request access to their models on Hugging Face (it usually takes less than a day to get access).
Once you have access to the Llama-2 models, log in to Hugging Face and enter your write access token when prompted, so that the model can be loaded in your notebook.
if 'COLAB_GPU' in os.environ:
    !huggingface-cli login
else:
    notebook_login()
Having completed our setup, it's time to load our model (Llama-2 7B Chat) using QLoRA (quantization of parameter weights to 4 bits) to reduce memory requirements and increase training speed, while ensuring that we don't hit the bottleneck of the 16 GB GPU memory.
Note: In the code below, we load all trainable parameters in the 4-bit normal-float (nf4) datatype and use double quantization for further memory savings. However, our compute precision is 16-bit (bfloat16), since we want to speed up computation of the hidden states; the default datatype would be float32.
base_model_id = "meta-llama/Llama-2-7b-chat-hf"
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

model = AutoModelForCausalLM.from_pretrained(base_model_id,
                                             quantization_config=bnb_config)
Most of our personal data is in unstructured formats, like text files or PDFs. While reformatting this into structured data like JSON or CSV files could lead to better training results, since there is then a clear mapping between question-answer pairs, that format is labor intensive and only ideal for scenarios where the data consists purely of Q&A pairs that are neatly organized and follow a predictable structure. We understand this, and hence this tutorial focuses on fine-tuning Llama-2 only on data in unstructured .txt files!
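For comparison only, here is a rough sketch of what loading a structured Q&A dataset might look like. The file name qa_pairs.jsonl and its question/answer fields are purely hypothetical; this tutorial itself sticks to plain .txt files.

# Hypothetical structured alternative: a JSON Lines file where each line is
# {"question": "...", "answer": "..."}. Not used in this tutorial.
from datasets import load_dataset

qa_dataset = load_dataset("json", data_files={"train": "qa_pairs.jsonl"}, split="train")

# Each record would still need to be flattened into a single text string before tokenization
def to_prompt(example):
    return {"text": f"Question: {example['question']}\nAnswer: {example['answer']}"}

qa_dataset = qa_dataset.map(to_prompt)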
Since Llama-2 has been trained on data up to July 2023, for this tutorial we'll be using information about the Hawaii wildfires in August 2023, sourced from the Maui Police Department report found here. We've copied the contents of the PDF into multiple text files without any additional formatting.
We'll clone the GitHub repository containing the text files and load them as training data.
!git clone https://github.com/poloclub/Fine-tuning-LLMs.git

train_dataset = load_dataset("text", data_files={"train":
    ["hawaii_wf_1.txt", "hawaii_wf_2.txt",
     "hawaii_wf_3.txt", "hawaii_wf_4.txt",
     "hawaii_wf_5.txt", "hawaii_wf_6.txt",
     "hawaii_wf_7.txt", "hawaii_wf_8.txt",
     "hawaii_wf_9.txt", "hawaii_wf_10.txt",
     "hawaii_wf_11.txt"]}, split='train')
Having loaded our data, we'll have to tokenize it (break sequences of text down into smaller pieces, or "tokens") before passing it into Llama-2 for fine-tuning. We'll initialize the LlamaTokenizer with the pre-trained Llama-2 7B chat model and manually set the EOS token so that the model knows how to recognize the "end of sentence", as well as the PAD token to pad shorter lines to match the length of longer ones, since the LlamaTokenizer is known to have issues with this.
tokenizer = LlamaTokenizer.from_pretrained(base_model_id, use_fast=False,
                                           trust_remote_code=True,
                                           add_eos_token=True)

if tokenizer.pad_token is None:
    tokenizer.add_special_tokens({'pad_token': '[PAD]'})

# set the pad token to indicate that it is the end-of-sentence token
tokenizer.pad_token = tokenizer.eos_token
Our tokenizer is configured, which means it's now time to tokenize our training data!
tokenized_train_dataset = []
for sample in train_dataset:
    tokenized_train_dataset.append(tokenizer(sample['text']))
We're one step away from training the model! We need to enable gradient checkpointing to trade computation time for lower memory usage during training. We then set up our LoRA configuration to reduce the number of trainable parameters, which can significantly cut the memory and time required for fine-tuning. LoRA works by decomposing the large weight matrices of the pre-trained model's attention layers into two much smaller low-rank matrices, which drastically reduces the number of parameters that need to be fine-tuned. Refer to the LoRA documentation to learn more about the parameters and use cases.
model.gradient_checkpointing_enable()
model = prepare_model_for_kbit_training(model)

config = LoraConfig(
    # rank of the update matrices
    # Lower rank results in smaller matrices with fewer trainable params
    r=8,
    # affects how strongly the low-rank updates are scaled
    # increasing the value speeds up training
    lora_alpha=64,
    # modules to apply the LoRA update matrices to
    target_modules=[
        "q_proj",
        "k_proj",
        "v_proj",
        "gate_proj",
        "down_proj",
        "up_proj",
        "o_proj"
    ],
    # determines the LoRA bias type, influencing training dynamics
    bias="none",
    # regulates model regularization; increasing it may lead to underfitting
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, config)
It's finally time to train our Llama-2 model on our new data (yay!). We'll be using the Transformers library to create a Trainer object for training the model. The Trainer takes the pre-trained model (Llama-2 7B chat), the training dataset, the training arguments (defined below), and a data collator as input.
Training time depends on the size of the training data, the number of epochs, and the configuration of the GPU used. If you use the sample Hawaii wildfire dataset provided and run the notebook on Google Colab's T4 GPU, it should take around 1 hour 30 minutes to complete training for 3 epochs.
When you're fine-tuning on your own data, we highly recommend that you adjust the training parameters, particularly the learning rate and the number of epochs, to achieve good performance from the fine-tuned model. While doing this, beware of overfitting!
Keep in mind that increasing the learning rate may lead to faster convergence, but it could overshoot the optimal solution. Conversely, a lower value may result in slower training but better fine-tuning. Also, increasing the number of epochs may allow the model to learn more from the data, but it may also lead to overfitting.
trainer = transformers.Trainer(
    model=model,                            # llama-2-7b-chat model
    train_dataset=tokenized_train_dataset,  # tokenized training data
    args=transformers.TrainingArguments(
        output_dir="./finetunedModel",   # directory where checkpoints are saved
        per_device_train_batch_size=2,   # number of samples processed in one forward/backward pass per GPU
        gradient_accumulation_steps=2,   # [default = 1] number of update steps to accumulate the gradients for
        num_train_epochs=3,              # [IMPORTANT] number of full passes through the entire training dataset
        learning_rate=1e-4,              # [IMPORTANT] smaller LR for better fine-tuning
        bf16=False,                      # train parameters with this precision
        optim="paged_adamw_8bit",        # use paging to improve memory management of the default adamw optimizer
        logging_dir="./logs",            # directory to save training log outputs
        save_strategy="epoch",           # [default = "steps"] save a checkpoint after every epoch
        save_steps=50,                   # save a checkpoint after this number of steps
        logging_steps=10                 # how often the training loss is logged
    ),
    # used to form a batch from a list of elements of train_dataset
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
# if use_cache is True, past key values are used to speed up decoding
# if applicable to the model. This defeats the purpose of fine-tuning
model.config.use_cache = False

# train the model based on the above config
trainer.train()
If you've made it this far, congratulations! You've successfully fine-tuned Llama-2 on your own data. Now, let's load the fine-tuned model using the BitsAndBytesConfig we used previously. Make sure to choose the model checkpoint with the lowest training loss.
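If you're not sure which checkpoint that is, a rough sketch like the one below (assuming the trainer object from the training step is still in memory) lets you inspect the logged training loss per step before picking a checkpoint directory; the checkpoint number used further down is specific to our run and will differ for yours.

# Sketch: print the training loss recorded at each logging step
# so you can pick the checkpoint in ./finetunedModel with the lowest loss.
loss_logs = [entry for entry in trainer.state.log_history if "loss" in entry]
for entry in loss_logs:
    print(f"step {entry['step']}: training loss = {entry['loss']:.4f}")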
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from transformers import BitsAndBytesConfig,LlamaTokenizer
from peft import PeftModel

base_model_id = "meta-llama/Llama-2-7b-chat-hf"

nf4Config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

tokenizer = LlamaTokenizer.from_pretrained(base_model_id, use_fast=False,
                                           trust_remote_code=True,
                                           add_eos_token=True)

base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id,                  # same base model as before
    quantization_config=nf4Config,  # same quantization config as before
    device_map="auto",
    trust_remote_code=True,
    use_auth_token=True
)
tokenizer = LlamaTokenizer.from_pretrained(base_model_id, use_fast=False,
                                           trust_remote_code=True)

modelFinetuned = PeftModel.from_pretrained(base_model,
                                           "finetunedModel/checkpoint-1455")
It's now time to test your personal fine-tuned Llama-2! Below are some example responses to questions about details of the 2023 Hawaii wildfires.
# ENTER YOUR QUESTION BELOW
question = "Just answer this question: Tell me about the role of Maui Emergency Management Agency (MEMA) in the 2023 wildfires??"

# Format the question
eval_prompt = f"{question}\n\n"
promptTokenized = tokenizer(eval_prompt, return_tensors="pt").to("cuda")

modelFinetuned.eval()
with torch.no_grad():
    print(tokenizer.decode(modelFinetuned.generate(**promptTokenized, max_new_tokens=1024)[0], skip_special_tokens=True))

torch.cuda.empty_cache()
Just answer this question: Tell me about the role of Maui Emergency Management Agency (MEMA) in the 2023 wildfires??
MEMA played a crucial role in coordinating and facilitating the response to the 2023 wildfires. MEMA's primary role is to coordinate and facilitate the response to emergencies and disasters in Maui County.
In the case of the 2023 wildfires, MEMA's responsibilities included:
1. Coordinating with other emergency response agencies, such as the Maui Police Department, Maui Fire Department, and the Hawaii National Guard, to ensure a unified response.
2. Activating the Maui Emergency Operations Center (EOC), which served as the centralized location for coordination of response and support activities.
3. Providing situational reports and information to the County of Maui Mayor's office and other stakeholders.
4. Coordinating with media outlets to ensure accurate and timely dissemination of information to the public.
5. Managing and coordinating volunteer and donated resources.
6. Coordinating with other government agencies, such as the Federal Emergency Management Agency (FEMA), the American Red Cross, and other aid agencies.
7. Providing support to emergency shelters and other response efforts.
8. Coordinating with the Maui County Corporation Counsel's office to ensure legal support for emergency response and recovery efforts.
9. Coordinating with the Maui Police Department's Wailuku and Lanai districts to ensure support for the Wailuku and Lanai communities.
10. Coordinating with the Maui Fire Department's Wailuku and Lanai stations to ensure support for the Wailuku and Lanai fire units.
In summary, MEMA played a crucial role in coordinating and facilitating the response to the 2023 wildfires in Maui County. MEMA's responsibilities included coordinating with other emergency response agencies,
activating the Maui EOC, providing situational reports and information, managing and coordinating volunteer and donated resources,
and providing support to emergency shelters and other response efforts.
Another example:
# User enters question below
user_question = "When did the Hawaii wildfires occur?"

# Format the question
eval_prompt = f"Question: {user_question}. Just answer this question accurately and concisely\n\n"
promptTokenized = tokenizer(eval_prompt, return_tensors="pt").to("cuda")

modelFinetuned.eval()
with torch.no_grad():
    print(tokenizer.decode(modelFinetuned.generate(**promptTokenized, max_new_tokens=1024)[0], skip_special_tokens=True))

torch.cuda.empty_cache()
Question: When did the Hawaii wildfires occur?. Just answer this question accurately and concisely
Answer: The Hawaii wildfires occurred from August 8, 2023 to August 12, 2023.
We can see from the above examples that the model performs very well and demonstrates a strong understanding of the 2023 wildfire incident!
This brings us to the end of the tutorial! Feel free to tinker around with the notebook, fine-tune your personal Llama-2 chatbot on your own data, and have fun 🙂
Credits