3️⃣Fine-tuning Phi-2 with QLoRA

First, let's look at Microsoft's Phi-2, an SLM (Small Language Model) with 2.7 billion parameters, and use it to learn how to fine-tune an LLM.

Phi-2 and a basic usage tutorial
Hands-on: LLM fine-tuning methodology and procedure

1. What is Phi-2?

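Phi-2 is a 2.7-billion-parameter language model developed by Microsoft Research. It is the second version of Microsoft's "Phi" series of small language models, which aim to achieve state-of-the-art performance compared with much larger models.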

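Phi-2 is a Transformer-based language model trained on 1.4 trillion tokens drawn from a mix of synthetic and web datasets for natural language processing and coding. It is a base model: it has not been fine-tuned or aligned through reinforcement learning from human feedback (RLHF).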

Phi-2's development centered on two key insights:

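Training data quality: this approach emphasizes "textbook-quality" data, using synthetic datasets and high-value web content to teach the model common-sense reasoning, general knowledge, science, daily activities, and more.
Scaled knowledge transfer: embedding the knowledge of the 1.3-billion-parameter Phi-1.5 model into the 2.7-billion-parameter Phi-2 accelerates training and improves Phi-2's benchmark scores.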

Phi-2 compared to other language models

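Across several benchmarks covering common-sense reasoning, language understanding, math, and coding, Phi-2 outperforms 7B and 13B parameter models such as Llama-2 and Mistral. On tasks that require multi-step reasoning, such as coding and math, it even outperforms Llama-2-70B, a model 25 times larger.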


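The model is small enough to be deployed directly on mobile devices while delivering performance similar to that of large language models. Despite its smaller size, Phi-2 outperforms Google's Gemini Nano 2 on the BIG-Bench Hard, BoolQ, and MBPP benchmarks.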


2. Accessing the Phi-2 Model

Streaming on GPU

Library Installation

!pip install -q -U transformers
!pip install -q -U accelerate

from transformers import pipeline

model_name = "microsoft/phi-2"

pipe = pipeline(
    "text-generation",
    model=model_name,
    device_map="auto",
    trust_remote_code=True,
)

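from IPython.display import Markdown

prompt = "Please create a Python application that can change wallpapers automatically."

# Sample with temperature / top-k / top-p instead of greedy decoding
outputs = pipe(
    prompt,
    max_new_tokens=300,
    do_sample=True,
    temperature=0.7,
    top_k=50,
    top_p=0.95,
)

# Render the generated text (prompt + completion) as Markdown in the notebook
Markdown(outputs[0]["generated_text"])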
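The temperature, top_k, and top_p arguments above decide how each next token is drawn. As a rough illustration, here is a pure-Python sketch of that filtering order (temperature, then top-k, then top-p), using a made-up dict of logits in place of the model's real output. It is a simplified sketch of the idea, not the transformers implementation.

```python
import math
import random


def filter_candidates(logits, temperature=0.7, top_k=50, top_p=0.95):
    """Return [(token_id, prob), ...] that survive temperature, top-k and top-p.

    `logits` is a toy dict {token_id: logit} (an assumption for illustration),
    standing in for the model's output scores at one generation step.
    """
    # Temperature: values < 1 sharpen the distribution, values > 1 flatten it
    scaled = {tok: value / temperature for tok, value in logits.items()}
    # Top-k: keep only the k highest-scoring tokens
    ranked = sorted(scaled.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    # Softmax over the survivors (subtract the max for numerical stability)
    peak = ranked[0][1]
    exps = [(tok, math.exp(value - peak)) for tok, value in ranked]
    total = sum(e for _, e in exps)
    probs = [(tok, e / total) for tok, e in exps]
    # Top-p (nucleus): smallest prefix whose cumulative probability reaches top_p
    kept, cumulative = [], 0.0
    for tok, p in probs:
        kept.append((tok, p))
        cumulative += p
        if cumulative >= top_p:
            break
    # Renormalize the nucleus so the probabilities sum to 1
    norm = sum(p for _, p in kept)
    return [(tok, p / norm) for tok, p in kept]


def sample_next_token(logits, rng, **kwargs):
    """Draw one token id from the filtered distribution."""
    candidates = filter_candidates(logits, **kwargs)
    tokens = [tok for tok, _ in candidates]
    weights = [p for _, p in candidates]
    return rng.choices(tokens, weights=weights, k=1)[0]


toy_logits = {0: 2.0, 1: 1.0, 2: 0.0, 3: -1.0}
print(filter_candidates(toy_logits, temperature=1.0, top_k=2, top_p=1.0))
print(sample_next_token(toy_logits, random.Random(0), top_k=1))  # always 0
```

With do_sample=False the pipeline would instead pick the highest-scoring token at every step (greedy decoding), so the three sampling arguments would have no effect.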

3. Basic Operations

Q&A

outputs = pipe("Who is the richest person in the world?", max_new_tokens=70)
print(outputs[0]["generated_text"])

Who is the richest person in the world? The richest person in the world is Jeff Bezos, the founder of Amazon. He is worth $137 billion.

Code

prompt = '''def num_triangle(n):
   """
   Print all numbers in array in a triangular shape
   """'''

outputs = pipe(prompt, max_new_tokens=120)

print(outputs[0]["generated_text"])

   """
   Print all numbers in array in a triangular shape
   """
   for i in range(1, n+1):
      for j in range(1, i+1):
         print(j, end=" ")
      print()

Chat

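# Note: the Conversation class and the "conversational" pipeline were deprecated
# and later removed from transformers; this cell needs an older release.
from transformers import pipeline, Conversation

model_name = "microsoft/phi-2"

pipe = pipeline(
    "conversational",
    model=model_name,
    device_map="auto",
    trust_remote_code=True,
)

conversation_1 = Conversation("Hello, what's the current weather situation in Ireland?")
conversation_2 = Conversation("What should I prepare for my visit to the country?")

# Passing a list of Conversation objects returns a list of updated conversations
chat = pipe([conversation_1, conversation_2])

for conv in chat:
    # messages[0] is the user turn, messages[1] the generated assistant turn;
    # cut each at the "<|im_end|>" marker if the model emits one
    print("user: ", conv.messages[0]["content"].split("<|im_end|>")[0])
    print("assistant: ", conv.messages[1]["content"].split("<|im_end|>")[0], "\n")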

user: Hello, what's the current weather situation in Ireland?
assistant: The current weather in Ireland is sunny with a high of 25....
user: What should I prepare for my visit to the country?
assistant: You should prepare your passport, visa, and any necessary.....

4. Fine-Tuning Phi-2

Setting up

%%capture
%pip install -U bitsandbytes
%pip install -U transformers
%pip install -U peft
%pip install -U accelerate
%pip install -U datasets
%pip install -U trl

from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    HfArgumentParser,
    TrainingArguments,
    pipeline,
    logging,
)
from peft import (
    LoraConfig,
    PeftModel,
    prepare_model_for_kbit_training,
    get_peft_model,
)
import os, torch
from datasets import load_dataset
from trl import SFTTrainer

Model: microsoft/phi-2
Dataset: hieunguyenminh/roleplay

base_model = "microsoft/phi-2"
dataset_name = "hieunguyenminh/roleplay"
new_model = "phi-2-role-play"

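Login to Hugging Face CLI

# Read the Hugging Face access token from Kaggle Secrets (works only inside Kaggle notebooks)
from kaggle_secrets import UserSecretsClient
user_secrets = UserSecretsClient()
secret_hf = user_secrets.get_secret("HUGGINGFACE_TOKEN")

!huggingface-cli login --token $secret_hf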

Loading the Dataset

# Load the first 1,000 rows of the training split
dataset = load_dataset(dataset_name, split="train[0:1000]")
dataset["text"][100]

Loading Model and Tokenizer

# 4-bit (NF4) quantization settings for loading the base model (Phi-2)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=False,
)

model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)

model.config.use_cache = False
model.config.pretraining_tp = 1


# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(base_model, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token

Adding Adapter Layer

model = prepare_model_for_kbit_training(model)
peft_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=[
        'q_proj',
        'k_proj',
        'v_proj',
        'dense',
        'fc1',
        'fc2',
    ]
)
model = get_peft_model(model, peft_config)
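To get a feel for how small the adapter is, the count below applies LoRA's r × (in + out) rule to each target module listed above. The Phi-2 dimensions (hidden size 2560, MLP size 10240, 32 layers) are assumptions taken from the model's published configuration, so check them against model.config; calling model.print_trainable_parameters() on the PEFT model reports the actual figure.

```python
# Assumed Phi-2 dimensions (hidden_size, intermediate_size, num_hidden_layers);
# verify against model.config before relying on the result
HIDDEN = 2560
INTERMEDIATE = 10240
LAYERS = 32


def lora_param_count(module_shapes, r, layers=LAYERS):
    """Each target module gets A (r x in_features) and B (out_features x r),
    i.e. r * (in_features + out_features) trainable weights per layer."""
    per_layer = sum(r * (d_in + d_out) for d_in, d_out in module_shapes.values())
    return per_layer * layers


# target_modules from the LoraConfig above, as (in_features, out_features)
targets = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, HIDDEN),
    "v_proj": (HIDDEN, HIDDEN),
    "dense": (HIDDEN, HIDDEN),
    "fc1": (HIDDEN, INTERMEDIATE),
    "fc2": (INTERMEDIATE, HIDDEN),
}
trainable = lora_param_count(targets, r=16)
print(f"{trainable:,} trainable LoRA parameters")  # 23,592,960
print(f"about {trainable / 2.7e9:.2%} of the ~2.7B base parameters")
```

Under these assumptions, well under 1% of the weights are trained, which is what makes the approach fit on a single GPU.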

Training the Model

training_arguments = TrainingArguments(
    output_dir="./phi-2-role-play",
    num_train_epochs=1,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=1,
    optim="paged_adamw_32bit",
    save_strategy="epoch",
    logging_steps=100,
    logging_strategy="steps",
    learning_rate=2e-4,
    fp16=False,
    bf16=False,
    group_by_length=True,
    disable_tqdm=False,
    report_to="none",
)

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=peft_config,
    max_seq_length= 2048,
    dataset_text_field="text",
    tokenizer=tokenizer,
    args=training_arguments,
    packing= False,
)

trainer.train()
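The arguments above set group_by_length=True, which batches samples of similar length together so that less compute is spent on padding. The sketch below shows the simplified idea with hypothetical random sequence lengths. The real transformers sampler keeps more randomness (it sorts within randomly drawn "megabatches"), so treat this only as an illustration.

```python
import random


def padded_tokens(batches, lengths):
    """Pad tokens needed when each batch is padded to its longest sample."""
    return sum(
        max(lengths[i] for i in batch) * len(batch) - sum(lengths[i] for i in batch)
        for batch in batches
    )


def length_grouped_batches(lengths, batch_size, rng):
    """Sort indices by length so similar lengths share a batch, then shuffle
    the order of the batches (not the samples inside them)."""
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    batches = [order[i:i + batch_size] for i in range(0, len(order), batch_size)]
    rng.shuffle(batches)
    return batches


rng = random.Random(0)
lengths = [rng.randint(20, 800) for _ in range(1000)]  # hypothetical token counts
in_order = [list(range(i, i + 2)) for i in range(0, 1000, 2)]
grouped = length_grouped_batches(lengths, batch_size=2, rng=rng)
print("pad tokens, dataset order:", padded_tokens(in_order, lengths))
print("pad tokens, grouped      :", padded_tokens(grouped, lengths))
```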

Saving the Model

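# Save the fine-tuned LoRA adapter locally and push it to the Hugging Face Hub
# (Trainer.push_to_hub takes a commit message as its first argument, so the
# repo name is passed to the model's push_to_hub instead)
trainer.model.save_pretrained(new_model)
trainer.model.push_to_hub(new_model)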


Model Evaluation

# Silence generation warnings
logging.set_verbosity(logging.CRITICAL)

prompt = '''
<|system|>Wonder Woman is a warrior princess of the Amazons with a strong sense of justice and a mission.
<|user|> What motivates you to fight for peace and love?
<|assistant|>
'''

pipe = pipeline(
    task="text-generation",
    model=model,
    tokenizer=tokenizer,
    max_length=200,
)

result = pipe(prompt)
print(result[0]['generated_text'])

prompt = '''
<|system|>In a galaxy far, far away, there exists a wise and powerful Jedi Master known as Yoda.
<|user|> What is the meaning of love?
<|assistant|>
'''

result = pipe(prompt)
print(result[0]['generated_text'])
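Both evaluation prompts repeat the same <|system|> / <|user|> / <|assistant|> layout, so a small helper can build them and trim the output. This is a hypothetical sketch: build_roleplay_prompt and extract_reply are not part of any library, and the layout only mirrors the prompts above. Since the pipeline returns the prompt together with the continuation, and the model may keep writing new turns on its own, extract_reply keeps only the first assistant turn.

```python
def build_roleplay_prompt(system: str, user: str) -> str:
    """Assemble a prompt in the same tag layout as the evaluation prompts above."""
    return f"<|system|>{system}\n<|user|> {user}\n<|assistant|>\n"


def extract_reply(generated_text: str) -> str:
    """Drop the echoed prompt and cut at the next turn tag, if the model wrote one."""
    reply = generated_text.split("<|assistant|>", 1)[-1]
    for stop_tag in ("<|user|>", "<|system|>"):
        reply = reply.split(stop_tag, 1)[0]
    return reply.strip()


prompt = build_roleplay_prompt(
    "In a galaxy far, far away, there exists a wise and powerful Jedi Master known as Yoda.",
    "What is the meaning of love?",
)
# fake_output is a placeholder standing in for result[0]['generated_text']
fake_output = prompt + "Sample reply text.\n<|user|> Another question"
print(extract_reply(fake_output))  # Sample reply text.
```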

Fine-tuning Hands-on Example: QLoRA, PEFT, SFT

Before diving into fine-tuning, let's first understand how LLMs are trained. Large language models go through two main processes: pre-training and fine-tuning.

Pre-training

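Pre-training is the initial stage of training, in which the model is exposed to a vast amount of unlabeled text data. In this stage, the model learns the structure, patterns, and relationships within language by predicting the next word in a sequence or filling in missing words. Through this process, the model builds a foundational understanding of grammar, syntax, and semantics.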
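The next-word objective is usually scored with cross-entropy: the average of -log p(actual next token). Here is a minimal pure-Python sketch with made-up probabilities for a two-step toy sequence; the Training Loss column reported during fine-tuning is the same kind of quantity, averaged over real tokens.

```python
import math


def next_token_loss(predicted_dists, targets):
    """Mean cross-entropy for next-token prediction.

    predicted_dists[t] maps candidate tokens to the model's probability for
    position t + 1; targets[t] is the token that actually came next.
    """
    losses = [-math.log(dist[target]) for dist, target in zip(predicted_dists, targets)]
    return sum(losses) / len(losses)


# Toy sequence "the cat sat": predict "cat" after "the", then "sat" after "the cat"
toy_dists = [
    {"cat": 0.5, "dog": 0.3, "sat": 0.2},
    {"sat": 0.25, "ran": 0.5, "cat": 0.25},
]
loss = next_token_loss(toy_dists, ["cat", "sat"])
print(f"loss = {loss:.4f}, perplexity = {math.exp(loss):.4f}")  # loss = 1.0397, perplexity = 2.8284
```

A lower loss means the model puts more probability on the token that really came next; perplexity, exp(loss), expresses the same thing as an "effective number of choices".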

Fine-tuning

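Fine-tuning (also called instruction tuning), by contrast, is the process of further training a pre-trained model on a smaller dataset to adapt its knowledge to a specific task or domain. This process adjusts the model parameters so that the model can perform that task.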

For example, a model pre-trained on a diverse set of web articles may not perform well out of the box at answering medical questions. There are two ways to fine-tune:

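Supervised fine-tuning (SFT): the model is trained on a labeled dataset. A labeled dataset typically contains example pairs of task-related instructions (inputs) and responses (outputs). In this process, the model learns how to respond to specific instructions.
Reinforcement Learning from Human Feedback (RLHF): the model interacts with users, generates responses, and receives feedback in the form of a reinforcement signal. In essence, the model learns from the feedback it receives and improves its performance.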
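Concretely, a supervised example joins the instruction and the response into one token sequence. The sketch below uses toy integer token ids and a hypothetical helper, build_sft_example, to show two common label choices: train on every token, or mask the instruction with -100, the label value PyTorch's cross-entropy ignores by default. As far as the configuration in this tutorial goes (SFTTrainer with dataset_text_field="text"), the loss covers the whole text.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def build_sft_example(instruction_ids, response_ids, eos_id, mask_prompt=False):
    """Join an (instruction, response) pair into one causal-LM training example.

    mask_prompt=False: labels equal input_ids, so the whole sequence is learned.
    mask_prompt=True:  instruction positions get IGNORE_INDEX, so only the
                       response (and the closing EOS) contributes to the loss.
    The model shifts labels by one position internally when computing the loss.
    """
    input_ids = instruction_ids + response_ids + [eos_id]
    if mask_prompt:
        labels = [IGNORE_INDEX] * len(instruction_ids) + response_ids + [eos_id]
    else:
        labels = list(input_ids)
    return {"input_ids": input_ids, "labels": labels}


example = build_sft_example([11, 12, 13], [21, 22], eos_id=0, mask_prompt=True)
print(example)
# {'input_ids': [11, 12, 13, 21, 22, 0], 'labels': [-100, -100, -100, 21, 22, 0]}
```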

LLM Fine-tune with Phi-2

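Fine-tuning an LLM is computationally expensive: training a model with billions of parameters can require hundreds of GB of VRAM, which makes it very hard to run or train on consumer hardware. To address this, we use QLoRA (Quantized Low-Rank Adaptation), a parameter-efficient fine-tuning (PEFT) technique that extends LoRA (Low-Rank Adaptation). It reduces memory usage by freezing the existing parameters and fine-tuning only a small number of additional parameters. This makes it possible to run the model, typically at 4-bit precision, while maintaining high performance.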
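A back-of-the-envelope calculation shows why 4-bit loading matters even for a 2.7B-parameter model. This sketch counts weight memory only (no activations, no quantization constants), and the 16 bytes per parameter for full fine-tuning (fp32 weights, fp32 gradients, and two fp32 Adam moments) is a common rule of thumb, not a measured number.

```python
PHI2_PARAMS = 2.7e9  # parameter count quoted in this tutorial


def gigabytes(num_params, bytes_per_param):
    """Memory in GB (1 GB = 1e9 bytes) for num_params values of a given size."""
    return num_params * bytes_per_param / 1e9


estimates = {
    "weights, fp32": gigabytes(PHI2_PARAMS, 4),
    "weights, fp16/bf16": gigabytes(PHI2_PARAMS, 2),
    "weights, 4-bit": gigabytes(PHI2_PARAMS, 0.5),
    # rule of thumb: fp32 weights + fp32 gradients + two fp32 Adam moments = 16 bytes/param
    "full fine-tuning, Adam (fp32)": gigabytes(PHI2_PARAMS, 16),
}
for label, gb in estimates.items():
    print(f"{label:<31}{gb:6.2f} GB")
```

Applying the same rule of thumb to a 7B model gives about 112 GB, consistent with the hundreds-of-GB figure for billion-parameter models.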

Now let's take microsoft/phi-2 from Hugging Face as the base_model and fine-tune it on our own data to create a new model, phi-2-loudai.

Setup Environments

%pip install -q -U accelerate peft bitsandbytes transformers trl einops datasets tensorboard

import os
import torch
from datasets import load_dataset
from datasets import load_from_disk
from peft import LoraConfig, prepare_model_for_kbit_training, PeftModel
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    HfArgumentParser,
    TrainingArguments,
    pipeline,
    logging,
)

from trl import SFTTrainer

Model, Dataset, Tokenizer

Set the base model, load the dataset, and configure the tokenizer so that every sample can be padded to a uniform token length.

# Model
base_model = "microsoft/phi-2"
new_model = "phi-2-loudai"

# Dataset
dataset = load_dataset(
    "prsdm/MedQuad-phi2-1k", 
    split="train"
)

# Tokenizer
tokenizer = AutoTokenizer.from_pretrained(
    base_model, 
    use_fast=True
)
tokenizer.pad_token=tokenizer.eos_token
tokenizer.padding_side="right"

Downloading readme: 100%|██████████| 274/274 [00:00<00:00, 672kB/s]
Downloading data: 100%|██████████| 1.61M/1.61M [00:00<00:00, 1.62MB/s]
Generating train split: 100%|██████████| 1000/1000 [00:00<00:00, 31695.79 examples/s]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
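The code reuses the end-of-sequence token as the padding token and pads on the right. The sketch below shows what that produces for a toy batch. The EOS id 50256 is an assumption based on Phi-2's GPT-2-style vocabulary (check tokenizer.eos_token_id), and pad_batch is a hypothetical helper, not the tokenizer's API.

```python
EOS_ID = 50256  # assumed value of tokenizer.eos_token_id for Phi-2; verify in your setup


def pad_batch(sequences, pad_id, padding_side="right"):
    """Pad token-id lists to equal length and build the matching attention masks."""
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        pad = [pad_id] * (max_len - len(seq))
        real = [1] * len(seq)   # 1 = attend to this position
        blank = [0] * len(pad)  # 0 = padding, ignored by attention
        if padding_side == "right":
            input_ids.append(seq + pad)
            attention_mask.append(real + blank)
        else:
            input_ids.append(pad + seq)
            attention_mask.append(blank + real)
    return {"input_ids": input_ids, "attention_mask": attention_mask}


batch = pad_batch([[101, 102, 103], [201]], pad_id=EOS_ID)
print(batch)
# {'input_ids': [[101, 102, 103], [201, 50256, 50256]], 'attention_mask': [[1, 1, 1], [1, 0, 0]]}
```

Right padding keeps each sequence's real tokens at the start, which is the usual choice for training. For batched generation with decoder-only models, left padding is generally preferred so that every prompt ends exactly where new tokens are appended.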

Next, configure bitsandbytes to enable quantization of Phi-2's parameters. This involves specifying parameters such as the 4-bit quantization type and the compute data type.

QLoRA Configuration

Set up the quantization configuration.

# Quantization configuration
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=False,
)
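To make load_in_4bit more concrete: the weights are split into small blocks, each block stores one scaling constant, and every weight is replaced by a 4-bit code. For simplicity, the sketch below uses evenly spaced symmetric integer levels (-7 to 7). NF4 (bnb_4bit_quant_type="nf4") instead places its levels at quantiles of a normal distribution, and bnb_4bit_use_double_quant=True would additionally quantize the per-block constants. The block size of 64 is an assumption here, and this shows the idea only, not the bitsandbytes kernels.

```python
import random


def quantize_blockwise(weights, block_size=64, bits=4):
    """Symmetric absmax quantization: one float scale per block, ints in [-qmax, qmax].

    block_size=64 is an assumed value; uniform levels are a simplification of NF4.
    """
    qmax = 2 ** (bits - 1) - 1  # 7 for 4-bit
    blocks = []
    for start in range(0, len(weights), block_size):
        block = weights[start:start + block_size]
        scale = max(abs(w) for w in block) or 1.0  # absmax of the block (guard all-zero)
        codes = [round(w / scale * qmax) for w in block]
        blocks.append((codes, scale))
    return blocks


def dequantize_blockwise(blocks, bits=4):
    """Map integer codes back to floats (bitsandbytes dequantizes to the compute dtype)."""
    qmax = 2 ** (bits - 1) - 1
    return [c / qmax * scale for codes, scale in blocks for c in codes]


rng = random.Random(0)
weights = [rng.gauss(0.0, 0.02) for _ in range(256)]  # toy weights, not Phi-2's
blocks = quantize_blockwise(weights)
restored = dequantize_blockwise(blocks)
max_err = max(abs(w - r) for w, r in zip(weights, restored))
print(f"{len(blocks)} blocks, max abs error {max_err:.5f}")
```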

Load the pretrained base LLM with AutoModelForCausalLM.

# Load the base model in 4-bit
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=bnb_config,
    trust_remote_code=True,
    low_cpu_mem_usage=True,
    device_map={"": 0},
    revision="refs/pr/23",  # the main version of Phi-2 did not support gradient checkpointing when this model was trained
)

model.config.use_cache = False
model.config.pretraining_tp = 1

model = prepare_model_for_kbit_training(
    model, 
    use_gradient_checkpointing=True
)

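Specify the training arguments. These include the output directory, the number of training epochs, the batch size, the optimization strategy, the learning rate, and so on. These parameters influence how Phi-2 adapts to the specific task during fine-tuning.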

# Set training arguments
training_arguments = TrainingArguments(
    output_dir = "./results",
    num_train_epochs = 1,
    fp16 = False,
    bf16 = False,
    per_device_train_batch_size = 4,
    per_device_eval_batch_size = 4,
    gradient_accumulation_steps = 1,
    gradient_checkpointing = True,
    max_grad_norm = 0.3,
    learning_rate = 2e-4,
    weight_decay = 0.001,
    optim = "paged_adamw_32bit",
    lr_scheduler_type = "cosine",
    max_steps = -1,
    warmup_ratio = 0.03,
    group_by_length = True,
    save_steps = 0,
    logging_steps = 25,
)
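With lr_scheduler_type="cosine" and warmup_ratio=0.03, the learning rate ramps up linearly over the first 3% of steps and then follows a half-cosine down to zero. Below is a minimal sketch of that schedule. It follows the usual cosine-with-warmup formula and assumes the warmup step count is rounded up, so treat the exact values as approximate. The 250 total steps come from the training log below (1,000 samples at a batch size of 4).

```python
import math


def lr_at_step(step, total_steps, base_lr=2e-4, warmup_ratio=0.03):
    """Learning rate at a given optimizer step: linear warmup, then half-cosine decay to 0."""
    warmup_steps = math.ceil(warmup_ratio * total_steps)  # assumed rounding: up
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


TOTAL_STEPS = 250  # 1,000 samples / batch size 4, as in the training log below
for step in (0, 4, 8, 125, 249):
    print(step, f"{lr_at_step(step, TOTAL_STEPS):.2e}")
```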

Next, set up the LoRA configuration with the peft parameters. The LoRA configuration specifies parameters such as the rank, alpha, bias, task type, and target modules. These parameters determine how the model adapts during fine-tuning.

# LoRA configuration
peft_config = LoraConfig(
    r=64,                   #default=8
    lora_alpha= 16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules= ["Wqkv", "out_proj"] #["Wqkv", "fc1", "fc2" ] # ["Wqkv", "out_proj", "fc1", "fc2" ]
)
#print_trainable_parameters(model)
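LoRA keeps each targeted weight matrix W frozen and learns a low-rank update: the layer computes W·x + (lora_alpha / r)·B·(A·x), where A is r × in and B is out × r. B starts at zero, so training begins from exactly the base model's behavior. Here is a pure-Python sketch with tiny toy matrices (the sizes are illustrative, not Phi-2's); with this section's settings the scaling factor is 16 / 64 = 0.25.

```python
import random


def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def lora_forward(W, A, B, x, lora_alpha, r):
    """h = W x + (lora_alpha / r) * B (A x). W stays frozen; training would update only A and B."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scaling = lora_alpha / r
    return [b + scaling * u for b, u in zip(base, update)]


rng = random.Random(0)
d_in, d_out, r, lora_alpha = 6, 4, 2, 16  # toy sizes
W = [[rng.uniform(-1, 1) for _ in range(d_in)] for _ in range(d_out)]  # frozen
A = [[rng.uniform(-0.1, 0.1) for _ in range(d_in)] for _ in range(r)]  # r x d_in
B = [[0.0] * r for _ in range(d_out)]  # d_out x r, zero at initialization
x = [rng.uniform(-1, 1) for _ in range(d_in)]

print(lora_forward(W, A, B, x, lora_alpha, r) == matvec(W, x))  # True, because B is all zeros
```

A larger lora_alpha relative to r scales the learned update up; the original ratio of 16 to 64 damps it.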

Finally, initialize SFTTrainer and train the model on the labeled dataset. Once training finishes, save the newly fine-tuned model.

# Set supervised fine-tuning parameters
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=peft_config,
    dataset_text_field="text",
    max_seq_length= None,
    tokenizer=tokenizer,
    args=training_arguments,
)

Downloading shards: 100%|██████████| 2/2 [04:09<00:00, 124.97s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [00:01<00:00,  1.02it/s]
You are using an old version of the checkpointing format that is deprecated (We will also silently ignore `gradient_checkpointing_kwargs` in case you passed it).Please update to the new format on your modeling file. To use the new format, you need to completely remove the definition of the method `_set_gradient_checkpointing` in your model.
You are using an old version of the checkpointing format that is deprecated (We will also silently ignore `gradient_checkpointing_kwargs` in case you passed it).Please update to the new format on your modeling file. To use the new format, you need to completely remove the definition of the method `_set_gradient_checkpointing` in your model.
/home/kubwa/anaconda3/envs/llm/lib/python3.11/site-packages/trl/trainer/sft_trainer.py:246: UserWarning: You didn't pass a `max_seq_length` argument to the SFTTrainer, this will default to 1024
  warnings.warn(
Map: 100%|██████████| 1000/1000 [00:00<00:00, 1627.56 examples/s]

Training

Now let's run the training.

# Train model
trainer.train()

Step	Training Loss
25	1.400900
50	1.337200
75	1.149200
100	1.115800
125	1.171100
150	1.121900
175	1.124300
200	1.107600
225	1.131800
250	1.077100

TrainOutput(global_step=250, training_loss=1.1736907958984375, metrics={'train_runtime': 264.0064, 'train_samples_per_second': 3.788, 'train_steps_per_second': 0.947, 'total_flos': 6328756445552640.0, 'train_loss': 1.1736907958984375, 'epoch': 1.0})
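The numbers in this output can be checked by hand. The sketch below uses a hypothetical helper that ignores packing and assumes the last partial batch is kept. It reproduces the 250 steps from 1,000 samples at a batch size of 4, along with the reported throughput from the 264-second runtime. With the earlier role-play settings (1,000 rows, batch size 2) the same formula gives 500 steps.

```python
import math


def optimizer_steps(num_samples, per_device_batch, grad_accum=1, num_devices=1, epochs=1):
    """Approximate optimizer updates for a run without packing (hypothetical helper)."""
    batches_per_epoch = math.ceil(num_samples / (per_device_batch * num_devices))
    return max(1, batches_per_epoch // grad_accum) * epochs


steps = optimizer_steps(1000, per_device_batch=4)
print(steps)                       # 250, matching global_step above
print(round(steps / 264.0064, 3))  # 0.947, matching train_steps_per_second
print(round(1000 / 264.0064, 3))   # 3.788, matching train_samples_per_second
```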

Training is complete. Next, save the new fine-tuned model.

# Save trained model
trainer.model.save_pretrained(new_model)

Let's visualize the training results with TensorBoard.

#Check training results with tensorboard
%load_ext tensorboard
%tensorboard --logdir results/runs

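Delete objects that are no longer needed to free GPU memory. If you have plenty of GPU memory this step is optional; it simply helps when memory is tight. The fine-tuned model itself is kept, because the test below still uses it.

# Free the trainer (and its optimizer state); `model` is still needed for the test
import gc

del trainer
gc.collect()
torch.cuda.empty_cache()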

Fine-tuned Model Test

Test the model with a text-generation pipeline.

# Run text generation pipeline with our model
logging.set_verbosity(logging.CRITICAL)

prompt = "What are the treatments for gastrointestinal tumors?"
instruction = f"### Instruction: {prompt} "

pipe = pipeline(
    task="text-generation", 
    model=model, 
    tokenizer=tokenizer, 
    max_length=200
)
result = pipe(instruction)
print(result[0]['generated_text'][len(instruction):])

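There are several treatments for gastrointestinal tumors, including:

1. Surgery: The primary treatment for gastrointestinal carcinoid tumors is surgery to remove the tumor.

2. Chemotherapy: Chemotherapy is a treatment that uses drugs to kill cancer cells. It is often used together with surgery to treat gastrointestinal carcinoid tumors.

3. Radiation therapy: Radiation therapy is a treatment that uses high-energy radiation to kill cancer cells. It is frequently used together with surgery and chemotherapy to treat gastrointestinal carcinoid tumors.

4. Targeted therapy: Targeted therapy is a treatment that targets specific molecules or pathways involved in the growth and spread of cancer cells. It is often used together with surgery, chemotherapy, and radiation therapy to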

Reload & Push to Hugging Face Hub

You can keep testing the LLM with more questions, or change parameters to see how performance changes. Now let's save the model by pushing it to the Hugging Face Hub.

# Reload the base model in float16 and merge the LoRA weights into it
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    torch_dtype=torch.float16,
    trust_remote_code=True,
    device_map={"": 0},
    revision="refs/pr/23",  # same revision as in training, so the adapter's Wqkv/out_proj module names match
)
model = PeftModel.from_pretrained(model, new_model)
model = model.merge_and_unload()

# Reload tokenizer to save it
tokenizer = AutoTokenizer.from_pretrained(
    base_model, 
    trust_remote_code=True
)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"
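merge_and_unload() folds each adapter into its base weight, W' = W + (lora_alpha / r)·B·A, and removes the LoRA layers, so the merged model runs like a plain model with no separate adapter. The pure-Python check below uses toy matrices with made-up values to show that the merged weight gives the same output as the unmerged base-plus-adapter path, up to floating-point rounding.

```python
import random


def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def matmul(P, Q):
    """(n x k) @ (k x m) with plain lists."""
    return [[sum(P[i][t] * Q[t][j] for t in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]


def merge_lora(W, A, B, lora_alpha, r):
    """Return W + (lora_alpha / r) * B @ A."""
    scaling = lora_alpha / r
    BA = matmul(B, A)
    return [[w + scaling * d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(W, BA)]


rng = random.Random(1)
d_in, d_out, r, lora_alpha = 5, 3, 2, 16  # toy sizes
W = [[rng.uniform(-1, 1) for _ in range(d_in)] for _ in range(d_out)]
A = [[rng.uniform(-1, 1) for _ in range(d_in)] for _ in range(r)]
B = [[rng.uniform(-1, 1) for _ in range(r)] for _ in range(d_out)]
x = [rng.uniform(-1, 1) for _ in range(d_in)]

scaling = lora_alpha / r
unmerged = [h + scaling * u for h, u in zip(matvec(W, x), matvec(B, matvec(A, x)))]
merged = matvec(merge_lora(W, A, B, lora_alpha, r), x)
print("max difference:", max(abs(a - b) for a, b in zip(unmerged, merged)))
```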

!huggingface-cli login

model.push_to_hub(new_model, use_temp_dir=False)
tokenizer.push_to_hub(new_model, use_temp_dir=False)
