4️⃣ Converting gemma-2b to GGUF with llama.cpp

Quantizing an LLM to GGUF with llama.cpp

Most language models are too large to fine-tune on consumer hardware. For example, fine-tuning a 65-billion-parameter model requires more than 780 GB of GPU memory, the equivalent of ten A100 80GB GPUs.
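A quick check of that arithmetic, as a sketch: the 780 GB figure is taken as given, and the per-parameter sizes are the standard 2 bytes for 16-bit weights and half a byte for 4-bit weights. The weights alone are only part of the total; the remainder presumably goes to gradients, optimizer state, and activations.

```python
import math

PARAMS = 65e9       # 65-billion-parameter model
FINETUNE_GB = 780   # GPU memory quoted above for full 16-bit fine-tuning
A100_GB = 80        # memory of one A100 80GB card

gpus_needed = math.ceil(FINETUNE_GB / A100_GB)   # 780 / 80 = 9.75, rounded up
weights_fp16_gb = PARAMS * 2 / 1e9               # 2 bytes per parameter in FP16/BF16
weights_4bit_gb = PARAMS * 0.5 / 1e9             # 4 bits = 0.5 bytes per parameter

print(f"A100 80GB cards needed: {gpus_needed}")
print(f"16-bit weights: {weights_fp16_gb:.1f} GB, 4-bit weights: {weights_4bit_gb:.1f} GB")
```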

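Parameter-efficient fine-tuning techniques such as LoRA and QLoRA now make it much easier to fine-tune models on consumer hardware.

LoRA adds a small number of trainable parameters, namely an adapter for each layer of the LLM, and freezes all of the original parameters.

Because fine-tuning only updates the adapter weights, memory usage drops substantially.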
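A minimal sketch of the idea in plain Python, using toy sizes rather than Gemma's. The `alpha / r` scaling and the zero-initialized `B` follow the LoRA paper's convention, which PEFT also uses by default. The frozen weight `W` is never modified; only the small matrices `A` and `B` would receive gradients.

```python
import random

def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]

def lora_forward(W, A, B, x, alpha, r):
    """h = W x + (alpha / r) * B (A x). W is frozen; only A and B would be trained."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]

def lora_param_count(d_in, d_out, r):
    """Trainable parameters of one adapter: A is r x d_in, B is d_out x r."""
    return r * (d_in + d_out)

d_in, d_out, r, alpha = 6, 4, 2, 4
rng = random.Random(0)
W = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]  # frozen pretrained weight
A = [[rng.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]   # small random init
B = [[0.0] * r for _ in range(d_out)]                               # zero init
x = [rng.gauss(0, 1) for _ in range(d_in)]

# With B = 0 the adapter adds nothing, so training starts from the original model.
h = lora_forward(W, A, B, x, alpha, r)
print(h == matvec(W, x))

# A 2048 x 2048 projection with r = 8: frozen weights vs trainable adapter parameters.
print(2048 * 2048, lora_param_count(2048, 2048, 8))
```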

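QLoRA goes three steps further by introducing 4-bit quantization, double quantization, and paging through NVIDIA unified memory.

  • 4-bit NormalFloat Quantization: ensures that each quantization bin holds an equal number of values, which avoids computational problems and errors caused by outliers.

  • Double quantization: quantizes the quantization constants themselves for additional memory savings.

  • Paging with unified memory: uses the NVIDIA unified memory feature to handle page-to-page transfers between the CPU and GPU automatically.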
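The savings from double quantization can be worked out directly. This sketch assumes the block sizes reported in the QLoRA paper: without double quantization, one FP32 constant per 64 weights; with it, those constants are stored in 8 bits, plus one FP32 constant per 256 of them.

```python
def plain_overhead_bits(block_size=64, const_bits=32):
    """Extra bits per weight: one FP32 absmax constant stored per block of weights."""
    return const_bits / block_size

def double_quant_overhead_bits(block_size=64, const_bits=8, block_size2=256, const2_bits=32):
    """Constants quantized to 8 bits, plus one FP32 constant per block_size2 constants."""
    return const_bits / block_size + const2_bits / (block_size * block_size2)

plain = plain_overhead_bits()             # 32 / 64
dq = double_quant_overhead_bits()         # 8 / 64 + 32 / (64 * 256)
saved_gb = (plain - dq) * 65e9 / 8 / 1e9  # saving for a 65B-parameter model

print(f"{plain} -> {dq} bits per parameter, about {saved_gb:.2f} GB saved at 65B")
```

The result, roughly 0.127 bits per parameter and about 3 GB saved for a 65B model, matches the figures given in the QLoRA paper.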

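Basic steps involved in fine-tuning:

  1. Load the base model

  2. Fine-tune the base model (with LoRA adapters attached)

  3. Save the LoRA adapter

  4. Reload the base model in half/full precision

  5. Merge the LoRA weights into the base model

  6. Save the merged model and push it to the Hugging Face Hub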
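Step 5 works because a LoRA update is just a low-rank matrix added to the frozen weight, so once training is done it can be folded into the base weight. A small pure-Python check of that identity with toy numbers; in PEFT the corresponding call is `merge_and_unload()` on the `PeftModel`.

```python
def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]

def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def merge_lora(W, A, B, alpha, r):
    """Fold the adapter into the base weight: W_merged = W + (alpha / r) * B @ A."""
    scale = alpha / r
    delta = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(W, delta)]

W = [[1.0, 2.0, 0.0], [0.0, 1.0, -1.0]]   # frozen 2 x 3 base weight
A = [[0.5, -1.0, 2.0]]                    # r x d_in, with r = 1
B = [[2.0], [-1.0]]                       # d_out x r
alpha, r = 2, 1
x = [1.0, 1.0, 1.0]

# Unmerged: base output plus the scaled adapter path
adapter_out = [h + (alpha / r) * u for h, u in zip(matvec(W, x), matvec(B, matvec(A, x)))]
# Merged: a single matrix, no adapter left at inference time
merged_out = matvec(merge_lora(W, A, B, alpha, r), x)
print(adapter_out, merged_out)
```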

1. gemma-2B Fine-tuning

Setup Environments

%pip install -q -U bitsandbytes
%pip install -q -U peft
%pip install -q -U trl
%pip install -q -U accelerate
%pip install -q -U datasets
import os

os.environ["HF_TOKEN"] = 'Your_Huggingface_Key'  # your Hugging Face access token (needed for the gated gemma models)

Import dependencies

To use the google/gemma models, open the google page on Hugging Face and click Acknowledge License to request access; once approved, you can use the models. Approval arrives within about five minutes of the request.

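import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

# Set the quantization config: 4-bit NF4 with double quantization (assumed typical QLoRA settings;
# use torch.float16 as the compute dtype on GPUs without bfloat16 support)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
# Load the model and tokenizer
model_id = "google/gemma-2b"
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config, device_map={"": 0})
tokenizer = AutoTokenizer.from_pretrained(model_id)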

Load Dataset

We'll fine-tune the model on medical-reasoning, a dataset of clinical medical questions: https://huggingface.co/datasets/mamachang/medical-reasoning

from datasets import load_dataset
dataset = load_dataset("mamachang/medical-reasoning")
print(dataset)
DatasetDict({
    train: Dataset({
        features: ['input', 'instruction', 'output'],
        num_rows: 3702
    })
})

The train split has input, instruction, and output columns. Let's convert it to a pandas DataFrame to take a closer look.

df = dataset["train"].to_pandas()

Generate prompt for training

def generate_prompt(data_point):
    """Generate the input text from a prompt, task instruction, (context info), and answer

    :param data_point: dict: Data point
    :return: str: formatted prompt
    """
    # Generate prompt
    prefix_text = 'Below is an instruction that describes a task. Write a response that ' \
               'appropriately completes the request.\n\n'
    # Samples with additional context info.
    if data_point['input']:
        text = f"""<start_of_turn>user {prefix_text} {data_point["instruction"]} here are the inputs {data_point["input"]} <end_of_turn>\n<start_of_turn>model{data_point["output"]} <end_of_turn>"""
    # Samples without additional context
    else:
        text = f"""<start_of_turn>user {prefix_text} {data_point["instruction"]} <end_of_turn>\n<start_of_turn>model{data_point["output"]} <end_of_turn>"""
    return text

# add the "prompt" column in the dataset
text_column = [generate_prompt(data_point) for data_point in dataset["train"]]
dataset = dataset["train"].add_column("prompt", text_column)
print(dataset)
Dataset({
    features: ['input', 'instruction', 'output', 'prompt'],
    num_rows: 3702
})

Train/Test Split

dataset = dataset.shuffle(seed=1234)  # Shuffle dataset here
dataset = dataset.map(
    lambda samples: tokenizer(samples["prompt"]),  # adds input_ids and attention_mask
    batched=True,
)
dataset = dataset.train_test_split(test_size=0.1)
train_data = dataset["train"]
test_data = dataset["test"]
print(train_data)
print(test_data)
Dataset({
    features: ['input', 'instruction', 'output', 'prompt', 'input_ids', 'attention_mask'],
    num_rows: 3331
})
Dataset({
    features: ['input', 'instruction', 'output', 'prompt', 'input_ids', 'attention_mask'],
    num_rows: 371
})


Next, prepare the 4-bit model for training with PEFT's prepare_model_for_kbit_training method, then wrap it in a PeftModel with the get_peft_model utility function so that LoRA adapters are trained.

import bitsandbytes as bnb

def find_all_linear_names(model):
  """Return the short names of all 4-bit linear layers, i.e. the LoRA target modules."""
  cls = bnb.nn.Linear4bit #if args.bits == 4 else (bnb.nn.Linear8bitLt if args.bits == 8 else torch.nn.Linear)
  lora_module_names = set()
  for name, module in model.named_modules():
    if isinstance(module, cls):
      names = name.split('.')
      lora_module_names.add(names[0] if len(names) == 1 else names[-1])
  if 'lm_head' in lora_module_names: # needed for 16-bit
    lora_module_names.remove('lm_head')
  return list(lora_module_names)

modules = find_all_linear_names(model)
print(modules)
['k_proj', 'gate_proj', 'up_proj', 'q_proj', 'o_proj', 'down_proj', 'v_proj']
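The name-shortening rule in `find_all_linear_names` can be checked without a GPU. This sketch applies the same rule (keep the last dotted component, drop `lm_head`) to a hypothetical list of module names shaped like Gemma's; `short_linear_names` is an illustrative helper, not part of any library.

```python
def short_linear_names(qualified_names):
    """Same rule as find_all_linear_names: keep the last dotted component, drop lm_head."""
    short = set()
    for name in qualified_names:
        parts = name.split(".")
        short.add(parts[0] if len(parts) == 1 else parts[-1])
    short.discard("lm_head")
    return sorted(short)

# Hypothetical names for two decoder layers; lm_head is added to exercise the exclusion.
layers = [
    f"model.layers.{i}.{block}.{proj}"
    for i in range(2)
    for block, projs in [("self_attn", ["q_proj", "k_proj", "v_proj", "o_proj"]),
                         ("mlp", ["gate_proj", "up_proj", "down_proj"])]
    for proj in projs
] + ["lm_head"]

print(short_linear_names(layers))
```

Repeated layers collapse into one entry per projection type, which is why the real function returns seven names for all 18 decoder layers.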
from peft import LoraConfig, PeftModel, prepare_model_for_kbit_training, get_peft_model

model = prepare_model_for_kbit_training(model)

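lora_config = LoraConfig(
    r=64,                    # rank consistent with the trainable-parameter count printed below
    target_modules=modules,  # every 4-bit linear layer found above
    task_type="CAUSAL_LM",   # other options left at their PEFT defaults
)
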
model = get_peft_model(model, lora_config)
  (model): GemmaModel(
    (embed_tokens): Embedding(256000, 2048, padding_idx=0)
    (layers): ModuleList(
      (0-17): 18 x GemmaDecoderLayer(
        (self_attn): GemmaSdpaAttention(
          (q_proj): Linear4bit(in_features=2048, out_features=2048, bias=False)
          (k_proj): Linear4bit(in_features=2048, out_features=256, bias=False)
          (v_proj): Linear4bit(in_features=2048, out_features=256, bias=False)
          (o_proj): Linear4bit(in_features=2048, out_features=2048, bias=False)
          (rotary_emb): GemmaRotaryEmbedding()
        (mlp): GemmaMLP(
          (gate_proj): Linear4bit(in_features=2048, out_features=16384, bias=False)
          (up_proj): Linear4bit(in_features=2048, out_features=16384, bias=False)
          (down_proj): Linear4bit(in_features=16384, out_features=2048, bias=False)
          (act_fn): PytorchGELUTanh()
        (input_layernorm): GemmaRMSNorm()
        (post_attention_layernorm): GemmaRMSNorm()
    (norm): GemmaRMSNorm()
  (lm_head): Linear(in_features=2048, out_features=256000, bias=False)
trainable, total = model.get_nb_trainable_parameters()

print(f"Trainable: {trainable} | total: {total} | Percentage: {trainable/total*100:.4f}%")
Trainable: 78446592 | total: 2584619008 | Percentage: 3.0351%
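The printed trainable count can be reproduced from the layer shapes in the model printout. Each adapted layer with `in_features` inputs and `out_features` outputs gains `r * (in_features + out_features)` parameters. The LoRA rank is not visible in the output above, so `r = 64` is an inference: it is the rank that reproduces the printed number exactly.

```python
# (in_features, out_features) of each LoRA target in one GemmaDecoderLayer,
# copied from the model printout above.
TARGETS = {
    "q_proj": (2048, 2048), "k_proj": (2048, 256), "v_proj": (2048, 256),
    "o_proj": (2048, 2048), "gate_proj": (2048, 16384), "up_proj": (2048, 16384),
    "down_proj": (16384, 2048),
}
NUM_LAYERS = 18  # "(0-17): 18 x GemmaDecoderLayer"

def lora_trainable_params(r):
    """Each adapted layer adds A (r x in) and B (out x r): r * (in + out) parameters."""
    per_layer = sum(r * (d_in + d_out) for d_in, d_out in TARGETS.values())
    return per_layer * NUM_LAYERS

total = 2584619008                       # total reported by get_nb_trainable_parameters()
trainable = lora_trainable_params(r=64)  # r = 64 is inferred, not shown in the source
print(trainable, f"{trainable / total * 100:.4f}%")
print("frozen base parameters:", total - trainable)
```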


Now let's train the model.

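import transformers
from trl import SFTTrainer

tokenizer.pad_token = tokenizer.eos_token
trainer = SFTTrainer(
    model=model,                 # the 4-bit model already wrapped with LoRA adapters
    train_dataset=train_data,
    eval_dataset=test_data,
    dataset_text_field="prompt",
    args=transformers.TrainingArguments(
        output_dir="outputs",
        max_steps=100,           # the log below shows 100 steps
        logging_steps=1,         # loss is logged at every step
    ),
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
model.config.use_cache = False  # silence the warnings. Please re-enable for inference!
trainer.train()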

Step Training Loss
1 2.132100
2 2.036100
3 1.966600
4 1.794400
5 1.723900
6 1.644200
7 1.579900
8 1.402000
9 1.336500
10 1.288700
11 1.134200
12 1.228800
13 1.130100
14 1.171200
15 1.154600
16 1.165900
17 1.166300
18 1.093700
19 1.128000
20 1.083000
21 1.115100
22 1.134300
23 1.133400
24 1.085800
25 1.086600
26 1.090700
27 1.101500
28 1.024200
29 1.115900
30 1.055200
31 1.031000
32 1.038400
33 1.071800
34 1.060800
35 1.073500
36 1.013900
37 1.053400
38 1.062800
39 1.060000
40 1.067900
41 1.004100
42 1.036200
43 1.118600
44 1.054600
45 1.040600
46 0.987600
47 1.075600
48 1.050100
49 1.108100
50 1.057900
51 1.043800
52 1.109800
53 1.109200
54 1.032400
55 1.013100
56 1.010800
57 1.056000
58 1.075000
59 1.019000
60 1.042600
61 1.012100
62 1.053700
63 1.022000
64 1.063300
65 1.044900
66 1.021100
67 0.994300
68 1.004900
69 1.041000
70 1.087700
71 1.071200
72 1.010600
73 0.990200
74 1.061600
75 1.001700
76 1.030700
77 0.983900
78 1.056900
79 1.015400
80 1.035800
81 0.983800
82 0.996300
83 1.069300
84 1.058400
85 1.031700
86 1.039900
87 1.086900
88 1.067800
89 1.021400
90 1.022100
91 0.983400
92 1.072000
93 1.030100
94 1.041800
95 0.944500
96 1.009800
97 1.016500
98 1.043500
99 1.043800
100 1.011100

Test Fine-tuned model

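def get_completion(query: str, model, tokenizer) -> str:
  device = "cuda:0"

  # Korean version of the "Below is an instruction..." prefix, followed by the query
  prompt_template = """
  아래는 작업을 설명하는 명령어입니다. 요청을 적절히 완료하는 응답을 작성하세요.
  {query}
  """
  prompt = prompt_template.format(query=query)

  # Tokenize the prompt and move it to the GPU
  encodeds = tokenizer(prompt, return_tensors="pt", add_special_tokens=True)
  model_inputs = encodeds.to(device)

  # Sample up to 1000 new tokens; the EOS token doubles as the padding token
  generated_ids = model.generate(**model_inputs, max_new_tokens=1000, do_sample=True, pad_token_id=tokenizer.eos_token_id)
  decoded = tokenizer.decode(generated_ids[0], skip_special_tokens=True)
  return decoded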

query = """\n\n 괄호 안의 옵션 중 하나를 선택하여 답하세요. 그 사이에 추론을 작성하세요.<analysis></analysis>. 중간에 답안 작성 <answer></answer>. 다음은 입력 내용입니다. Q: 8세 남아가 메스꺼움, 구토, 배뇨 횟수 감소 증상으로 어머니가 소아과 의사에게 데려왔습니다. 급성 림프모구 백혈병으로 5일 전에 1차 화학 요법을 받았습니다. 화학 요법을 시작하기 전 그의 백혈구 수는 60,000/mm3였습니다. 바이탈 사인은 맥박 110/분, 체온 37.0°C(98.6°F), 혈압 100/70mmHg입니다. 신체 검사 결과 양측 발바닥 부종이 있습니다. 다음 중 이 질환의 진단을 확인하는 데 도움이 되는 혈청 검사 및 소변 검사 결과는? ? \'A': '고칼륨혈증, 고인산혈증, 저칼슘혈증, 크레아틴키나아제(MM)가 매우 높음', 'B': '고칼륨혈증, 고인산혈증, 저칼슘혈증, 고요산혈증, 소변 상청색, 헴 양성', 'C': '소변 내 요산 결정, 고칼륨혈증, 고인산혈증, 유산증, 요산염 결정', 'D': '고요산혈증, 고칼륨혈증, 고인산혈증, 요산결정이 있음' 정답은? '고요산혈증, 고칼륨혈증, 고인산혈증 및 요로 단클론 스파이크', 'E': '고요산혈증, 고칼륨혈증, 고인산혈증, 젖산증 및 옥살산염 결정'.'}"""

result = get_completion(query=query, model=model, tokenizer=tokenizer)
print(result)
A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.

  아래는 작업을 설명하는 명령어입니다. 요청을 적절히 완료하는 응답을 작성하세요.

 괄호 안의 옵션 중 하나를 선택하여 답하세요. 그 사이에 추론을 작성하세요.<analysis></analysis>. 중간에 답안 작성 <answer></answer>. 다음은 입력 내용입니다. Q: 8세 남아가 메스꺼움, 구토, 배뇨 횟수 감소 증상으로 어머니가 소아과 의사에게 데려왔습니다. 급성 림프모구 백혈병으로 5일 전에 1차 화학 요법을 받았습니다. 화학 요법을 시작하기 전 그의 백혈구 수는 60,000/mm3였습니다. 바이탈 사인은 맥박 110/분, 체온 37.0°C(98.6°F), 혈압 100/70mmHg입니다. 신체 검사 결과 양측 발바닥 부종이 있습니다. 다음 중 이 질환의 진단을 확인하는 데 도움이 되는 혈청 검사 및 소변 검사 결과는? ? 'A': '고칼륨혈증, 고인산혈증, 저칼슘혈증, 크레아틴키나아제(MM)가 매우 높음', 'B': '고칼륨혈증, 고인산혈증, 저칼슘혈증, 고요산혈증, 소변 상청색, 헴 양성', 'C': '소변 내 요산 결정, 고칼륨혈증, 고인산혈증, 유산증, 요산염 결정', 'D': '고요산혈증, 고칼륨혈증, 고인산혈증, 요산결정이 있음' 정답은? '고요산혈증, 고칼륨혈증, 고인산혈증 및 요로 단클론 스파이크', 'E': '고요산혈증, 고칼륨혈증, 고인산혈증, 젖산증 및 옥살산염 결정'.'}

E: '고요산혈증, 고칼륨혈증, 고인산혈증, 젖산증 및 옥살산염 결정'.'</answer>
This is a question about a patient with clinical presentation consistent with sepsis, which would be characterized by: 

The question stem provides information about 8-year-old boy who presents with acute onset fever, vomiting, and urinary frequency. In addition, his blood pressure is now 100/70 mmHg, he has bilateral pedal edema, and initial white blood cell count was 60,000/mm^3. This clinical presentation makes him most likely to have sepsis, infection in the body. Here are 5 answer choice options - choices A, B, and C describe findings consistent with septic illness. Choice D indicates hypoglycemia. Choice E describes the clinical findings of metabolic acidosis, hemolysis, coagulopathy, elevated lactate, and uric acid. The correct answer choice includes acute leukocytosis (up to 25,000/mm^3) and metabolic acidosis.
E: '고요산혈증, 고칼륨혈증, 고인산혈증, 젖산증 및 옥살산염 결정'.</answer>
This is a question about a patient with clinical presentation consistent with sepsis, which would be characterized by: 

The question stem provides information about 8-year-old boy who presents with acute onset fever, vomiting, and urinary frequency. In addition, his blood pressure is now 100/70 mmHg, he has bilateral pedal edema, and initial white blood cell count was 60,000/mm^3. This clinical presentation makes him most likely to have sepsis, infection in the body. Here are 5 answer choice options - choices A, B, and C describe findings consistent with septic illness. Choice D indicates hypoglycemia. Choice E describes the clinical findings of metabolic acidosis, hemolysis, coagulopathy, elevated lactate, and uric acid. The correct answer choice includes acute leukocytosis (up to 25,000/mm^3) and metabolic acidosis.
E: '고요산혈증, 고칼륨혈증, 고인산혈증, 젖산증 및 옥살산염 결정'.</answer>
This is a question about a patient with clinical presentation consistent with sepsis, which would be characterized by: 

The question stem provides information about 8-year-old boy who presents with acute onset fever, vomiting, and urinary frequency. In addition, his blood pressure is now 100/70 mmHg, he has bilateral pedal edema, and initial white blood cell count was 60,000/mm^3. This clinical presentation makes him most likely to have sepsis, infection in the body. Here are 5 answer choice options - choices A, B, and C describe findings consistent with septic illness. Choice D indicates hypoglycemia. Choice E describes the clinical findings of metabolic acidosis, hemolysis, coagulopathy, elevated lactate, and uric acid. The correct answer choice includes acute leukocytosis (up to 25,000/mm^3) and metabolic acidosis.
E: '고요산혈증, 고칼륨혈증, 고인산혈증, 젖산증 및 옥살산염 결정'.</answer>
This is a question about a patient with clinical
print(f"Model Answer : \n {result.split('model')[-1]}")
Model Answer : 
E: '고요산혈증, 고칼륨혈증, 고인산혈증, 젖산증 및 옥살산염 결정'.</answer>
This is a question about a patient with clinical
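The `right-padding was detected` warning printed above concerns how prompts are padded. A decoder-only model continues from the last position of each row, so for batched generation the padding should sit on the left, which is what setting `tokenizer.padding_side = "left"` does in transformers. A pure-Python sketch with made-up token ids; `pad_batch` is a hypothetical helper, not a transformers function.

```python
def pad_batch(seqs, pad_id, side="left"):
    """Pad lists of token ids to a common length; return (input_ids, attention_mask)."""
    width = max(len(s) for s in seqs)
    input_ids, attention_mask = [], []
    for s in seqs:
        pad = [pad_id] * (width - len(s))
        if side == "left":
            input_ids.append(pad + s)
            attention_mask.append([0] * len(pad) + [1] * len(s))
        else:
            input_ids.append(s + pad)
            attention_mask.append([1] * len(s) + [0] * len(pad))
    return input_ids, attention_mask

batch = [[2, 10, 11, 12], [2, 20]]   # made-up token ids; 0 plays the pad token
left_ids, left_mask = pad_batch(batch, pad_id=0, side="left")
right_ids, _ = pad_batch(batch, pad_id=0, side="right")

# Generation appends after the last position of each row:
print([row[-1] for row in left_ids])   # every row ends on a real prompt token
print([row[-1] for row in right_ids])  # the short row would continue after a pad
```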
query = """Please answer with one of the option in the bracket. Write reasoning in between <analysis></analysis>. Write answer in between <answer></answer>.here are the inputs:Q:A 34-year-old man presents to a clinic with complaints of abdominal discomfort and blood in the urine for 2 days. He has had similar abdominal discomfort during the past 5 years, although he does not remember passing blood in the urine. He has had hypertension for the past 2 years, for which he has been prescribed medication. There is no history of weight loss, skin rashes, joint pain, vomiting, change in bowel habits, and smoking. On physical examination, there are ballotable flank masses bilaterally. The bowel sounds are normal. Renal function tests are as follows:\nUrea 50 mg/dL\nCreatinine 1.4 mg/dL\nProtein Negative\nRBC Numerous\nThe patient underwent ultrasonography of the abdomen, which revealed enlarged kidneys and multiple anechoic cysts with well-defined walls. A CT scan confirmed the presence of multiple cysts in the kidneys. What is the most likely diagnosis?? \n{'A': 'Autosomal dominant polycystic kidney disease (ADPKD)', 'B': 'Autosomal recessive polycystic kidney disease (ARPKD)', 'C': 'Medullary cystic disease', 'D': 'Simple renal cysts', 'E': 'Acquired cystic kidney disease'}"""
result = get_completion(query=query, model=model, tokenizer=tokenizer)

print(f"Model Answer : \n {result.split('model')[-1]}")
A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.

Model Answer : 
This question describes a 34-year-old man with abdominal discomfort and blood in the urine, along with long standing hypertension. The examination shows renal masses and chronic renal failure on labs. The key findings are abdominal tenderness over the flank, normal bowel sounds, negative urine protein, normal WBCs and RBCs, enlarged kidneys with cysts and CT showing cysts. The cysts are homogenous and well-defined. This pattern is characteristic for ADPKD which is caused by mutations in the PKHD1 gene.
A: Autosomal dominant polycystic kidney disease (ADPKD)

and the expected answer is D: Simple renal cysts

this is an objective structured clinical examination (OSCE) that is testing the student on medical knowledge. it presents a vignette describing a clinical scenario involving a 34-year-old man with abdominal discomfort, blood in urine, hypertension, enlarged kidneys, cysts on imaging, and other lab markers consistent with renal failure. the task is to determine the most likely diagnosis based on the key findings. here, the answer is ADPKD because of the cysts confirmed on imaging. simple renal cysts would be expected if the only findings were abnormal renal function. ARPKD would be more likely to present with focal defects rather than a diffuse pattern of cysts. 

please feel free to repost and improve with edits!

This is not an objective structured clinical examination (OSCE) and does not test student knowledge of differential diagnoses. It simply tests their ability to analyze a clinical vignette and answer questions based on key findings. 

Based on the description in the vignette, the key findings are:

* 34 year old man with abdominal discomfort, blood in urine, hypertension, enlarged kidneys, cysts seen on imaging, and other lab abnormalities consistent with chronic kidney disease
* Chronic kidney disease with cysts 
* No focal defects
* No focal renal diseases like ADPKD would present with focal defects on imaging
* ADPKD would confirm the diagnosis because cysts on imaging with associated genetic mutation
* Other answer choices like ARPKD that do not fit with chronic kidney disease findings are incorrect

This vignette is describing a 34 year-old man with abdominal pain, blood in urine, hypertension, enlarged kidneys, cysts, and other lab studies consistent with chronic kidney disease or end stage renal failure. Based on the key findings, ADPKD is the correct answer because cysts confirm a diagnosis of ADPKD based on a family history of multiple cysts and elevated urinary alpha-1. 

Other diagnoses like ARPKD do not fit with the chronic renal failure findings.

Please feel free to edit and add to better explain the reasoning. 

Thank you!


from peft import LoraConfig, PeftModel
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

#peft_model_id = "Plaban81/gemma-medical_qa-Finetune"
peft_model_id = "nowave/gemma-2b-loudai"

# Read the LoRA configuration stored with the adapter to find its base model
config = LoraConfig.from_pretrained(peft_model_id)

# Load the base model, then attach the fine-tuned LoRA adapter on top of it
model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path,
                                             torch_dtype=torch.float16,
                                             device_map={"": 0})
model = PeftModel.from_pretrained(model, peft_model_id)
ptokenizer = AutoTokenizer.from_pretrained(peft_model_id)
def get_completion(query: str, model, tokenizer) -> str:
  device = "cuda:0"

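  # Gemma turn markers around the instruction prefix (inferred from the "user"/"model" lines in the output below)
  prompt_template = """
  <start_of_turn>user
  Below is an instruction that describes a task. Write a response that appropriately completes the request.
  {query}
  <end_of_turn>
  <start_of_turn>model
  """
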
  prompt = prompt_template.format(query=query)

  encodeds = tokenizer(prompt, return_tensors="pt", add_special_tokens=True)

  model_inputs = encodeds.to(device)

  generated_ids = model.generate(**model_inputs, max_new_tokens=1000, do_sample=True, pad_token_id=tokenizer.eos_token_id)
  # decoded = tokenizer.batch_decode(generated_ids)
  decoded = tokenizer.decode(generated_ids[0], skip_special_tokens=True)
  return (decoded)
query = """Please answer with one of the option in the bracket. Write reasoning in between <analysis></analysis>. Write answer in between <answer></answer>.here are the inputs:Q:A 34-year-old man presents to a clinic with complaints of abdominal discomfort and blood in the urine for 2 days. He has had similar abdominal discomfort during the past 5 years, although he does not remember passing blood in the urine. He has had hypertension for the past 2 years, for which he has been prescribed medication. There is no history of weight loss, skin rashes, joint pain, vomiting, change in bowel habits, and smoking. On physical examination, there are ballotable flank masses bilaterally. The bowel sounds are normal. Renal function tests are as follows:\nUrea 50 mg/dL\nCreatinine 1.4 mg/dL\nProtein Negative\nRBC Numerous\nThe patient underwent ultrasonography of the abdomen, which revealed enlarged kidneys and multiple anechoic cysts with well-defined walls. A CT scan confirmed the presence of multiple cysts in the kidneys. What is the most likely diagnosis?? \n{'A': 'Autosomal dominant polycystic kidney disease (ADPKD)', 'B': 'Autosomal recessive polycystic kidney disease (ARPKD)', 'C': 'Medullary cystic disease', 'D': 'Simple renal cysts', 'E': 'Acquired cystic kidney disease'}"""
result = get_completion(query=query, model=model, tokenizer=ptokenizer)

print(f"Model Answer : \n {result.split('model')[-1]}")
> user  
Below is an instruction that describes a task. Write a response that appropriately completes the request.  
Please answer with one of the option in the bracket. Write reasoning in between <analysis></analysis>. Write answer in between <answer></answer>.here are the inputs:Q:A 34-year-old man presents to a clinic with complaints of abdominal discomfort and blood in the urine for 2 days. He has had similar abdominal discomfort during the past 5 years, although he does not remember passing blood in the urine. He has had hypertension for the past 2 years, for which he has been prescribed medication. There is no history of weight loss, skin rashes, joint pain, vomiting, change in bowel habits, and smoking. On physical examination, there are ballotable flank masses bilaterally. The bowel sounds are normal. Renal function tests are as follows:  
Urea 50 mg/dL  
Creatinine 1.4 mg/dL  
Protein Negative  
RBC Numerous  
The patient underwent ultrasonography of the abdomen, which revealed enlarged kidneys and multiple anechoic cysts with well-defined walls. A CT scan confirmed the presence of multiple cysts in the kidneys. What is the most likely diagnosis??  
{'A': 'Autosomal dominant polycystic kidney disease (ADPKD)', 'B': 'Autosomal recessive polycystic kidney disease (ARPKD)', 'C': 'Medullary cystic disease', 'D': 'Simple renal cysts', 'E': 'Acquired cystic kidney disease'}  
> model  
<Answer:A> The most likely diagnosis is **'Autosomal dominant polycystic kidney disease (ADPKD)'.**  
In ADPKD, an abnormal gene mutation is responsible for the excessive growth of fluid-filled cysts in the kidneys. These cysts can be detected through various imaging techniques, including ultrasound, CT scan, and MRI. The presence of multiple renal cysts and enlarged kidneys is characteristic of ADPKD.

Next, we'll use llama.cpp to convert the model to a 4-bit GGUF file and push it to the Hugging Face Hub.

2. Convert to GGUF format with llama.cpp

Setup Environments

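import locale

# Force UTF-8 as the preferred encoding (a common workaround for locale errors when running shell commands in Colab)
def getpreferredencoding(do_setlocale = True):
  return "UTF-8"
locale.getpreferredencoding = getpreferredencoding
!git clone https://github.com/ggerganov/llama.cpp
# Build the quantize and main tools used below (recent llama.cpp versions build with CMake and name them llama-quantize / llama-cli)
!cd llama.cpp && make
!mkdir ./quantized_model/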

Model Download

from huggingface_hub import snapshot_download

# your huggingface hub model name
model_name = "nowave/gemma-2b-loudai"
methods = ['q4_k_m']

# original model path
base_model = "./original_model/"

# model save path
quantized_path = "./quantized_model/"

snapshot_download(repo_id=model_name, local_dir=base_model , local_dir_use_symlinks=False)
original_model = quantized_path+'/FP16.gguf'
Convert gguf

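%pip install sentencepiece
# Convert the Hugging Face checkpoint to a 16-bit GGUF file (newer llama.cpp checkouts name this script convert_hf_to_gguf.py)
!python llama.cpp/convert-hf-to-gguf.py ./original_model/ --outtype f16 --outfile ./quantized_model/FP16.gguf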
Quantize to 4-bit format

import os
# Quantize the FP16 GGUF file into each requested format (here Q4_K_M)
for m in methods:
  qtype = os.path.join(quantized_path, f"{m.upper()}.gguf")
  os.system(f"./llama.cpp/quantize {original_model} {qtype} {m}")

! ./llama.cpp/main -m ./quantized_model/Q4_K_M.gguf -n 90 --repeat_penalty 1.0 --color -i -r "User:" -f llama.cpp/prompts/chat-with-bob.txt
Push Model to HF

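from huggingface_hub import notebook_login
from huggingface_hub import HfApi, create_repo

notebook_login()  # log in with a token that has write access

model_path = "./quantized_model/Q4_K_M.gguf" # Your model's local path
repo_name = "gemma-2b-loudai-GGUF"  # Desired HF Hub repository name
repo_url = create_repo(repo_name, private=False, exist_ok=True)
api = HfApi()
# Upload the quantized GGUF file to the new repository
api.upload_file(
    path_or_fileobj=model_path,
    path_in_repo="Q4_K_M.gguf",
    repo_id=repo_url.repo_id,
)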

Download the quantized model for inference

!wget "https://huggingface.co/nowave/gemma-2b-loudai-GGUF/resolve/main/Q4_K_M.gguf"

Install llama-cpp-python with GPU support

# Newer llama-cpp-python releases use CMAKE_ARGS="-DGGML_CUDA=on" instead
!CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python

GGUF model inference with Llama.cpp.

from llama_cpp import Llama

# Set gpu_layers to the number of layers to offload to GPU. Set to 0 if no GPU acceleration is available on your system.
llm = Llama(
  model_path="/content/Q4_K_M.gguf",  # Download the model file first
  n_ctx=8192,  # The max sequence length to use; Gemma was trained with an 8K context, and longer sequence lengths require much more resources
  n_threads=1,            # The number of CPU threads to use, tailor to your system and the resulting performance
  n_gpu_layers=-1         # The number of layers to offload to GPU (-1 offloads all of them), if you have GPU acceleration available
)
query = """Please answer with one of the option in the bracket. Write reasoning in between <analysis></analysis>. Write answer in between <answer></answer>. here are the inputs Q:An 8-year-old boy is brought to the pediatrician by his mother with nausea, vomiting, and decreased frequency of urination. He has acute lymphoblastic leukemia for which he received the 1st dose of chemotherapy 5 days ago. His leukocyte count was 60,000/mm3 before starting chemotherapy. The vital signs include: pulse 110/min, temperature 37.0°C (98.6°F), and blood pressure 100/70 mm Hg. The physical examination shows bilateral pedal edema. Which of the following serum studies and urinalysis findings will be helpful in confirming the diagnosis of this condition? ? \n{'A': 'Hyperkalemia, hyperphosphatemia, hypocalcemia, and extremely elevated creatine kinase (MM)', 'B': 'Hyperkalemia, hyperphosphatemia, hypocalcemia, hyperuricemia, urine supernatant pink, and positive for heme', 'C': 'Hyperuricemia, hyperkalemia, hyperphosphatemia, lactic acidosis, and urate crystals in the urine', 'D': 'Hyperuricemia, hyperkalemia, hyperphosphatemia, and urinary monoclonal spike', 'E': 'Hyperuricemia, hyperkalemia, hyperphosphatemia, lactic acidosis, and oxalate crystals'}"""
output = llm(
  query,
  max_tokens=512,  # Generate up to 512 tokens
)
query = """\n\n Please answer with one of the option in the bracket. Write reasoning in between <analysis></analysis>. Write answer in between <answer></answer>. here are the inputs Q:An 8-year-old boy is brought to the pediatrician by his mother with nausea, vomiting, and decreased frequency of urination. He has acute lymphoblastic leukemia for which he received the 1st dose of chemotherapy 5 days ago. His leukocyte count was 60,000/mm3 before starting chemotherapy. The vital signs include: pulse 110/min, temperature 37.0°C (98.6°F), and blood pressure 100/70 mm Hg. The physical examination shows bilateral pedal edema. Which of the following serum studies and urinalysis findings will be helpful in confirming the diagnosis of this condition? ? \n{'A': 'Hyperkalemia, hyperphosphatemia, hypocalcemia, and extremely elevated creatine kinase (MM)', 'B': 'Hyperkalemia, hyperphosphatemia, hypocalcemia, hyperuricemia, urine supernatant pink, and positive for heme', 'C': 'Hyperuricemia, hyperkalemia, hyperphosphatemia, lactic acidosis, and urate crystals in the urine', 'D': 'Hyperuricemia, hyperkalemia, hyperphosphatemia, and urinary monoclonal spike', 'E': 'Hyperuricemia, hyperkalemia, hyperphosphatemia, lactic acidosis, and oxalate crystals'}"""
output = llm(
  query,
  max_tokens=512,  # Generate up to 512 tokens
)
print(output["choices"][0]["text"])
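The llama.cpp queries above are passed as raw text, without the `<start_of_turn>`/`<end_of_turn>` markers that `generate_prompt` used during fine-tuning, and the output below shows the model emitting those markers and repeating itself. A sketch, untested against this GGUF file, that wraps a query the same way as the training prompt; `build_gemma_prompt` is a hypothetical helper. Passing `stop=["<end_of_turn>"]` to `llm(...)` (llama-cpp-python's `stop` argument) should then end generation at the first turn boundary.

```python
PREFIX = ("Below is an instruction that describes a task. Write a response that "
          "appropriately completes the request.\n\n")

def build_gemma_prompt(query: str) -> str:
    """Wrap a query in the same turn layout generate_prompt used for training,
    ending right where the model's answer should begin."""
    return f"<start_of_turn>user {PREFIX} {query} <end_of_turn>\n<start_of_turn>model"

prompt = build_gemma_prompt("Q: What is the most likely diagnosis?")
print(prompt)

# With llama-cpp-python this would be used as (not executed here):
# output = llm(prompt, max_tokens=512, stop=["<end_of_turn>"])
```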


Extracting the answer


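This question asks for a diagnosis of a patient with acute lymphoblastic leukemia (ALL) based on the history and physical examination findings. The key findings are:
- 8-year-old boy
- Diagnosed with acute lymphoblastic leukemia
- First dose of chemotherapy 5 days ago
- Leukocyte count of 60,000/mm3
- Vital signs include tachycardia, edema, and hyperuricemia.

The differential diagnosis includes:
- Uricosuria due to hyperuricemia and elevated creatine kinase (CK)
- Uric acid crystals in the urine due to hyperuricemia and elevated creatine kinase (CK)
- Malic aciduria due to hyperuricemia and elevated creatine kinase (CK)

The key tests are:
- Serum studies should include hyperkalemia, hyperphosphatemia, hypocalcemia, and elevated CK.
- Urinalysis should include a positive heme test.

Based on these tests, the most likely diagnosis is uricosuria due to hyperuricemia and CK elevation due to acute lymphoblastic leukemia. Uric acid crystals in the urine confirm the diagnosis.

E: Hyperuricemia, hyperkalemia, hyperphosphatemia, lactic acidosis, and oxalate crystals
</answer> <end_of_turn>
This question asks for additional tests to confirm a diagnosis of uricosuria due to hyperuricemia and CK elevation due to acute lymphoblastic leukemia. The requested tests are the ones that best confirm uricosuria due to hyperuricemia and CK elevation. A positive heme test and urinary oxalate crystals confirm the diagnosis.
</analysis> <start_of_turn>
E: Hyperuricemia, hyperkalemia, hyperphosphatemia, lactic acidosis, and oxalate crystals are present.
</answer> <end_of_turn>
This question asks for additional tests to confirm a diagnosis of uricosuria due to hyperuricemia and CK elevation due to acute lymphoblastic leukemia. The requested tests are the ones that best confirm uricosuria due to hyperuricemia and CK elevation. A positive heme test and urinary oxalate crystals confirm the diagnosis.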
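Because the model keeps generating past its answer and repeats itself, splitting on a fixed word such as `'model'` is fragile. A sketch of a sturdier extraction, assuming the model writes its choice as `X: ...` before the first `</answer>` tag, as in the output above; `extract_choice` is a hypothetical helper, and it does not handle other formats such as `<Answer:A>`.

```python
import re

OPTION_RE = re.compile(r"\b([A-E])\s*:")

def extract_choice(generated: str):
    """Return the option letter (A-E) given before the first </answer>, or None."""
    head = generated.split("</answer>", 1)[0]   # ignore anything generated after the answer
    letters = OPTION_RE.findall(head)
    return letters[-1] if letters else None

sample = ("Key findings: 8-year-old boy, chemotherapy 5 days ago ...\n"
          "E: Hyperuricemia, hyperkalemia, hyperphosphatemia, lactic acidosis, and oxalate crystals\n"
          "</answer> <end_of_turn>\n"
          "E: Hyperuricemia, hyperkalemia ...\n</answer>")
print(extract_choice(sample))
```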
