llama3_8b_finetuningダウンロード - llama3_8b_finetuningソースコードのダウンロード

llama3_8b_finetuning

AI ソースコード

1.0.0

ダウンロード

お読みください

これは、データの取り込み、指示と回答のペアの作成、微調整、結果の評価を含むエンドツーエンドのプロジェクトです。

最初のステップ - データスクレイピング

まず、次のように依存関係をインストールします。

pip install -r requirements.txt

微調整のためのデータを見つけるために、Llama 3 のリリース日以降に発行された LLM 論文について Arxiv がスクレイピングされました。

Selenium スクレイピングコードはllama3_8b_finetuning/arxiv_scraping/Arxiv_pdfs_download.pyにあります (このスクリプトを実行する前に Web ドライバーをダウンロードする必要があります)。

スクレイピングコードは、Arxiv の最初のページにある論文を取得し、 llama3_8b_finetuning/data/pdfsフォルダーにダウンロードします。

第 2 ステップ - 指示と回答のペアの作成

このステップのコードは、/llama3_8b_finetuning/creating_instruction_dataset.py にあります。

ダウンロードした論文のテキストコンテンツは、Langchain の PyPDFLoader を使用して解析されました。次に、テキストは Grok 経由で Llama 3 70B モデルに送信されました。 Grok が選ばれたのは、そのスピードと低コストのためです。 Llama 3 ユーザーライセンスでは、Llama LLM のトレーニング/微調整にのみ使用が許可されていることに注意してください。したがって、Llama 3 を使用して、他のモデル (オープンソースのモデルであっても) や非営利目的の命令と回答のペアを作成することはできません。

ペア作成のプロンプトは utils ファイルにあり、以下でも確認できます。

「」

 You are a highly intelligent and knowledgeable assistant tasked with generating triples of instruction, input, and output from academic papers related to Large Language Models (LLMs). Each triple should consist of:

Instruction: A clear and concise task description that can be performed by an LLM.
Input: A sample input that corresponds to the instruction.
Output: The expected result or answer when the LLM processes the input according to the instruction.
Below are some example triples:

Example 1:

Instruction: Summarize the following abstract.
Input: "In this paper, we present a new approach to training large language models by incorporating a multi-task learning framework. Our method improves the performance on a variety of downstream tasks."
Output: "A new multi-task learning framework improves the performance of large language models on various tasks."
Example 2:

Instruction: Provide a brief explanation of the benefits of using multi-task learning for large language models.
Input: "Multi-task learning allows a model to learn from multiple related tasks simultaneously, which can lead to better generalization and performance improvements across all tasks. This approach leverages shared representations and can reduce overfitting."
Output: "Multi-task learning helps large language models generalize better and improve performance by learning from multiple related tasks simultaneously."
Now, generate similar triples based on the provided text from academic papers related to LLMs:

Source Text
(Provide the text from the academic papers here)

Generated Triples
Triple 1:

Instruction:
Input:
Output:
Triple 2:

Instruction:
Input:
Output:
Triple 3:

Instruction:
Input:
Output:

「」

最後に、命令はllama3_8b_finetuning/data/arxiv_instruction_dataset.jsonに保存されます。

第三ステップ - 微調整

このステップのコードは/llama3_8b_finetuning/model_trainer.pyにあります。

まず、命令と回答のペアをロードし、それらをテストデータセットとトレーニングデータセットに分割します。
それらを正しい構造にフォーマットします。

 class DatasetHandler :
    def __init__ ( self , data_path ):
        self . data_path = data_path

    def load_and_split_dataset ( self ):
        dataset = load_dataset ( "json" , data_files = self . data_path )
        train_test_split = dataset [ 'train' ]. train_test_split ( test_size = 0.2 )
        dataset_dict = DatasetDict ({
            'train' : train_test_split [ 'train' ],
            'test' : train_test_split [ 'test' ]
        })
        return dataset_dict [ 'train' ], dataset_dict [ 'test' ]

    @ staticmethod
    def format_instruction ( sample ):
        return f"""
        Below is an instruction that describes a task, paired with an input that provides further context. 
        Write a response that appropriately completes the request.

        ### Instruction:
        { sample [ 'Instruction' ] }

        ### Input:
        { sample [ 'Input' ] }

        ### Response:
        { sample [ 'Output' ] }
        """

次に、Hugging Face からモデルとトークナイザーを読み込むクラスを定義します。

 class ModelManager :
    def __init__ ( self , model_id , use_flash_attention2 , hf_token ):
        self . model_id = model_id
        self . use_flash_attention2 = use_flash_attention2
        self . hf_token = hf_token
        self . bnb_config = BitsAndBytesConfig (
            load_in_4bit = True ,
            bnb_4bit_use_double_quant = True ,
            bnb_4bit_quant_type = "nf4" ,
            bnb_4bit_compute_dtype = torch . bfloat16 if use_flash_attention2 else torch . float16
        )
    
    def load_model_and_tokenizer ( self ):
        model = AutoModelForCausalLM . from_pretrained (
            self . model_id , 
            quantization_config = self . bnb_config , 
            use_cache = False , 
            device_map = "auto" ,
            token = self . hf_token ,  
            attn_implementation = "flash_attention_2" if self . use_flash_attention2 else "sdpa"
        )
        model . config . pretraining_tp = 1

        tokenizer = AutoTokenizer . from_pretrained (
            self . model_id ,
            token = self . hf_token
        )
        tokenizer . pad_token = tokenizer . eos_token
        tokenizer . padding_side = "right"
        
        return model , tokenizer

Trainerクラスとトレーニング構成を定義します。

 class Trainer :
    def __init__ ( self , model , tokenizer , train_dataset , peft_config , use_flash_attention2 , output_dir ):
        self . model = model
        self . tokenizer = tokenizer
        self . train_dataset = train_dataset
        self . peft_config = peft_config
        self . args = TrainingArguments (
            output_dir = output_dir ,
            num_train_epochs = 3 ,
            per_device_train_batch_size = 4 ,
            gradient_accumulation_steps = 4 ,
            gradient_checkpointing = True ,
            optim = "paged_adamw_8bit" ,
            logging_steps = 10 ,
            save_strategy = "epoch" ,
            learning_rate = 2e-4 ,
            bf16 = use_flash_attention2 ,
            fp16 = not use_flash_attention2 ,
            tf32 = use_flash_attention2 ,
            max_grad_norm = 0.3 ,
            warmup_steps = 5 ,
            lr_scheduler_type = "linear" ,
            disable_tqdm = False ,
            report_to = "none"
        )
        self . model = get_peft_model ( self . model , self . peft_config )

    def train_model ( self , format_instruction_func ):
        trainer = SFTTrainer (
            model = self . model ,
            train_dataset = self . train_dataset ,
            peft_config = self . peft_config ,
            max_seq_length = 2048 ,
            tokenizer = self . tokenizer ,
            packing = True ,
            formatting_func = format_instruction_func , 
            args = self . args ,
        )
        trainer . train ()
        return trainer

最後に、クラスがインスタンス化され、トレーニングが開始されます。

Llama モデルはゲートされていることに注意してください。つまり、Hugging Face では、使用条件に同意し、Meta がアクセスを承認した後に提供されるトークンが必要になります (これはほぼ瞬時に行われます)。

 dataset_handler = DatasetHandler ( data_path = utils . Variables . INSTRUCTION_DATASET_JSON_PATH )
train_dataset , test_dataset = dataset_handler . load_and_split_dataset ()

new_test_dataset = []
for dict_ in test_dataset :
    dict_ [ 'Output' ] = ''
    new_test_dataset . append ( dict_ )

model_manager = ModelManager (
    model_id = "meta-llama/Meta-Llama-3-8B" ,
    use_flash_attention2 = True ,
    hf_token = os . environ [ "HF_TOKEN" ]
)
model , tokenizer = model_manager . load_model_and_tokenizer ()
model_manager . save_model_and_tokenizer ( model , tokenizer , save_directory = utils . Variables . BASE_MODEL_PATH )
model = model_manager . prepare_for_training ( model )

peft_config = LoraConfig (
    lora_alpha = 16 ,
    lora_dropout = 0.1 ,
    r = 64 ,
    bias = "none" ,
    task_type = "CAUSAL_LM" ,
    target_modules = [
        "q_proj" , "k_proj" , "v_proj" , "o_proj" , "gate_proj" , "up_proj" , "down_proj" ,
    ]
)

trainer = Trainer (
    model = model ,
    tokenizer = tokenizer ,
    train_dataset = train_dataset ,
    peft_config = peft_config ,
    use_flash_attention2 = True ,
    output_dir = utils . Variables . FINE_TUNED_MODEL_PATH
)
trained_model = trainer . train_model ( format_instruction_func = dataset_handler . format_instruction )
trained_model . save_model ()

ステップ 4 - 評価

微調整の結果を評価するために、2 つのテキストセット間の重複を比較してそれらの間の類似性を測定する、Recall-Oriented Understudy for Gisting Evaluation (ROUGE) スコアを採用しました。

具体的には、rouge_scorer ライブラリを使用して、テキスト間の 1 グラムと 2 グラムの重複を測定する ROUGE-1 と ROUGE-2 を計算しました。

 import pandas as pd
from rouge_score import rouge_scorer

def calculate_rouge_scores ( generated_answers , ground_truth ):
    scorer = rouge_scorer . RougeScorer ([ 'rouge1' , 'rouge2' , 'rougeL' ], use_stemmer = True )
    total_rouge1 , total_rouge2 , total_rougeL = 0 , 0 , 0
    for gen , ref in zip ( generated_answers , ground_truth ):
        scores = scorer . score ( gen , ref )
        total_rouge1 += scores [ 'rouge1' ]. fmeasure
        total_rouge2 += scores [ 'rouge2' ]. fmeasure
        total_rougeL += scores [ 'rougeL' ]. fmeasure
    average_rouge1 = total_rouge1 / len ( generated_answers )
    average_rouge2 = total_rouge2 / len ( generated_answers )
    average_rougeL = total_rougeL / len ( generated_answers )
    return { 'average_rouge1' : average_rouge1 ,
            'average_rouge2' : average_rouge2 ,
            'average_rougeL' : average_rougeL }

この計算を実行するには、テストデータセットから命令を取得し、それをベースモデルと微調整モデルの両方に渡し、その出力を命令/回答データセットからの期待される結果と比較します。

評価用のコードは /llama3_8b_finetuning/model_evaluation.py にあります。

 class ModelHandler :

    def __init__ ( self ):
        pass

    def loading_model ( self , model_chosen = 'fine_tuned_model' ):

        if model_chosen == 'fine_tuned_model' :
            model_dir = utils . Variables . FINE_TUNED_MODEL_PATH
            self . model = AutoPeftModelForCausalLM . from_pretrained (
                model_dir ,
                low_cpu_mem_usage = True ,
                torch_dtype = torch . float16 ,
                load_in_4bit = True ,
                )

        elif model_chosen == 'base_model' :
            model_dir = utils . Variables . BASE_MODEL_PATH
            self . model = AutoModelForCausalLM . from_pretrained (
                model_dir ,
                low_cpu_mem_usage = True ,
                torch_dtype = torch . float16 ,
                load_in_4bit = True ,
                )

        self . tokenizer = AutoTokenizer . from_pretrained ( model_dir )

    def ask_question ( self , instruction , temperature = 0.5 , max_new_tokens = 1000 ):

        prompt = format_instruction ( instruction )

        input_ids = self . tokenizer ( prompt , return_tensors = "pt" , truncation = True ). input_ids . cuda ()

        start_time = time . time ()
        with torch . inference_mode ():
            outputs = self . model . generate ( input_ids = input_ids , pad_token_id = self . tokenizer . eos_token_id , max_new_tokens = max_new_tokens , do_sample = True , top_p = 0.5 , temperature = temperature )
        end_time = time . time ()

        total_time = end_time - start_time
        output_length = len ( outputs [ 0 ]) - len ( input_ids [ 0 ])

        self . output = self . tokenizer . batch_decode ( outputs . detach (). cpu (). numpy (), skip_special_tokens = True )[ 0 ]

        return self . output

評価結果

ROUGEのスコアは以下の通りです。

微調整されたモデル:

{'average_rouge1': 0.39997816307812206, 'average_rouge2': 0.2213826792342886, 'average_rougeL': 0.33508922374837047}

ベースモデル:

{'average_rouge1': 0.2524191394349585, 'average_rouge2': 0.13402054342344535, 'average_rougeL': 0.2115590931984475}

したがって、テストデータセット上の微調整モデルのパフォーマンスは、ベースモデルのパフォーマンスよりも大幅に優れていることがわかります。

代替案

このコードを書いて動作させるまでにはかなりの時間がかかりました。これは良い習慣でしたが、毎日の微調整関連のジョブには、ローカルでホストされている Hugging Face AutoTrain (https://github.com/huggingface/autotrain-advanced) を使用してください。

画像.png