GPT3Datagen

GPT3Datagen is a Python package that generates fake data for fine-tuning OpenAI models.

[ASCII-art banner: gpt3datagen v0.1.0]

Install with pip. See the installation and usage guide.

pip install -U gpt3datagen

Alternatively, the following command pulls and installs the latest commit from this repository, along with its Python dependencies:

pip install git+https://github.com/donwany/gpt3datagen.git --use-pep517

Or git clone the repository:

git clone https://github.com/donwany/gpt3datagen.git
cd gpt3datagen
make install && pip install -e .

To update the package to the latest version of this repository, run:

pip install --upgrade --no-deps --force-reinstall git+https://github.com/donwany/gpt3datagen.git

Command-line usage

Run the following to view all available options:

gpt3datagen --help
gpt3datagen --version

Output formats: jsonl, json, csv, tsv, xlsx

gpt3datagen \
    --num_samples 500 \
    --max_length 2048 \
    --sample_type "classification" \
    --output_format "jsonl" \
    --output_dir .

gpt3datagen \
    --num_samples 500 \
    --max_length 2048 \
    --sample_type completion \
    --output_format csv \
    --output_dir .

gpt3datagen \
    --sample_type completion \
    --output_format jsonl \
    --output_dir .

gpt3datagen --sample_type completion -o . -f jsonl
gpt3datagen --sample_type news -o . -f jsonl
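
When the generator is driven from a script rather than typed by hand, it can help to assemble the argument list in Python first. The following is a minimal sketch, not part of the package: `build_command` is a hypothetical helper, its default values are assumptions chosen for illustration (they are not the CLI's documented defaults), and it uses only the flags and output formats shown above.

```python
import shlex

# Output formats listed in this README.
OUTPUT_FORMATS = {"jsonl", "json", "csv", "tsv", "xlsx"}


def build_command(sample_type, output_format="jsonl", output_dir=".",
                  num_samples=None, max_length=None):
    """Return an argv list for the gpt3datagen CLI (hypothetical helper)."""
    if output_format not in OUTPUT_FORMATS:
        raise ValueError(f"unsupported output format: {output_format!r}")
    cmd = [
        "gpt3datagen",
        "--sample_type", sample_type,
        "--output_format", output_format,
        "--output_dir", output_dir,
    ]
    # Optional flags are appended only when a value is given.
    if num_samples is not None:
        cmd += ["--num_samples", str(num_samples)]
    if max_length is not None:
        cmd += ["--max_length", str(max_length)]
    return cmd


cmd = build_command("classification", num_samples=500, max_length=2048)
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would execute it, provided gpt3datagen is installed and on the PATH.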

Data format

{"prompt": "<prompt text> \n\n###\n\n", "completion": " <ideal generated text> END"}
{"prompt": "<prompt text> \n\n###\n\n", "completion": " <ideal generated text> END"}
{"prompt": "<prompt text> \n\n###\n\n", "completion": " <ideal generated text> END"}
                                    ...
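
To make the record layout above concrete, here is a small sketch that builds records by hand and serializes them as JSON Lines. It is an illustration only and not the package's own code: `make_record` and `to_jsonl` are hypothetical names, the sample texts are invented, and the exact whitespace around the `###` separator and the `END` marker is an assumption.

```python
import json

SEPARATOR = "\n\n###\n\n"  # assumed prompt suffix, marks where the prompt ends
STOP = " END"              # assumed completion suffix, marks where generation stops


def make_record(prompt_text, ideal_text):
    """Build one prompt/completion record (hypothetical helper)."""
    return {
        "prompt": f"{prompt_text}{SEPARATOR}",
        # The completion starts with a space and ends with the stop marker.
        "completion": f" {ideal_text}{STOP}",
    }


def to_jsonl(records):
    """Serialize records as JSON Lines: one JSON object per line."""
    return "".join(json.dumps(r, ensure_ascii=False) + "\n" for r in records)


records = [
    make_record("The battery lasts all day.", "positive"),
    make_record("The screen cracked within a week.", "negative"),
]
jsonl_text = to_jsonl(records)
print(jsonl_text, end="")
```

Because `json.dumps` escapes the newlines in the separator as `\n`, each record stays on a single physical line, which is what the JSONL format requires.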

Basic usage

Only useful if you cloned the repository

python prepare.py \
    --num_samples 500 \
    --max_length 2048 \
    --sample_type "classification" \
    --output_format "jsonl" \
    --output_dir .

python prepare.py \
    --num_samples 500 \
    --max_length 2048 \
    --sample_type "completion" \
    --output_format "csv" \
    --output_dir .

python prepare.py \
    --num_samples 500 \
    --max_length 2048 \
    --sample_type "completion" \
    --output_format "json" \
    --output_dir /Users/<tsiameh>/Desktop

Validate sample data

pip install --upgrade openai

export OPENAI_API_KEY="<OPENAI_API_KEY>"

# validate sample datasets generated
openai tools fine_tunes.prepare_data -f <SAMPLE_DATA>.jsonl
openai tools fine_tunes.prepare_data -f <SAMPLE_DATA>.csv
openai tools fine_tunes.prepare_data -f <SAMPLE_DATA>.tsv
openai tools fine_tunes.prepare_data -f <SAMPLE_DATA>.json
openai tools fine_tunes.prepare_data -f <SAMPLE_DATA>.xlsx
openai tools fine_tunes.prepare_data -f /Users/<tsiameh>/Desktop/data_prepared.jsonl

# fine-tune
openai api fine_tunes.create \
  -t <DATA_PREPARED>.jsonl \
  -m <BASE_MODEL: davinci, curie, ada, babbage>

# List all created fine-tunes
openai api fine_tunes.list
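
Before handing a JSONL file to `openai tools fine_tunes.prepare_data`, a quick local check can catch malformed lines early. The sketch below is an assumption-laden illustration, not a substitute for that tool: `check_jsonl` is a hypothetical helper, and the separator and stop marker it looks for are assumed from the data format described earlier.

```python
import json

SEPARATOR = "\n\n###\n\n"  # assumed prompt suffix
STOP = " END"              # assumed completion suffix


def check_jsonl(text, separator=SEPARATOR, stop=STOP):
    """Return a list of human-readable problems found in JSONL text."""
    problems = []
    for lineno, line in enumerate(text.split("\n"), start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append(f"line {lineno}: invalid JSON ({exc.msg})")
            continue
        if not isinstance(record, dict):
            problems.append(f"line {lineno}: expected a JSON object")
            continue
        prompt = record.get("prompt")
        completion = record.get("completion")
        if not isinstance(prompt, str) or not isinstance(completion, str):
            problems.append(f"line {lineno}: 'prompt' and 'completion' must both be strings")
            continue
        if not prompt.endswith(separator):
            problems.append(f"line {lineno}: prompt does not end with the separator")
        if not completion.endswith(stop):
            problems.append(f"line {lineno}: completion does not end with the stop marker")
    return problems


# Invented sample: one well-formed record, one missing both suffixes, one non-JSON line.
good = json.dumps({"prompt": "Hello" + SEPARATOR, "completion": " world" + STOP})
bad = json.dumps({"prompt": "Hello", "completion": " world"})
sample = good + "\n" + bad + "\n" + "not json\n"
for problem in check_jsonl(sample):
    print(problem)
```

To check a generated file, read it as text first, for example with `pathlib.Path(path).read_text(encoding="utf-8")`, and pass the result to `check_jsonl`.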

Test run

# For multiclass classification
openai api fine_tunes.create \
  -t <TRAIN_FILE_ID_OR_PATH> \
  -v <VALIDATION_FILE_OR_PATH> \
  -m <MODEL> \
  --compute_classification_metrics \
  --classification_n_classes <N_CLASSES>

# For binary classification
openai api fine_tunes.create \
  -t <TRAIN_FILE_ID_OR_PATH> \
  -v <VALIDATION_FILE_OR_PATH> \
  -m <MODEL> \
  --compute_classification_metrics \
  --classification_n_classes 2 \
  --classification_positive_class <POSITIVE_CLASS_FROM_DATASET>