gpt3datagen
1.0.0
GPT3Datagen is a Python package that generates fake data for fine-tuning OpenAI models.
[ASCII-art banner, v0.1.0]
pip install -U gpt3datagen
Alternatively, the following command pulls and installs the latest commit from this repository, along with its Python dependencies:
pip install git+https://github.com/donwany/gpt3datagen.git --use-pep517
Or clone the repository with git:
git clone https://github.com/donwany/gpt3datagen.git
cd gpt3datagen
make install && pip install -e .
To update the package to the latest version of this repository, run:
pip install --upgrade --no-deps --force-reinstall git+https://github.com/donwany/gpt3datagen.git
Run the following to view all available options:
gpt3datagen --help
gpt3datagen --version
Output formats: jsonl, json, csv, tsv, xlsx
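To make the difference between these formats concrete, here is a minimal sketch that serializes the same records as jsonl, json, csv, and tsv using only the standard library. It is not the package's own implementation: `RECORDS` and `serialize` are hypothetical names chosen for illustration, and xlsx is left out because it needs a third-party library.

```python
import csv
import io
import json

# Hypothetical sample records; the field names follow the prompt/completion
# layout used for fine-tuning data.
RECORDS = [
    {"prompt": "Great product", "completion": "positive"},
    {"prompt": "Arrived broken", "completion": "negative"},
]


def serialize(records, output_format):
    """Return the records as text in jsonl, json, csv, or tsv form."""
    if output_format == "jsonl":
        # One JSON object per line.
        return "".join(json.dumps(r) + "\n" for r in records)
    if output_format == "json":
        # A single JSON array holding every record.
        return json.dumps(records, indent=2) + "\n"
    if output_format in ("csv", "tsv"):
        delimiter = "," if output_format == "csv" else "\t"
        buffer = io.StringIO()
        writer = csv.DictWriter(
            buffer,
            fieldnames=["prompt", "completion"],
            delimiter=delimiter,
            lineterminator="\n",
        )
        writer.writeheader()
        writer.writerows(records)
        return buffer.getvalue()
    # xlsx needs a third-party library, so this sketch does not cover it.
    raise ValueError(f"unsupported format: {output_format}")


if __name__ == "__main__":
    for fmt in ("jsonl", "json", "csv", "tsv"):
        print(f"--- {fmt} ---")
        print(serialize(RECORDS, fmt), end="")
```

Running the script prints the two records once per format, which makes it easy to compare the line-per-record jsonl layout with the header-plus-rows csv and tsv layouts.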
gpt3datagen \
    --num_samples 500 \
    --max_length 2048 \
    --sample_type "classification" \
    --output_format "jsonl" \
    --output_dir .
gpt3datagen \
    --num_samples 500 \
    --max_length 2048 \
    --sample_type completion \
    --output_format csv \
    --output_dir .
gpt3datagen \
    --sample_type completion \
    --output_format jsonl \
    --output_dir .
gpt3datagen --sample_type completion -o . -f jsonl
gpt3datagen --sample_type news -o . -f jsonl
{"prompt": "<prompt text>\n\n###\n\n", "completion": " <ideal generated text> END"}
{"prompt": "<prompt text>\n\n###\n\n", "completion": " <ideal generated text> END"}
{"prompt": "<prompt text>\n\n###\n\n", "completion": " <ideal generated text> END"}
...
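The sample lines above end each prompt with a `###` separator and each completion with `END`. The sketch below shows one way to build and unpack such records. It assumes the separator is `\n\n###\n\n`, the stop sequence is ` END`, and the completion starts with a single space; `make_record` and `split_record` are hypothetical helpers, not functions from the package.

```python
import json

# Assumed markers, matching the sample lines above.
SEPARATOR = "\n\n###\n\n"  # marks the end of the prompt
STOP = " END"              # marks the end of the completion


def make_record(prompt_text, completion_text):
    """Wrap raw text in the prompt/completion layout."""
    return {
        "prompt": prompt_text + SEPARATOR,
        # The completion starts with a space and ends with the stop sequence.
        "completion": " " + completion_text + STOP,
    }


def split_record(record):
    """Recover the raw prompt and completion text from a record."""
    prompt = record["prompt"]
    completion = record["completion"]
    if not prompt.endswith(SEPARATOR) or not completion.endswith(STOP):
        raise ValueError("record lacks the expected separator or stop sequence")
    return prompt[: -len(SEPARATOR)], completion[: -len(STOP)].strip()


if __name__ == "__main__":
    # Serialize one record as a single JSONL line, then read it back.
    line = json.dumps(make_record("Hello", "World"))
    print(line)
    print(split_record(json.loads(line)))
```

Keeping the separator and stop sequence in named constants means the same values can be reused at inference time, when the prompt must end with the separator and generation should stop at the stop sequence.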
Only useful if you cloned the repository:
python prepare.py \
    --num_samples 500 \
    --max_length 2048 \
    --sample_type "classification" \
    --output_format "jsonl" \
    --output_dir .
python prepare.py \
    --num_samples 500 \
    --max_length 2048 \
    --sample_type "completion" \
    --output_format "csv" \
    --output_dir .
python prepare.py \
    --num_samples 500 \
    --max_length 2048 \
    --sample_type "completion" \
    --output_format "json" \
    --output_dir /Users/<tsiameh>/Desktop
pip install --upgrade openai
export OPENAI_API_KEY="<OPENAI_API_KEY>"
# Validate the generated sample datasets
openai tools fine_tunes.prepare_data -f <SAMPLE_DATA>.jsonl
openai tools fine_tunes.prepare_data -f <SAMPLE_DATA>.csv
openai tools fine_tunes.prepare_data -f <SAMPLE_DATA>.tsv
openai tools fine_tunes.prepare_data -f <SAMPLE_DATA>.json
openai tools fine_tunes.prepare_data -f <SAMPLE_DATA>.xlsx
openai tools fine_tunes.prepare_data -f /Users/<tsiameh>/Desktop/data_prepared.jsonl
# Fine-tune
openai api fine_tunes.create \
    -t <DATA_PREPARED>.jsonl \
    -m <BASE_MODEL: davinci, curie, ada, babbage>
# List all created fine-tunes
openai api fine_tunes.list
# For multiclass classification
openai api fine_tunes.create \
    -t <TRAIN_FILE_ID_OR_PATH> \
    -v <VALIDATION_FILE_OR_PATH> \
    -m <MODEL> \
    --compute_classification_metrics \
    --classification_n_classes <N_CLASSES>
# For binary classification
openai api fine_tunes.create \
    -t <TRAIN_FILE_ID_OR_PATH> \
    -v <VALIDATION_FILE_OR_PATH> \
    -m <MODEL> \
    --compute_classification_metrics \
    --classification_n_classes 2 \
    --classification_positive_class <POSITIVE_CLASS_FROM_DATASET>
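The classification commands need the number of classes and, for the binary case, the name of the positive class. As a rough sketch, the distinct labels in a prepared JSONL file can be counted as below. `count_classes` is a hypothetical helper, and it assumes each line has a `completion` field holding the label, optionally followed by an ` END` stop sequence.

```python
import json
from collections import Counter


def count_classes(jsonl_lines, stop=" END"):
    """Count how often each completion label appears in JSONL lines."""
    counts = Counter()
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        label = json.loads(line)["completion"]
        # Drop the assumed stop sequence, if present, then trim whitespace.
        if stop and label.endswith(stop):
            label = label[: -len(stop)]
        counts[label.strip()] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical in-memory sample; a real run would iterate over an open file.
    sample = [
        '{"prompt": "a", "completion": " spam END"}',
        '{"prompt": "b", "completion": " ham END"}',
        '{"prompt": "c", "completion": " spam END"}',
    ]
    counts = count_classes(sample)
    print(len(counts), dict(counts))
```

The number of distinct labels is a candidate value for `--classification_n_classes`, and the per-label counts help spot class imbalance before fine-tuning.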
See the contributing guidelines.
GPT3Datagen is released under the MIT License. See the bundled LICENSE file for details.
Theophilus Siameh