gpt3datagen

GPT3Datagen is a Python package that generates fake data for fine-tuning OpenAI models.

gpt3datagen v0.1.0

Install with pip. See the installation and usage guide.

pip install -U gpt3datagen

Alternatively, the following command pulls the latest commit from this repository and installs it along with its Python dependencies:

pip install git+https://github.com/donwany/gpt3datagen.git --use-pep517

Or clone the repository with git:

git clone https://github.com/donwany/gpt3datagen.git
cd gpt3datagen
make install && pip install -e .

To update the package to the latest version of this repository, run:

pip install --upgrade --no-deps --force-reinstall git+https://github.com/donwany/gpt3datagen.git

Command-line usage

To see all available options, run:

gpt3datagen --help
gpt3datagen --version

Output formats: jsonl, json, csv, tsv, xlsx

gpt3datagen \
    --num_samples 500 \
    --max_length 2048 \
    --sample_type "classification" \
    --output_format "jsonl" \
    --output_dir .

gpt3datagen \
    --num_samples 500 \
    --max_length 2048 \
    --sample_type completion \
    --output_format csv \
    --output_dir .

gpt3datagen \
    --sample_type completion \
    --output_format jsonl \
    --output_dir .

gpt3datagen --sample_type completion -o . -f jsonl
gpt3datagen --sample_type news -o . -f jsonl
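
To make the text-based output formats concrete, here is a minimal Python sketch of how the same prompt/completion records could be laid out as jsonl, json, csv, or tsv using only the standard library. It is an illustration, not the package's own writer: the `write_records` helper, the `sample.<format>` file name, and the example records are all assumptions, and xlsx is left out because it needs a third-party library.

```python
import csv
import json
from pathlib import Path

# Hypothetical records for illustration; real gpt3datagen output will differ.
records = [
    {"prompt": "Is this spam?\n\n###\n\n", "completion": " no END"},
    {"prompt": "Win a free prize!\n\n###\n\n", "completion": " yes END"},
]


def write_records(records, output_dir, output_format):
    """Write prompt/completion records in one text-based format (assumed helper)."""
    path = Path(output_dir) / f"sample.{output_format}"
    if output_format == "jsonl":
        # One JSON object per line.
        with path.open("w", encoding="utf-8") as f:
            for rec in records:
                f.write(json.dumps(rec) + "\n")
    elif output_format == "json":
        # A single JSON array holding every record.
        with path.open("w", encoding="utf-8") as f:
            json.dump(records, f, indent=2)
    elif output_format in ("csv", "tsv"):
        # Same columns for both; only the delimiter changes.
        delimiter = "," if output_format == "csv" else "\t"
        with path.open("w", encoding="utf-8", newline="") as f:
            writer = csv.DictWriter(
                f, fieldnames=["prompt", "completion"], delimiter=delimiter
            )
            writer.writeheader()
            writer.writerows(records)
    else:
        raise ValueError(f"unsupported format: {output_format}")
    return path
```

Calling `write_records(records, ".", "tsv")` would write `sample.tsv` in the current directory. Fields that contain newlines, such as the prompt separator, are quoted by the `csv` module so they survive a round trip.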

Data format

{"prompt": "<prompt text>\n\n###\n\n", "completion": " <ideal generated text> END"}
{"prompt": "<prompt text>\n\n###\n\n", "completion": " <ideal generated text> END"}
{"prompt": "<prompt text>\n\n###\n\n", "completion": " <ideal generated text> END"}
                                    ...
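
The record layout above can be sketched in Python as follows. This is a minimal illustration that assumes the prompt ends with the `\n\n###\n\n` separator and the completion starts with a space and ends with ` END`; the `make_record` helper and the example text are hypothetical and not part of gpt3datagen.

```python
import json

SEPARATOR = "\n\n###\n\n"  # assumed marker for the end of the prompt
STOP = " END"              # assumed marker for the end of the completion


def make_record(prompt_text, completion_text):
    """Build one prompt/completion record in the layout shown above."""
    return {
        "prompt": prompt_text.strip() + SEPARATOR,
        # The leading space follows the usual fine-tuning data convention.
        "completion": " " + completion_text.strip() + STOP,
    }


# json.dumps escapes the newlines, so each record stays on one JSONL line.
line = json.dumps(make_record("Translate 'hello' to French", "bonjour"))
print(line)
```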

Basic usage

Only useful if you have cloned the repository.

python prepare.py \
    --num_samples 500 \
    --max_length 2048 \
    --sample_type "classification" \
    --output_format "jsonl" \
    --output_dir .

python prepare.py \
    --num_samples 500 \
    --max_length 2048 \
    --sample_type "completion" \
    --output_format "csv" \
    --output_dir .

python prepare.py \
    --num_samples 500 \
    --max_length 2048 \
    --sample_type "completion" \
    --output_format "json" \
    --output_dir /Users/<tsiameh>/Desktop

Validate the sample data

pip install --upgrade openai

export OPENAI_API_KEY="<OPENAI_API_KEY>"

# validate sample datasets generated
openai tools fine_tunes.prepare_data -f <SAMPLE_DATA>.jsonl
openai tools fine_tunes.prepare_data -f <SAMPLE_DATA>.csv
openai tools fine_tunes.prepare_data -f <SAMPLE_DATA>.tsv
openai tools fine_tunes.prepare_data -f <SAMPLE_DATA>.json
openai tools fine_tunes.prepare_data -f <SAMPLE_DATA>.xlsx
openai tools fine_tunes.prepare_data -f /Users/<tsiameh>/Desktop/data_prepared.jsonl
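
Before handing a file to the OpenAI tool, a quick local check can catch obvious formatting problems. The sketch below is an assumption-laden illustration, not a replacement for `fine_tunes.prepare_data`: the `check_jsonl` helper is hypothetical, and it only checks that each line is a JSON object with string `prompt` and `completion` fields that follow the separator and stop conventions from the data format section.

```python
import json

SEPARATOR = "\n\n###\n\n"  # assumed end-of-prompt marker
STOP = " END"              # assumed end-of-completion marker


def check_jsonl(path):
    """Return (line_number, problem) tuples; an empty list means nothing was flagged."""
    problems = []
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue  # ignore blank lines
            try:
                rec = json.loads(line)
            except json.JSONDecodeError as exc:
                problems.append((lineno, f"invalid JSON: {exc.msg}"))
                continue
            if not isinstance(rec, dict):
                problems.append((lineno, "not a JSON object"))
                continue
            prompt = rec.get("prompt")
            completion = rec.get("completion")
            if not isinstance(prompt, str) or not isinstance(completion, str):
                problems.append((lineno, "missing string 'prompt' or 'completion'"))
                continue
            if not prompt.endswith(SEPARATOR):
                problems.append((lineno, "prompt does not end with the separator"))
            if not completion.startswith(" "):
                problems.append((lineno, "completion does not start with a space"))
            if not completion.endswith(STOP):
                problems.append((lineno, "completion does not end with the stop marker"))
    return problems
```

An empty result from `check_jsonl("<SAMPLE_DATA>.jsonl")` only means these particular checks passed; the OpenAI tool performs its own, more thorough analysis.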

# fine-tune
openai api fine_tunes.create \
  -t <DATA_PREPARED>.jsonl \
  -m <BASE_MODEL: davinci, curie, ada, babbage>

# List all created fine-tunes
openai api fine_tunes.list

Run tests

# For multiclass classification
openai api fine_tunes.create \
  -t <TRAIN_FILE_ID_OR_PATH> \
  -v <VALIDATION_FILE_OR_PATH> \
  -m <MODEL> \
  --compute_classification_metrics \
  --classification_n_classes <N_CLASSES>

# For binary classification
openai api fine_tunes.create \
  -t <TRAIN_FILE_ID_OR_PATH> \
  -v <VALIDATION_FILE_OR_PATH> \
  -m <MODEL> \
  --compute_classification_metrics \
  --classification_n_classes 2 \
  --classification_positive_class <POSITIVE_CLASS_FROM_DATASET>
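
The `<N_CLASSES>` and `<POSITIVE_CLASS_FROM_DATASET>` placeholders depend on the labels in the training file. As a rough aid, the following sketch counts the distinct labels in a JSONL classification dataset. It assumes each completion is a single label followed by the ` END` stop marker; the `class_counts` helper is hypothetical and not part of gpt3datagen or the OpenAI CLI.

```python
import json
from collections import Counter

STOP = "END"  # assumed stop marker at the end of each completion


def class_counts(path):
    """Count how often each label appears in the 'completion' field."""
    counts = Counter()
    with open(path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # ignore blank lines
            label = json.loads(line)["completion"].strip()
            # Drop the trailing stop marker, if present, to get the bare label.
            if label.endswith(STOP):
                label = label[: -len(STOP)].strip()
            counts[label] += 1
    return counts
```

With this sketch, `len(class_counts("<TRAIN_FILE>.jsonl"))` gives a candidate value for the number of classes, and the keys show the label strings as they appear after stripping. Note that the value passed as the positive class should match the completion text exactly as it is stored in the dataset.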