DALL-E in Mesh-Tensorflow [WIP]

Open-AI's DALL-E in Mesh-Tensorflow.

If this is similarly efficient to GPT-Neo, this repo should be able to train models up to, and larger than, the size of Open-AI's DALL-E (12B parameters).

No pretrained models... yet.

Thanks to Ben Wang for the tf VAE implementation and for getting the mtf version working, and to Aran Komatsuzaki for help building the mtf VAE and the input pipeline.

Setup

git clone https://github.com/EleutherAI/GPTNeo
cd GPTNeo
pip3 install -r requirements.txt

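Training Setup

Runs on TPUs. It has not been tested on GPUs, but should work in theory. The example configs are designed to run on a TPU v3-32 pod.

To set up TPUs, sign up for Google Cloud Platform and create a storage bucket.

Create your VM through a Google Cloud Shell (https://ssh.cloud.google.com/) with ctpu up --vm-only so that it can connect to your Google bucket and TPUs, then set up the repo as above.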

Pretraining the VAE

DALL-E needs a pretrained VAE to compress images into tokens. To run VAE pretraining, edit configs/vae_example.json: set the dataset paths to glob paths pointing at your jpg dataset, and set image_size to the appropriate size.

  "dataset": {
    "train_path": "gs://neo-datasets/CIFAR-10-images/train/**/*.jpg",
    "eval_path": "gs://neo-datasets/CIFAR-10-images/test/**/*.jpg",
    "image_size": 32
  }
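If you prefer to edit the config from a script, here is a minimal sketch that rewrites the dataset block in place. It assumes configs/vae_example.json is plain JSON (no comments) with a top-level "dataset" object like the snippet above; the function name set_vae_dataset is hypothetical and not part of the repo.

```python
import json
from pathlib import Path


def set_vae_dataset(config_path, train_glob, eval_glob, image_size):
    """Point a VAE config at a jpg dataset and set the image size.

    Hypothetical helper: assumes the config file is plain JSON (no comments)
    with a top-level "dataset" object. Other keys are preserved.
    """
    path = Path(config_path)
    config = json.loads(path.read_text())
    dataset = config.setdefault("dataset", {})
    dataset["train_path"] = train_glob  # glob path to training jpgs
    dataset["eval_path"] = eval_glob  # glob path to eval jpgs
    dataset["image_size"] = int(image_size)  # images are cropped / padded to this size
    path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

For example, set_vae_dataset("configs/vae_example.json", "gs://your-bucket/train/**/*.jpg", "gs://your-bucket/test/**/*.jpg", 64), where the bucket paths are placeholders for your own.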

Once this is all set up, create your TPU, then run:

python train_vae_tf.py --tpu your_tpu_name --model vae_example

Training logs image tensors and loss values. To check progress, you can run:

tensorboard --logdir your_model_dir

Dataset Creation [DALL-E]

Once the VAE is pretrained, you can move on to DALL-E.

Currently we are training on a dummy dataset. A large, public dataset for DALL-E is in the works. In the meantime, to generate some dummy data, run:

python src/data/create_tfrecords.py

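This should download CIFAR-10 and generate some random captions to act as text inputs.

Custom datasets should be formatted as a folder, with a jsonl file in the root folder containing the caption data and the path to each image, as follows: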

 Folder structure:

        data_folder
            jsonl_file
            folder_1
                img1
                img2
                ...
            folder_2
                img1
                img2
                ...
            ...

jsonl structure:
    {"image_path": "folder_1/img1", "caption": "some words"}
    {"image_path": "folder_2/img2", "caption": "more words"}
    ...
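As a minimal sketch of producing that index, the function below walks data_folder and writes one JSON object per image. It assumes you already have captions in a dict keyed by the image path relative to the root folder; the function name write_jsonl_index, the default file name data.jsonl, and the accepted image extensions are all hypothetical choices, not something the repo prescribes.

```python
import json
from pathlib import Path

# Hypothetical set of extensions treated as images.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def write_jsonl_index(data_folder, captions, jsonl_name="data.jsonl"):
    """Write a jsonl index in the layout shown above.

    `captions` maps an image path relative to `data_folder`
    (e.g. "folder_1/img1.jpg") to its caption string. Images without a
    caption are skipped. Returns the number of lines written.
    """
    root = Path(data_folder)
    count = 0
    with open(root / jsonl_name, "w", encoding="utf-8") as out:
        for image in sorted(root.rglob("*")):
            if image.suffix.lower() not in IMAGE_EXTENSIONS:
                continue  # skips folders, the index itself, and non-images
            rel = image.relative_to(root).as_posix()  # forward slashes on every OS
            if rel not in captions:
                continue
            record = {"image_path": rel, "caption": captions[rel]}
            out.write(json.dumps(record) + "\n")
            count += 1
    return count
```

Using json.dumps for each line guarantees the paths come out as quoted JSON strings.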

Then you should be able to use the create_paired_dataset function in src/data/create_tfrecords.py to encode the dataset into tfrecords for use in training.

Once the dataset is created, copy it over to your bucket with gsutil:

gsutil cp -r DALLE-tfrecords gs://neo-datasets/

Finally, run training with:
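Before encoding, a quick check of the index can catch malformed lines or missing images early. This is a minimal sketch under the same assumptions as the folder layout above; check_jsonl_index and the default name data.jsonl are hypothetical and not part of src/data/create_tfrecords.py.

```python
import json
from pathlib import Path


def check_jsonl_index(data_folder, jsonl_name="data.jsonl"):
    """Return a list of problems found in a jsonl caption index.

    Checks that every non-blank line is valid JSON, has string "image_path"
    and "caption" fields, and that the referenced image exists under
    `data_folder`. An empty list means no problems were found.
    """
    root = Path(data_folder)
    problems = []
    with open(root / jsonl_name, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError as err:
                problems.append(f"line {lineno}: invalid JSON ({err.msg})")
                continue
            if not isinstance(record, dict):
                problems.append(f"line {lineno}: expected a JSON object")
                continue
            image_path = record.get("image_path")
            caption = record.get("caption")
            if not isinstance(image_path, str) or not isinstance(caption, str):
                problems.append(f"line {lineno}: missing string image_path/caption")
                continue
            if not (root / image_path).is_file():
                problems.append(f"line {lineno}: image not found: {image_path}")
    return problems
```

An unquoted path such as {"image_path": folder_2/img2, ...} is reported as invalid JSON rather than silently skipped.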

python train_dalle.py --tpu your_tpu_name --model dalle_example

Config Guide

VAE:

 {
  "model_type": "vae",
  "dataset": {
    "train_path": "gs://neo-datasets/CIFAR-10-images/train/**/*.jpg", # glob path to training images
    "eval_path": "gs://neo-datasets/CIFAR-10-images/test/**/*.jpg", # glob path to eval images
    "image_size": 32 # size of images (all images will be cropped / padded to this size)
  },
  "train_batch_size": 32, 
  "eval_batch_size": 32,
  "predict_batch_size": 32,
  "steps_per_checkpoint": 1000, # how often to save a checkpoint
  "iterations": 500, # number of batches to infeed to the tpu at a time. Must be < steps_per_checkpoint
  "train_steps": 100000, # total training steps
  "eval_steps": 0, # run evaluation for this many steps every steps_per_checkpoint
  "model_path": "gs://neo-models/vae_test2/", # directory in which to save the model
  "mesh_shape": "data:16,model:2", # mapping of processors to named dimensions - see mesh-tensorflow repo for more info
  "layout": "batch_dim:data", # which named dimensions of the model to split across the mesh - see mesh-tensorflow repo for more info
  "num_tokens": 512, # vocab size
  "dim": 512, 
  "hidden_dim": 64, # size of hidden dim
  "n_channels": 3, # number of input channels
  "bf_16": false, # if true, the model is trained with bfloat16 precision
  "lr": 0.001, # learning rate [by default learning rate starts at this value, then decays to 10% of this value over the course of the training]
  "num_layers": 3, # number of blocks in the encoder / decoder
  "train_gumbel_hard": true, # whether to use hard or soft gumbel_softmax
  "eval_gumbel_hard": true
}
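Some of these fields constrain each other. The sketch below checks the VAE config for a few likely mistakes: the mesh should cover every TPU core (data:16 x model:2 = 32 cores on a v3-32), iterations must be smaller than steps_per_checkpoint (as the comment above states), and, as an assumption about Mesh-Tensorflow splitting, each batch size should divide evenly across the mesh dimension that batch_dim is laid out on. The helper names parse_pairs and check_vae_config are hypothetical, and the config is assumed to be already loaded as a dict (the commented version above is not valid JSON).

```python
import math


def parse_pairs(spec):
    """Split "a:x,b:y" into {"a": "x", "b": "y"} (mesh_shape / layout strings)."""
    pairs = {}
    for part in spec.split(","):
        name, value = part.split(":")
        pairs[name.strip()] = value.strip()
    return pairs


def check_vae_config(config, num_cores):
    """Return a list of likely misconfigurations (empty if none found)."""
    problems = []
    mesh = {name: int(size) for name, size in parse_pairs(config["mesh_shape"]).items()}
    mesh_size = math.prod(mesh.values())  # total processors the mesh addresses
    if mesh_size != num_cores:
        problems.append(f"mesh_shape uses {mesh_size} processors but {num_cores} cores are available")
    if config["iterations"] >= config["steps_per_checkpoint"]:
        problems.append("iterations must be < steps_per_checkpoint")
    # Assumption: a batch dimension split across a mesh dimension must divide evenly.
    batch_mesh_dim = parse_pairs(config["layout"]).get("batch_dim")
    if batch_mesh_dim is not None:
        split = mesh.get(batch_mesh_dim, 1)
        for key in ("train_batch_size", "eval_batch_size", "predict_batch_size"):
            if config[key] % split != 0:
                problems.append(f"{key}={config[key]} is not divisible by {batch_mesh_dim}:{split}")
    return problems
```

With the example values above and num_cores=32, check_vae_config returns an empty list.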

DALL-E:

 {
  "model_type": "dalle",
  "dataset": {
    "train_path": "gs://neo-datasets/DALLE-tfrecords/*.tfrecords", # glob path to tfrecords data
    "eval_path": "gs://neo-datasets/DALLE-tfrecords/*.tfrecords",
    "image_size": 32 # size of images (all images will be cropped / padded to this size)
  },
  "train_batch_size": 32, # see above
  "eval_batch_size": 32,
  "predict_batch_size": 32,
  "steps_per_checkpoint": 1000,
  "iterations": 500,
  "train_steps": 100000,
  "predict_steps": 0,
  "eval_steps": 0,
  "n_channels": 3,
  "bf_16": false,
  "lr": 0.001,
  "model_path": "gs://neo-models/dalle_test/",
  "mesh_shape": "data:16,model:2",
  "layout": "batch_dim:data",
  "n_embd": 512, # size of embedding dim
  "text_vocab_size": 50258, # vocabulary size of the text tokenizer
  "image_vocab_size": 512, # vocabulary size of the vae - should equal num_tokens above
  "text_seq_len": 256, # length of text inputs (all inputs longer / shorter will be truncated / padded)
  "n_layers": 6, 
  "n_heads": 4, # number of attention heads. For best performance, n_embd / n_heads should equal 128
  "vae_model": "vae_example" # path to or name of vae model config
}
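The DALL-E config also has to agree with the VAE it uses. As a minimal sketch, assuming both configs are loaded as dicts, the check below compares image_vocab_size with the VAE's num_tokens (the comment above says they should be equal), checks that n_embd splits evenly into n_heads, and flags a head size other than the recommended 128 (512 / 4 = 128 in the example). Requiring the two image_size values to match is an additional assumption, and check_dalle_against_vae is a hypothetical name.

```python
def check_dalle_against_vae(dalle, vae):
    """Return a list of mismatches between a DALL-E config and its VAE config."""
    problems = []
    if dalle["image_vocab_size"] != vae["num_tokens"]:
        problems.append(
            f"image_vocab_size={dalle['image_vocab_size']} != VAE num_tokens={vae['num_tokens']}"
        )
    # Assumption: the VAE should see images at the size it was trained on.
    if dalle["dataset"]["image_size"] != vae["dataset"]["image_size"]:
        problems.append("dataset.image_size differs between the DALL-E and VAE configs")
    n_embd, n_heads = dalle["n_embd"], dalle["n_heads"]
    if n_embd % n_heads != 0:
        problems.append(f"n_embd={n_embd} is not divisible by n_heads={n_heads}")
    elif n_embd // n_heads != 128:
        problems.append(f"n_embd / n_heads = {n_embd // n_heads}; 128 is recommended")
    return problems
```

The head-size message is advisory: the comment above only says 128 gives the best performance, not that other values fail.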
