deep dazeダウンロード - deep dazeソースコードのダウンロード

ディープデイズ

緑の丘の上の霧

草の上に砕けた皿

宇宙の愛と注目

群衆の中のタイムトラベラー

ペスト流行中の生活

太陽に照らされた森の中での瞑想的な平和

完全に赤い画像を描く男性

LSDでのサイケデリックな体験

これは何ですか？

OpenAI の CLIP と Siren を使用してテキストから画像を生成するためのシンプルなコマンドラインツール。この手法の発見 (そして素晴らしい名前の考案) は Ryan Murdock の功績です。

オリジナルノート

新しい簡易ノート

これには、Nvidia GPU または AMD GPU が必要です。

推奨: 16GB VRAM
最小要件: 4GB VRAM (非常に低い設定を使用、以下の使用手順を参照)

インストール

$ pip install deep-daze

Windowsのインストール

Python がインストールされていると仮定します。

コマンドプロンプトを開き、現在のバージョンの Python のディレクトリに移動します。

  pip install deep-daze

例

$ imagine " a house in the forest "

Windowsの場合:

管理者としてコマンドプロンプトを開きます

  imagine " a house in the forest "

それでおしまい。

十分なメモリがある場合は、 --deeperフラグを追加すると品質が向上します。

$ imagine " shattered plates on the ground " --deeper

高度な

真の深層学習方式では、層が多いほどより良い結果が得られます。デフォルトは16ですが、リソースに応じて32まで増やすことができます。

$ imagine " stranger in strange lands " --num-layers 32

使用法

CLI

NAME
    imagine

SYNOPSIS
    imagine TEXT < flags >

POSITIONAL ARGUMENTS
    TEXT
        (required) A phrase less than 77 tokens which you would like to visualize.

FLAGS
    --img=IMAGE_PATH
        Default: None
        Path to png/jpg image or PIL image to optimize on
    --encoding=ENCODING
        Default: None
        User-created custom CLIP encoding. If used, replaces any text or image that was used.
    --create_story=CREATE_STORY
        Default: False
        Creates a story by optimizing each epoch on a new sliding-window of the input words. If this is enabled, much longer texts than 77 tokens can be used. Requires save_progress to visualize the transitions of the story.
    --story_start_words=STORY_START_WORDS
        Default: 5
        Only used if create_story is True. How many words to optimize on for the first epoch.
    --story_words_per_epoch=STORY_WORDS_PER_EPOCH
        Default: 5
        Only used if create_story is True. How many words to add to the optimization goal per epoch after the first one.
    --story_separator:
        Default: None
        Only used if create_story is True. Defines a separator like ' . ' that splits the text into groups for each epoch. Separator needs to be in the text otherwise it will be ignored
    --lower_bound_cutout=LOWER_BOUND_CUTOUT
        Default: 0.1
        Lower bound of the sampling of the size of the random cut-out of the SIREN image per batch. Should be smaller than 0.8.
    --upper_bound_cutout=UPPER_BOUND_CUTOUT
        Default: 1.0
        Upper bound of the sampling of the size of the random cut-out of the SIREN image per batch. Should probably stay at 1.0.
    --saturate_bound=SATURATE_BOUND
        Default: False
        If True, the LOWER_BOUND_CUTOUT is linearly increased to 0.75 during training.
    --learning_rate=LEARNING_RATE
        Default: 1e-05
        The learning rate of the neural net.
    --num_layers=NUM_LAYERS
        Default: 16
        The number of hidden layers to use in the Siren neural net.
    --batch_size=BATCH_SIZE
        Default: 4
        The number of generated images to pass into Siren before calculating loss. Decreasing this can lower memory and accuracy.
    --gradient_accumulate_every=GRADIENT_ACCUMULATE_EVERY
        Default: 4
        Calculate a weighted loss of n samples for each iteration. Increasing this can help increase accuracy with lower batch sizes.
    --epochs=EPOCHS
        Default: 20
        The number of epochs to run.
    --iterations=ITERATIONS
        Default: 1050
        The number of times to calculate and backpropagate loss in a given epoch.
    --save_every=SAVE_EVERY
        Default: 100
        Generate an image every time iterations is a multiple of this number.
    --image_width=IMAGE_WIDTH
        Default: 512
        The desired resolution of the image.
    --deeper=DEEPER
        Default: False
        Uses a Siren neural net with 32 hidden layers.
    --overwrite=OVERWRITE
        Default: False
        Whether or not to overwrite existing generated images of the same name.
    --save_progress=SAVE_PROGRESS
        Default: False
        Whether or not to save images generated before training Siren is complete.
    --seed=SEED
        Type: Optional[]
        Default: None
        A seed to be used for deterministic runs.
    --open_folder=OPEN_FOLDER
        Default: True
        Whether or not to open a folder showing your generated images.
    --save_date_time=SAVE_DATE_TIME
        Default: False
        Save files with a timestamp prepended e.g. ` %y%m%d-%H%M%S-my_phrase_here `
    --start_image_path=START_IMAGE_PATH
        Default: None
        The generator is trained first on a starting image before steered towards the textual input
    --start_image_train_iters=START_IMAGE_TRAIN_ITERS
        Default: 50
        The number of steps for the initial training on the starting image
    --theta_initial=THETA_INITIAL
        Default: 30.0
        Hyperparameter describing the frequency of the color space. Only applies to the first layer of the network.
    --theta_hidden=THETA_INITIAL
        Default: 30.0
        Hyperparameter describing the frequency of the color space. Only applies to the hidden layers of the network.
    --save_gif=SAVE_GIF
        Default: False
        Whether or not to save a GIF animation of the generation procedure. Only works if save_progress is set to True.

プライミング

Mario Klingemann によって最初に考案され共有された技術で、テキストに向けて操作される前に、開始画像でジェネレーターネットワークを準備することができます。

使用する画像へのパスを指定し、オプションで初期トレーニングステップの数を指定するだけです。

$ imagine ' a clear night sky filled with stars ' --start_image_path ./cloudy-night-sky.jpg

プライミングされた開始イメージ

次に、プロンプトのA pizza with green pepper.

画像の解釈を最適化する

ジェネレーターネットワークを準備するだけでなく、最適化目標として画像をフィードすることもできます。 Deepdaze は、その画像を独自に解釈してレンダリングします。

$ imagine --img samples/Autumn_1875_Frederic_Edwin_Church.jpg

元の画像:

ネットワークの解釈:

元の画像:

ネットワークの解釈:

テキストと画像を組み合わせて最適化する

$ imagine " A psychedelic experience. " --img samples/hot-dog.jpg

ネットワークの解釈:

新規: ストーリーを作成する

テキストの通常モードでは、77 個のトークンのみが許可されます。完全なストーリー/段落/歌/詩を視覚化したい場合は、 create_story Trueに設定します。

ロバート・フロストの詩「雪の降る夜に森に立ち寄る」を考えると、「ここが誰の森なのか、私は知っていると思います。しかし、彼の家は村の中にあります。彼は私が森が雪でいっぱいになるのを見るためにここに立ち寄るのを見ないだろう。」私の小さな馬は、それが奇妙だと思っているに違いありません森と凍った湖の間の近くに農家がないのに立ち止まるのは一年で最も暗い夜、彼はハーネスの鐘を振る、何か間違いがあるかどうかを尋ねるためです。穏やかな風と綿毛のような吹き抜け森は美しく、暗くて深い、しかし私には守らなければならない約束がある、そして寝るまでにあと何マイルも行く、そして私が寝るまでに何マイルも行く。」

得られるものは次のとおりです。

この森は誰のものなのか、私は知っていると思います。彼の家は村にありますが、彼は.mp4

パイソン

Python で`deep_daze.Imagine`を呼び出す

 from deep_daze import Imagine

imagine = Imagine (
    text = 'cosmic love and attention' ,
    num_layers = 24 ,
)
imagine ()

4 回の繰り返しごとに進行状況を保存する

画像を insert_text_here.00001.png、insert_text_here.00002.png、...最大(total_iterations % save_every)の形式で保存します

 imagine = Imagine (
    text = text ,
    save_every = 4 ,
    save_progress = True
)

各画像の先頭に現在のタイムスタンプを追加します。

タイムスタンプとシーケンス番号の両方を含むファイルを作成します。

例: 210129-043928_328751_insert_text_here.00001.png、210129-043928_512351_insert_text_here.00002.png、...

 imagine = Imagine (
    text = text ,
    save_every = 4 ,
    save_progress = True ,
    save_date_time = True ,
)

GPU メモリの使用量が多い

少なくとも 16 GiB の vram が利用可能であれば、ある程度の余裕を持ってこれらの設定を実行できるはずです。

 imagine = Imagine (
    text = text ,
    num_layers = 42 ,
    batch_size = 64 ,
    gradient_accumulate_every = 1 ,
)

平均 GPU メモリ使用量

 imagine = Imagine (
    text = text ,
    num_layers = 24 ,
    batch_size = 16 ,
    gradient_accumulate_every = 2
)

非常に低い GPU メモリ使用量 (4 GiB 未満)

どうしても 8 GiB 未満の VRAM を搭載したカードでこれを実行したい場合は、image_width を下げることができます。

 imagine = Imagine (
    text = text ,
    image_width = 256 ,
    num_layers = 16 ,
    batch_size = 1 ,
    gradient_accumulate_every = 16 # Increase gradient_accumulate_every to correct for loss in low batch sizes
)

VRAM と速度のベンチマーク:

これらの実験は 2060 Super RTX と 3700X Ryzen 5 を使用して実施されました。最初にパラメーター (bs = バッチサイズ)、次にメモリ使用量、場合によっては 1 秒あたりのトレーニング反復について説明します。

画像解像度 512 の場合:

bs 1、num_layers 22: 7.96 GB
bs 2、num_layers 20: 7.5 GB
bs 16、num_layers 16: 6.5 GB

画像解像度 256 の場合:

bs 8、num_layers 48: 5.3 GB
bs 16、num_layers 48: 5.46 GB - 2.0 it/s
bs 32、num_layers 48: 5.92 GB - 1.67 it/s
bs 8、num_layers 44: 5 GB - 2.39 it/s
bs 32、num_layers 44、grad_acc 1: 5.62 GB - 4.83 it/s
bs 96、num_layers 44、grad_acc 1: 7.51 GB - 2.77 it/s
bs 32、num_layers 66、grad_acc 1: 7.09 GB - 3.7 it/s

@NotNANtoN は、44 レイヤーと 1 ～ 8 エポックのトレーニングでバッチサイズ 32 を推奨します。

これはどこへ行くのでしょうか？

これは単なるティーザーです。私たちは自然言語を使って、画像、音声、あらゆるものを自由に生成できるようになります。ホロデッキは私たちが生きているうちに現実のものになりつつあります。

このテクノロジーの推進に興味がある場合は、Pytorch または Mesh Tensorflow 用の DALL-E のレプリケーションの取り組みに参加してください。

代替案

Big Sleep - CLIP と Big GAN のジェネレーター

引用

 @misc { unpublished2021clip ,
    title  = { CLIP: Connecting Text and Images } ,
    author = { Alec Radford, Ilya Sutskever, Jong Wook Kim, Gretchen Krueger, Sandhini Agarwal } ,
    year   = { 2021 }
}

 @misc { sitzmann2020implicit ,
    title   = { Implicit Neural Representations with Periodic Activation Functions } ,
    author  = { Vincent Sitzmann and Julien N. P. Martel and Alexander W. Bergman and David B. Lindell and Gordon Wetzstein } ,
    year    = { 2020 } ,
    eprint  = { 2006.09661 } ,
    archivePrefix = { arXiv } ,
    primaryClass = { cs.CV }
}

拡大する

deep daze

ディープデイズ

これは何ですか？

インストール

Windowsのインストール

例

高度な

使用法

CLI

プライミング

画像の解釈を最適化する

テキストと画像を組み合わせて最適化する

新規: ストーリーを作成する

パイソン

Python で`deep_daze.Imagine`を呼び出す

4 回の繰り返しごとに進行状況を保存する

各画像の先頭に現在のタイムスタンプを追加します。

GPU メモリの使用量が多い

平均 GPU メモリ使用量

非常に低い GPU メモリ使用量 (4 GiB 未満)

VRAM と速度のベンチマーク:

これはどこへ行くのでしょうか？

代替案

引用

DAZE CAMアプリ

ディープフィールド

ディープハンターゲーム

ディープディ

ディープレース：バトル

ディープルーン

chat.petals.dev

GPT Prompt Templates

GPTyped

node telegram bot api

typebot.io

python wechaty getting started

waymo open dataset

termwind

wp functions

deep daze

ディープデイズ

これは何ですか？

インストール

Windowsのインストール

例

高度な

使用法

CLI

プライミング

画像の解釈を最適化する

テキストと画像を組み合わせて最適化する

新規: ストーリーを作成する

パイソン

Python でdeep_daze.Imagineを呼び出す

4 回の繰り返しごとに進行状況を保存する

各画像の先頭に現在のタイムスタンプを追加します。

GPU メモリの使用量が多い

平均 GPU メモリ使用量

非常に低い GPU メモリ使用量 (4 GiB 未満)

VRAM と速度のベンチマーク:

これはどこへ行くのでしょうか？

代替案

引用

Python で`deep_daze.Imagine`を呼び出す