Implementation of Classifier Free Guidance in Pytorch, with emphasis on text conditioning and the flexibility to include multiple text embedding models, as done in eDiff-I.

It is clear by now that text guidance is the ultimate interface to models. This repository leverages some Python decorator magic to make it easy to incorporate SOTA text conditioning into any model.
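As a refresher, classifier-free guidance (Ho & Salimans, cited below) forms its guided prediction by extrapolating from the unconditional model output toward the conditional one:

```latex
\tilde{\epsilon}_\theta(x, c) = \epsilon_\theta(x, \varnothing) + s \cdot \bigl( \epsilon_\theta(x, c) - \epsilon_\theta(x, \varnothing) \bigr)
```

Here `s` is the condition scale (`cond_scale` below), and the unconditional prediction is what randomly dropping the text condition during training (`cond_drop_prob`) makes possible. `s = 1` recovers the purely conditional model; `s > 1` strengthens the guidance.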
- StabilityAI for the generous sponsorship, as well as the other sponsors out there

- 🤗 Huggingface for their amazing transformers library. The text conditioning module uses T5 embeddings, as the latest research recommends

- OpenCLIP for providing SOTA open-sourced CLIP models. The eDiff model sees immense improvements by combining T5 embeddings with CLIP text embeddings
```bash
$ pip install classifier-free-guidance-pytorch
```
```python
import torch
from classifier_free_guidance_pytorch import TextConditioner

text_conditioner = TextConditioner(
    model_types = 't5',
    hidden_dims = (256, 512),
    hiddens_channel_first = False,
    cond_drop_prob = 0.2  # conditional dropout 20% of the time, must be greater than 0. to unlock classifier free guidance
).cuda()

# pass in your text as a List[str], and get back a List[callable]
# each callable function receives the hiddens in the dimensions listed at init (hidden_dims)

first_condition_fn, second_condition_fn = text_conditioner(['a dog chasing after a ball'])

# these hiddens will be in the direct flow of your model, say in a unet

first_hidden = torch.randn(1, 16, 256).cuda()
second_hidden = torch.randn(1, 32, 512).cuda()

# conditioned features

first_conditioned = first_condition_fn(first_hidden)
second_conditioned = second_condition_fn(second_hidden)
```
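For intuition, the FiLM-style conditioning these callables apply boils down to a per-channel scale and shift predicted from the pooled text embedding. Below is a minimal sketch of that mechanism; the `FiLM` class and its dimensions are illustrative, not the library's internals.

```python
import torch
from torch import nn

class FiLM(nn.Module):
    # illustrative sketch: predict a per-channel scale (gamma) and shift (beta)
    # from a pooled text embedding, then modulate the hiddens with them
    def __init__(self, text_embed_dim, hidden_dim):
        super().__init__()
        self.to_gamma_beta = nn.Linear(text_embed_dim, hidden_dim * 2)

    def forward(self, hiddens, text_embed):
        gamma, beta = self.to_gamma_beta(text_embed).chunk(2, dim = -1)
        # unsqueeze so (batch, dim) conditioning broadcasts over the sequence dimension;
        # the + 1 biases the scale toward identity
        return hiddens * (gamma.unsqueeze(1) + 1) + beta.unsqueeze(1)

film = FiLM(text_embed_dim = 768, hidden_dim = 256)     # 768 is a stand-in for the text encoder dimension
hiddens = torch.randn(1, 16, 256)                       # (batch, seq, dim), channel-last as above
text_embed = torch.randn(1, 768)                        # pooled text embedding, e.g. from T5
conditioned = film(hiddens, text_embed)                 # same shape as hiddens
```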
If you wish to use cross-attention based conditioning (where each hidden feature in your network can attend to individual subword tokens), just import `AttentionTextConditioner` instead. The rest is the same.
```python
from classifier_free_guidance_pytorch import AttentionTextConditioner

text_conditioner = AttentionTextConditioner(
    model_types = ('t5', 'clip'),  # something like in the eDiff paper, where they used both T5 and CLIP for even better results (Balaji et al.)
    hidden_dims = (256, 512),
    cond_drop_prob = 0.2
)
```
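For intuition, here is a minimal sketch of what cross-attention conditioning does: each position of the hiddens queries the per-token text embeddings, so the conditioning can vary along the sequence rather than being a single pooled vector. The `CrossAttentionCondition` class below is illustrative only, not the repository's implementation.

```python
import torch
from torch import nn

class CrossAttentionCondition(nn.Module):
    # illustrative sketch: hiddens act as queries, text token embeddings as keys/values
    def __init__(self, dim, text_embed_dim, heads = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, kdim = text_embed_dim, vdim = text_embed_dim, batch_first = True)

    def forward(self, hiddens, text_tokens):
        attended, _ = self.attn(hiddens, text_tokens, text_tokens)
        return hiddens + attended  # residual, so the conditioning is additive

cross_cond = CrossAttentionCondition(dim = 256, text_embed_dim = 768)  # 768 is a stand-in text encoder dimension
hiddens = torch.randn(1, 16, 256)       # (batch, seq, dim)
text_tokens = torch.randn(1, 77, 768)   # per-subword-token embeddings from the text encoder
conditioned = cross_cond(hiddens, text_tokens)
```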
This is a work in progress, aiming to make it as easy as possible to text-condition your network.

First, let's say you have a simple two-layer network:
```python
import torch
from torch import nn

class MLP(nn.Module):
    def __init__(
        self,
        dim
    ):
        super().__init__()
        self.proj_in = nn.Sequential(nn.Linear(dim, dim * 2), nn.ReLU())
        self.proj_mid = nn.Sequential(nn.Linear(dim * 2, dim), nn.ReLU())
        self.proj_out = nn.Linear(dim, 1)

    def forward(
        self,
        data
    ):
        hiddens1 = self.proj_in(data)
        hiddens2 = self.proj_mid(hiddens1)
        return self.proj_out(hiddens2)

# instantiate model and pass in some data, get (in this case) a binary prediction

model = MLP(dim = 256)

data = torch.randn(2, 256)
pred = model(data)
```
Say you would like to condition the hidden layers (`hiddens1` and `hiddens2`) with text, where each batch element gets its own free-text condition. With this repository, that is whittled down to three steps:
```python
import torch
from torch import nn
from classifier_free_guidance_pytorch import classifier_free_guidance_class_decorator

@classifier_free_guidance_class_decorator
class MLP(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.proj_in = nn.Sequential(nn.Linear(dim, dim * 2), nn.ReLU())
        self.proj_mid = nn.Sequential(nn.Linear(dim * 2, dim), nn.ReLU())
        self.proj_out = nn.Linear(dim, 1)

    def forward(
        self,
        inp,
        cond_fns  # List[Callable] - (1) your forward function now receives a list of conditioning functions, which you invoke on your hidden tensors
    ):
        cond_hidden1, cond_hidden2 = cond_fns  # conditioning functions are given back in the order of the `hidden_dims` set on the text conditioner

        hiddens1 = self.proj_in(inp)
        hiddens1 = cond_hidden1(hiddens1)  # (2) condition the first hidden layer with FiLM

        hiddens2 = self.proj_mid(hiddens1)
        hiddens2 = cond_hidden2(hiddens2)  # condition the second hidden layer with FiLM

        return self.proj_out(hiddens2)

# instantiate your model - extra keyword arguments will need to be defined, prepended by `text_condition_`

model = MLP(
    dim = 256,
    text_condition_type = 'film',                 # can be film, attention, or null (none)
    text_condition_model_types = ('t5', 'clip'),  # in this example, conditioning on both T5 and OpenCLIP
    text_condition_hidden_dims = (512, 256),      # the hidden dimensions you would like to condition on - here dim * 2 and dim, after the first and second projections
    text_condition_cond_drop_prob = 0.25          # conditional dropout probability for classifier free guidance. can be set to 0. if you do not need it and just want the text conditioning
)

# now you have your input data as well as corresponding free text as List[str]

data = torch.randn(2, 256)
texts = ['a description', 'another description']

# (3) train your model, passing in your list of strings as `texts`

pred = model(data, texts = texts)

# after much training, you can now do classifier free guidance by passing in a condition scale of > 1. !

model.eval()
guided_pred = model(data, texts = texts, cond_scale = 3., remove_parallel_component = True)  # cond_scale stands for conditioning scale from the classifier free guidance paper
```
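At inference with `cond_scale > 1.`, the decorated model effectively runs two forward passes, one with the text condition and one with it dropped, then extrapolates between the two outputs per the formula above. The helper below sketches that arithmetic, including my understanding of `remove_parallel_component` (following Sadat et al., cited below); it is an illustrative sketch, not the decorator's actual code.

```python
import torch
import torch.nn.functional as F

def guided_output(cond_out, null_out, cond_scale = 3., remove_parallel = False):
    # hypothetical helper sketching the classifier free guidance arithmetic
    update = cond_out - null_out
    if remove_parallel:
        # per Sadat et al. (2024): remove the component of the guidance update
        # parallel to the conditional output, keeping only the orthogonal part,
        # which counteracts oversaturation at high guidance scales
        unit = F.normalize(cond_out, dim = -1)
        update = update - (update * unit).sum(dim = -1, keepdim = True) * unit
    return cond_out + (cond_scale - 1.) * update
```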
- complete film conditioning, without classifier free guidance (used here)
- add classifier free guidance for film conditioning
- complete cross attention conditioning
- stress test the spacetime unet in make-a-video
```bibtex
@article{Ho2022ClassifierFreeDG,
    title   = {Classifier-Free Diffusion Guidance},
    author  = {Jonathan Ho},
    journal = {ArXiv},
    year    = {2022},
    volume  = {abs/2207.12598}
}

@article{Balaji2022eDiffITD,
    title   = {eDiff-I: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers},
    author  = {Yogesh Balaji and Seungjun Nah and Xun Huang and Arash Vahdat and Jiaming Song and Karsten Kreis and Miika Aittala and Timo Aila and Samuli Laine and Bryan Catanzaro and Tero Karras and Ming-Yu Liu},
    journal = {ArXiv},
    year    = {2022},
    volume  = {abs/2211.01324}
}

@inproceedings{dao2022flashattention,
    title     = {Flash{A}ttention: Fast and Memory-Efficient Exact Attention with {IO}-Awareness},
    author    = {Dao, Tri and Fu, Daniel Y. and Ermon, Stefano and Rudra, Atri and R{\'e}, Christopher},
    booktitle = {Advances in Neural Information Processing Systems},
    year      = {2022}
}

@inproceedings{Lin2023CommonDN,
    title  = {Common Diffusion Noise Schedules and Sample Steps are Flawed},
    author = {Shanchuan Lin and Bingchen Liu and Jiashi Li and Xiao Yang},
    year   = {2023}
}

@inproceedings{Chung2024CFGMC,
    title  = {CFG++: Manifold-constrained Classifier Free Guidance for Diffusion Models},
    author = {Hyungjin Chung and Jeongsol Kim and Geon Yeong Park and Hyelin Nam and Jong Chul Ye},
    year   = {2024},
    url    = {https://api.semanticscholar.org/CorpusID:270391454}
}

@inproceedings{Sadat2024EliminatingOA,
    title  = {Eliminating Oversaturation and Artifacts of High Guidance Scales in Diffusion Models},
    author = {Seyedmorteza Sadat and Otmar Hilliges and Romann M. Weber},
    year   = {2024},
    url    = {https://api.semanticscholar.org/CorpusID:273098845}
}
```