classifier free guidance pytorch
0.7.1
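Implementation of Classifier Free Guidance in Pytorch, with emphasis on text conditioning and the flexibility to include multiple text embedding models, as done in eDiff-I
It is clear now that text guidance is the ultimate interface to models. This repository will leverage some Python decorator magic to make it easy to incorporate SOTA text conditioning into any model.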
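To give a feel for what the decorator magic could look like, here is a minimal pure-Python sketch. It is not this repository's implementation: `text_condition_decorator`, `Toy`, and the placeholder conditioning function are hypothetical, and only the `text_condition_` prefix and the `texts` / `cond_fns` names are taken from the usage shown in this README.

```python
from functools import wraps

PREFIX = 'text_condition_'  # assumed prefix, matching the keyword arguments used in this README

def text_condition_decorator(klass):
    """Hypothetical class decorator: strips prefixed kwargs at init and
    turns `texts` into `cond_fns` when forward is called."""
    orig_init = klass.__init__
    orig_forward = klass.forward

    @wraps(orig_init)
    def __init__(self, *args, **kwargs):
        # split the conditioning kwargs from the ones the wrapped class understands
        cond_kwargs = {k[len(PREFIX):]: v for k, v in kwargs.items() if k.startswith(PREFIX)}
        model_kwargs = {k: v for k, v in kwargs.items() if not k.startswith(PREFIX)}
        orig_init(self, *args, **model_kwargs)
        self.cond_kwargs = cond_kwargs

    @wraps(orig_forward)
    def forward(self, *args, texts=None, **kwargs):
        # placeholder for a real text encoder: one conditioning function per hidden dim
        num_fns = len(self.cond_kwargs.get('hidden_dims', ()))
        if texts is None:
            cond_fns = [lambda h: h] * num_fns  # identity when no text is given
        else:
            # toy "conditioning": shift every value by the number of texts
            cond_fns = [lambda h, n=len(texts): [x + n for x in h] for _ in range(num_fns)]
        return orig_forward(self, *args, cond_fns=cond_fns, **kwargs)

    klass.__init__ = __init__
    klass.forward = forward
    return klass

@text_condition_decorator
class Toy:
    def __init__(self, dim):
        self.dim = dim

    def forward(self, inp, cond_fns):
        (cond,) = cond_fns  # a single hidden dimension in this toy
        return cond(inp)

toy = Toy(dim=4, text_condition_hidden_dims=(4,))
out = toy.forward([1, 2], texts=['a', 'b'])
```

The wrapped class never sees the prefixed keyword arguments, and its `forward` only has to accept `cond_fns`. A real version would build the conditioning functions from text embeddings rather than from the toy shift used here.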
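StabilityAI for the generous sponsorship, as well as my other sponsors
Huggingface for their amazing transformers library. The text conditioning module will use T5 embeddings, as the latest research recommends
OpenCLIP for providing SOTA open sourced CLIP models. The eDiff model sees immense improvements by combining the T5 embeddings with CLIP text embeddings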
$ pip install classifier-free-guidance-pytorch
import torch
from classifier_free_guidance_pytorch import TextConditioner

text_conditioner = TextConditioner(
    model_types = 't5',
    hidden_dims = (256, 512),
    hiddens_channel_first = False,
    cond_drop_prob = 0.2  # conditional dropout 20% of the time, must be greater than 0. to unlock classifier free guidance
).cuda()

# pass in your text as a List[str], and get back a List[callable]
# each callable function receives the hiddens in the dimensions listed at init (hidden_dims)

first_condition_fn, second_condition_fn = text_conditioner(['a dog chasing after a ball'])

# these hiddens will be in the direct flow of your model, say in a unet

first_hidden = torch.randn(1, 16, 256).cuda()
second_hidden = torch.randn(1, 32, 512).cuda()

# conditioned features

first_conditioned = first_condition_fn(first_hidden)
second_conditioned = second_condition_fn(second_hidden)
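For intuition, the following is a rough sketch of what a FiLM style conditioning function and the conditional dropout might do internally. It is an assumption-laden illustration, not the library's code: `FiLMSketch`, `drop_condition`, the zero initialization, the `(gamma + 1)` scaling form, and the use of a zero vector as the null embedding are all hypothetical choices, and hiddens are assumed to be channel last.

```python
import torch
from torch import nn

class FiLMSketch(nn.Module):
    """Hypothetical FiLM layer: a text embedding predicts a per-channel scale and shift."""
    def __init__(self, text_dim, hidden_dim):
        super().__init__()
        self.to_gamma_beta = nn.Linear(text_dim, hidden_dim * 2)
        # zero init (an assumption) so the layer starts out as the identity
        nn.init.zeros_(self.to_gamma_beta.weight)
        nn.init.zeros_(self.to_gamma_beta.bias)

    def forward(self, text_embed, hidden):
        # text_embed: (batch, text_dim), hidden: (batch, seq, hidden_dim)
        gamma, beta = self.to_gamma_beta(text_embed).chunk(2, dim=-1)
        gamma, beta = gamma.unsqueeze(1), beta.unsqueeze(1)  # broadcast over the sequence
        return hidden * (gamma + 1.) + beta

def drop_condition(text_embed, null_embed, cond_drop_prob):
    """Replace the text embedding with a null embedding, independently per batch element."""
    batch = text_embed.shape[0]
    keep = torch.rand(batch, device=text_embed.device) >= cond_drop_prob
    return torch.where(keep[:, None], text_embed, null_embed.expand_as(text_embed))

torch.manual_seed(0)
text_embed = torch.randn(2, 64)   # stand-in for pooled T5 / CLIP embeddings
null_embed = torch.zeros(64)      # stand-in for a learned null embedding
film = FiLMSketch(text_dim=64, hidden_dim=256)
hidden = torch.randn(2, 16, 256)

conditioned = film(drop_condition(text_embed, null_embed, cond_drop_prob=0.2), hidden)
```

Training with some fraction of conditions dropped is what lets the same network later produce both a conditioned and an unconditioned prediction, which classifier free guidance needs.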
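If you wish to use cross attention based conditioning (each hidden feature in your network can attend to individual subword tokens), just import the AttentionTextConditioner instead. The rest is the same.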
from classifier_free_guidance_pytorch import AttentionTextConditioner

text_conditioner = AttentionTextConditioner(
    model_types = ('t5', 'clip'),  # something like in eDiff paper, where they used both T5 and Clip for even better results (Balaji et al.)
    hidden_dims = (256, 512),
    cond_drop_prob = 0.2
)
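As a rough picture of cross attention based conditioning, the sketch below lets every hidden position attend over the individual text tokens using PyTorch's built-in `nn.MultiheadAttention`. This is only an illustration under assumptions: `CrossAttentionSketch`, the pre-norm residual layout, the head count, and the tensor sizes are hypothetical and not taken from the library.

```python
import torch
from torch import nn

class CrossAttentionSketch(nn.Module):
    """Hypothetical cross attention block: hiddens are queries, text tokens are keys / values."""
    def __init__(self, hidden_dim, text_dim, heads=4):
        super().__init__()
        self.norm = nn.LayerNorm(hidden_dim)
        self.attn = nn.MultiheadAttention(
            embed_dim=hidden_dim,
            num_heads=heads,
            kdim=text_dim,      # keys and values live in the text embedding space
            vdim=text_dim,
            batch_first=True,
        )

    def forward(self, hidden, text_tokens, text_mask=None):
        # hidden: (batch, n, hidden_dim), text_tokens: (batch, m, text_dim)
        # text_mask: (batch, m) bool, True where a real (non padding) token is present
        key_padding_mask = None if text_mask is None else ~text_mask
        out, _ = self.attn(
            self.norm(hidden), text_tokens, text_tokens,
            key_padding_mask=key_padding_mask,
        )
        return hidden + out  # residual, so the output keeps the hidden shape

torch.manual_seed(0)
block = CrossAttentionSketch(hidden_dim=256, text_dim=64).eval()
hidden = torch.randn(2, 16, 256)
text_tokens = torch.randn(2, 7, 64)           # stand-in for per-token text embeddings
text_mask = torch.ones(2, 7, dtype=torch.bool)
text_mask[:, 5:] = False                      # pretend the last two tokens are padding

with torch.no_grad():
    attended = block(hidden, text_tokens, text_mask)
```

Unlike FiLM, which squeezes the text into one scale and shift per channel, this form keeps the token sequence, so different hidden positions can pick up different words.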
This is a work in progress to make it as easy as possible to text condition your network.
First, let's say you have a simple two layer network
import torch
from torch import nn

class MLP(nn.Module):
    def __init__(
        self,
        dim
    ):
        super().__init__()
        self.proj_in = nn.Sequential(nn.Linear(dim, dim * 2), nn.ReLU())
        self.proj_mid = nn.Sequential(nn.Linear(dim * 2, dim), nn.ReLU())
        self.proj_out = nn.Linear(dim, 1)

    def forward(
        self,
        data
    ):
        hiddens1 = self.proj_in(data)
        hiddens2 = self.proj_mid(hiddens1)
        return self.proj_out(hiddens2)

# instantiate model and pass in some data, get (in this case) a binary prediction

model = MLP(dim = 256)
data = torch.randn(2, 256)

pred = model(data)
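You would like to condition the hidden layers (hiddens1 and hiddens2) with text. Each batch element here would get its own free text conditioning.
With this repository, this has been whittled down to ~3 steps.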
import torch
from torch import nn

from classifier_free_guidance_pytorch import classifier_free_guidance_class_decorator

@classifier_free_guidance_class_decorator
class MLP(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.proj_in = nn.Sequential(nn.Linear(dim, dim * 2), nn.ReLU())
        self.proj_mid = nn.Sequential(nn.Linear(dim * 2, dim), nn.ReLU())
        self.proj_out = nn.Linear(dim, 1)

    def forward(
        self,
        inp,
        cond_fns  # List[Callable] - (1) your forward function now receives a list of conditioning functions, which you invoke on your hidden tensors
    ):
        cond_hidden1, cond_hidden2 = cond_fns  # conditioning functions are given back in the order of the `hidden_dims` set on the text conditioner

        hiddens1 = self.proj_in(inp)
        hiddens1 = cond_hidden1(hiddens1)  # (2) condition the first hidden layer with FiLM

        hiddens2 = self.proj_mid(hiddens1)
        hiddens2 = cond_hidden2(hiddens2)  # condition the second hidden layer with FiLM

        return self.proj_out(hiddens2)

# instantiate your model - extra keyword arguments will need to be defined, prepended by `text_condition_`

model = MLP(
    dim = 256,
    text_condition_type = 'film',                  # can be film, attention, or null (none)
    text_condition_model_types = ('t5', 'clip'),   # in this example, conditioning on both T5 and OpenCLIP
    text_condition_hidden_dims = (512, 256),       # and pass in the hidden dimensions you would like to condition on. in this case there are two hidden dimensions (dim * 2 and dim, after the first and second projections)
    text_condition_cond_drop_prob = 0.25           # conditional dropout probability for classifier free guidance. can be set to 0. if you do not need it and just want the text conditioning
)

# now you have your input data as well as corresponding free text as List[str]

data = torch.randn(2, 256)
texts = ['a description', 'another description']

# (3) train your model, passing in your list of strings as 'texts'

pred = model(data, texts = texts)

# after much training, you can now do classifier free guidance by passing in a condition scale of > 1. !

model.eval()
guided_pred = model(data, texts = texts, cond_scale = 3., remove_parallel_component = True)  # cond_scale stands for conditioning scale from classifier free guidance paper
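The guidance step itself can be sketched in a few lines. The version below is a hedged illustration of the usual classifier free guidance formula plus an optional removal of the update component parallel to the conditional prediction (in the spirit of Sadat et al.). The function names `project` and `guide` and the `keep_parallel_frac` argument are hypothetical, and the exact behaviour of `remove_parallel_component` in the library is assumed rather than confirmed here.

```python
import torch
import torch.nn.functional as F

def project(update, reference):
    """Split `update` into parts parallel and orthogonal to `reference`, per batch element."""
    shape = update.shape
    u = update.flatten(1)
    unit = F.normalize(reference.flatten(1), dim=-1)
    parallel = (u * unit).sum(dim=-1, keepdim=True) * unit
    orthogonal = u - parallel
    return parallel.reshape(shape), orthogonal.reshape(shape)

def guide(cond_pred, null_pred, cond_scale=1., remove_parallel_component=False, keep_parallel_frac=0.):
    """cond_pred: prediction with text, null_pred: prediction with the condition dropped."""
    update = cond_pred - null_pred
    if remove_parallel_component:
        # assumption: the parallel part is measured against the conditional prediction
        parallel, orthogonal = project(update, cond_pred)
        update = orthogonal + keep_parallel_frac * parallel
    # equivalent to null_pred + cond_scale * (cond_pred - null_pred) when nothing is removed
    return cond_pred + (cond_scale - 1.) * update

torch.manual_seed(0)
cond_pred = torch.randn(2, 8)
null_pred = torch.randn(2, 8)

plain = guide(cond_pred, null_pred, cond_scale=3.)
projected = guide(cond_pred, null_pred, cond_scale=3., remove_parallel_component=True)
```

With `cond_scale = 1.` the conditional prediction is returned unchanged, and larger values push further away from the unconditional prediction. Dropping the parallel component keeps only the part of the push that changes direction rather than magnitude, which is the motivation given for reducing oversaturation at high scales.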
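complete FiLM conditioning, without classifier free guidance (used here)
add classifier free guidance for FiLM conditioning
complete cross attention conditioning
stress test for spacetime unet in make-a-video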
@article { Ho2022ClassifierFreeDG ,
title = { Classifier-Free Diffusion Guidance } ,
author = { Jonathan Ho } ,
journal = { ArXiv } ,
year = { 2022 } ,
volume = { abs/2207.12598 }
}
@article { Balaji2022eDiffITD ,
title = { eDiff-I: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers } ,
author = { Yogesh Balaji and Seungjun Nah and Xun Huang and Arash Vahdat and Jiaming Song and Karsten Kreis and Miika Aittala and Timo Aila and Samuli Laine and Bryan Catanzaro and Tero Karras and Ming-Yu Liu } ,
journal = { ArXiv } ,
year = { 2022 } ,
volume = { abs/2211.01324 }
}
@inproceedings { dao2022flashattention ,
title = { Flash{A}ttention: Fast and Memory-Efficient Exact Attention with {IO}-Awareness } ,
author = { Dao, Tri and Fu, Daniel Y. and Ermon, Stefano and Rudra, Atri and R{\'e}, Christopher } ,
booktitle = { Advances in Neural Information Processing Systems } ,
year = { 2022 }
}
@inproceedings { Lin2023CommonDN ,
title = { Common Diffusion Noise Schedules and Sample Steps are Flawed } ,
author = { Shanchuan Lin and Bingchen Liu and Jiashi Li and Xiao Yang } ,
year = { 2023 }
}
@inproceedings { Chung2024CFGMC ,
title = { CFG++: Manifold-constrained Classifier Free Guidance for Diffusion Models } ,
author = { Hyungjin Chung and Jeongsol Kim and Geon Yeong Park and Hyelin Nam and Jong Chul Ye } ,
year = { 2024 } ,
url = { https://api.semanticscholar.org/CorpusID:270391454 }
}
@inproceedings { Sadat2024EliminatingOA ,
title = { Eliminating Oversaturation and Artifacts of High Guidance Scales in Diffusion Models } ,
author = { Seyedmorteza Sadat and Otmar Hilliges and Romann M. Weber } ,
year = { 2024 } ,
url = { https://api.semanticscholar.org/CorpusID:273098845 }
}