Implementation of GigaGAN (project page), the new SOTA GAN out of Adobe.

I will also add a few findings from lightweight GAN, for faster convergence (skip layer excitation) and better stability (an auxiliary reconstruction loss in the discriminator).
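For readers unfamiliar with skip layer excitation, here is a minimal, illustrative PyTorch sketch of the idea from the lightweight GAN paper (not this repository's implementation; the dimensions and layer choices below are placeholders): a low-resolution feature map is pooled down and turned into per-channel gates that modulate a much higher-resolution feature map, giving a cheap long-range skip connection.

import torch
from torch import nn

class SkipLayerExcitation(nn.Module):
    # illustrative sketch of skip layer excitation (lightweight GAN);
    # dims are hypothetical and do not mirror this repository's internals
    def __init__(self, dim_low, dim_high):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(4),            # squeeze the low-res map to 4 x 4
            nn.Conv2d(dim_low, dim_high, 4),    # 4 x 4 conv -> 1 x 1 spatial
            nn.LeakyReLU(0.1),
            nn.Conv2d(dim_high, dim_high, 1),
            nn.Sigmoid()                        # per-channel gates in (0, 1)
        )

    def forward(self, feat_high, feat_low):
        # modulate high-resolution features with gates derived from low-resolution ones
        return feat_high * self.gate(feat_low)

# usage: gate a 128 x 128 feature map with an 8 x 8 one
sle = SkipLayerExcitation(dim_low = 256, dim_high = 64)
out = sle(torch.randn(1, 64, 128, 128), torch.randn(1, 256, 8, 8))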
It will also contain the code for the 1k - 4k upsamplers, which I find to be the highlight of the paper.

Please join in if you are interested in helping out with the replication alongside the LAION community.
- StabilityAI and 🤗 Huggingface for the generous sponsorship, as well as my other sponsors, for affording me the independence to open source artificial intelligence.
- 🤗 Huggingface for their accelerate library.
- All the maintainers at OpenClip, for their SOTA open-sourced contrastive learning text-image models.
- Xavier for the very helpful code review, and for the discussions on how the scale invariance in the discriminator should be built!
- @CerebralSeed for pull-requesting the initial sampling code for both the generator and the upsampler!
- Keerth for the code review and for pointing out some discrepancies with the paper!
$ pip install gigagan-pytorch
Simple unconditional GAN, for starters
import torch
from gigagan_pytorch import (
    GigaGAN,
    ImageDataset
)

gan = GigaGAN(
    generator = dict(
        dim_capacity = 8,
        style_network = dict(
            dim = 64,
            depth = 4
        ),
        image_size = 256,
        dim_max = 512,
        num_skip_layers_excite = 4,
        unconditional = True
    ),
    discriminator = dict(
        dim_capacity = 16,
        dim_max = 512,
        image_size = 256,
        num_skip_layers_excite = 4,
        unconditional = True
    ),
    amp = True
).cuda()

# dataset

dataset = ImageDataset(
    folder = '/path/to/your/data',
    image_size = 256
)

dataloader = dataset.get_dataloader(batch_size = 1)

# you must then set the dataloader for the GAN before training

gan.set_dataloader(dataloader)

# training the discriminator and generator alternating
# for 100 steps in this example, batch size 1, gradient accumulated 8 times

gan(
    steps = 100,
    grad_accum_every = 8
)

# after much training

images = gan.generate(batch_size = 4) # (4, 3, 256, 256)
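If you want to inspect the samples, the returned tensor can be written straight to disk with torchvision. This is just a usage sketch, and it assumes the generated images are already in the [0, 1] range (rescale first if they are not):

from torchvision.utils import save_image

# tile the 4 generated images into a 2 x 2 grid and write them out
save_image(images, 'samples.png', nrow = 2)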
For the unconditional Unet upsampler
import torch
from gigagan_pytorch import (
    GigaGAN,
    ImageDataset
)

gan = GigaGAN(
    train_upsampler = True,     # set this to True
    generator = dict(
        style_network = dict(
            dim = 64,
            depth = 4
        ),
        dim = 32,
        image_size = 256,
        input_image_size = 64,
        unconditional = True
    ),
    discriminator = dict(
        dim_capacity = 16,
        dim_max = 512,
        image_size = 256,
        num_skip_layers_excite = 4,
        multiscale_input_resolutions = (128,),
        unconditional = True
    ),
    amp = True
).cuda()

dataset = ImageDataset(
    folder = '/path/to/your/data',
    image_size = 256
)

dataloader = dataset.get_dataloader(batch_size = 1)

gan.set_dataloader(dataloader)

# training the discriminator and generator alternating
# for 100 steps in this example, batch size 1, gradient accumulated 8 times

gan(
    steps = 100,
    grad_accum_every = 8
)

# after much training

lowres = torch.randn(1, 3, 64, 64).cuda()

images = gan.generate(lowres) # (1, 3, 256, 256)
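In practice you would feed a real low-resolution image rather than random noise. A minimal sketch using torchvision transforms, assuming the upsampler expects a (1, 3, 64, 64) float tensor like the one above (the file path is a placeholder, and you should match the value range used in your training preprocessing):

from PIL import Image
import torchvision.transforms as T

# load an image and bring it down to the 64 x 64 input resolution used above
to_lowres = T.Compose([
    T.Resize(64),
    T.CenterCrop(64),
    T.ToTensor()            # float tensor in [0, 1]
])

lowres = to_lowres(Image.open('/path/to/low_res.png').convert('RGB')).unsqueeze(0).cuda()

upsampled = gan.generate(lowres) # (1, 3, 256, 256)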
The losses reported during training are abbreviated as follows:

- G - Generator
- MSG - Multiscale Generator
- D - Discriminator
- MSD - Multiscale Discriminator
- GP - Gradient Penalty
- SSL - Auxiliary Reconstruction in the Discriminator (from Lightweight GAN)
- VD - Vision-aided Discriminator
- VG - Vision-aided Generator
- CL - Generator Contrastive Loss
- MAL - Matching-Aware Loss

A healthy run will have the G, MSG, D, and MSD values hovering between 0 and 10, and they usually stay fairly constant. If at any point after 1k training steps these values persist in the triple digits, something has gone wrong. It is fine for the generator and discriminator values to occasionally dip negative, but they should swing back up into the range above.

GP and SSL should be pushed towards 0. GP can occasionally spike; I like to imagine it as the networks undergoing some epiphany.
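The sanity checks above can be folded into a small helper. This is a hypothetical monitoring function, not part of the library; it only restates the thresholds described in this section for whatever loss values you log during training.

def check_gan_health(losses, step, warmup_steps = 1000):
    # `losses` is a dict you populate yourself from your logs, e.g. {'G': 3.2, 'D': 1.1, 'GP': 0.05}
    warnings = []

    for name in ('G', 'MSG', 'D', 'MSD'):
        value = losses.get(name)
        if value is None:
            continue
        # after ~1k steps, values that persist in the triple digits indicate a problem
        if step > warmup_steps and abs(value) >= 100:
            warnings.append(f'{name} = {value:.1f} is in the triple digits')

    for name in ('GP', 'SSL'):
        value = losses.get(name)
        # these should be pushed towards 0; the 1.0 threshold here is arbitrary, for illustration only
        if value is not None and value > 1.0:
            warnings.append(f'{name} = {value:.2f} has not come down yet')

    return warnings

# example
print(check_gan_health({'G': 2.3, 'D': 250.0, 'GP': 0.02}, step = 5000))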
The GigaGAN class is now equipped with the 🤗 Accelerator. You can easily do multi-GPU training in two steps using its accelerate CLI.

At the project root directory, where the training script lives, run

$ accelerate config

Then, in the same directory,

$ accelerate launch train.py
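Here, train.py is simply whatever script you have written against the API shown earlier. A minimal sketch that wraps the unconditional example from above into a launchable script (the paths and hyperparameters are placeholders):

# train.py
from gigagan_pytorch import GigaGAN, ImageDataset

gan = GigaGAN(
    generator = dict(
        dim_capacity = 8,
        style_network = dict(dim = 64, depth = 4),
        image_size = 256,
        dim_max = 512,
        num_skip_layers_excite = 4,
        unconditional = True
    ),
    discriminator = dict(
        dim_capacity = 16,
        dim_max = 512,
        image_size = 256,
        num_skip_layers_excite = 4,
        unconditional = True
    ),
    amp = True
).cuda()

dataset = ImageDataset(folder = '/path/to/your/data', image_size = 256)
gan.set_dataloader(dataset.get_dataloader(batch_size = 1))

# the GigaGAN class manages accelerate internally, so the training call is unchanged
gan(steps = 100, grad_accum_every = 8)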
Todo:

- make sure it can be trained unconditionally
- read the relevant papers and knock out all 3 auxiliary losses
- unet upsampler
- get a code review for the multi-scale inputs and outputs, as the paper was a bit vague
- add the upsampling network architecture
- make unconditional training work for both the base generator and the upsampler
- make text-conditioned training work for both the base generator and the upsampler
- make the reconstruction loss more efficient by randomly sampling patches
- make sure the generator and discriminator can also accept pre-encoded CLIP text encodings
- do a review of the auxiliary losses
- add some differentiable augmentations, a proven technique from the old GAN days (see the sketch after this list)
- move all modulation projections into the adaptive conv2d class
- add accelerate
- make CLIP optional for all modules, managed by GigaGAN, with text -> text embeddings processed once
- add the ability to select a random subset of the multiscale dimensions, for efficiency
- port over the CLI from lightweight | stylegan2-pytorch
- hook up the LAION dataset for text-image
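As a reference point for the differentiable augmentation item above, here is a minimal, generic sketch of the idea (in the spirit of DiffAugment / ADA, not this repository's implementation): apply the same random, differentiable transform to both real and generated images right before the discriminator, so gradients still flow back to the generator.

import torch

def diff_augment(images, prob = 0.5):
    # images: (batch, channels, height, width)
    # random horizontal flip, applied to the whole batch with probability `prob`;
    # built only from differentiable tensor ops so gradients still reach the generator
    if torch.rand(()) < prob:
        images = torch.flip(images, dims = (3,))

    # small random per-sample brightness jitter, also differentiable
    brightness = (torch.rand(images.shape[0], 1, 1, 1, device = images.device) - 0.5) * 0.2
    return images + brightness

# usage inside a training step (pseudocode):
# real_logits = discriminator(diff_augment(real_images))
# fake_logits = discriminator(diff_augment(fake_images))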
@misc{https://doi.org/10.48550/arxiv.2303.05511,
    url       = {https://arxiv.org/abs/2303.05511},
    author    = {Kang, Minguk and Zhu, Jun-Yan and Zhang, Richard and Park, Jaesik and Shechtman, Eli and Paris, Sylvain and Park, Taesung},
    title     = {Scaling up GANs for Text-to-Image Synthesis},
    publisher = {arXiv},
    year      = {2023},
    copyright = {arXiv.org perpetual, non-exclusive license}
}

@article{Liu2021TowardsFA,
    title   = {Towards Faster and Stabilized GAN Training for High-fidelity Few-shot Image Synthesis},
    author  = {Bingchen Liu and Yizhe Zhu and Kunpeng Song and A. Elgammal},
    journal = {ArXiv},
    year    = {2021},
    volume  = {abs/2101.04775}
}

@inproceedings{dao2022flashattention,
    title     = {Flash{A}ttention: Fast and Memory-Efficient Exact Attention with {IO}-Awareness},
    author    = {Dao, Tri and Fu, Daniel Y. and Ermon, Stefano and Rudra, Atri and R{\'e}, Christopher},
    booktitle = {Advances in Neural Information Processing Systems},
    year      = {2022}
}

@inproceedings{Karras2020ada,
    title     = {Training Generative Adversarial Networks with Limited Data},
    author    = {Tero Karras and Miika Aittala and Janne Hellsten and Samuli Laine and Jaakko Lehtinen and Timo Aila},
    booktitle = {Proc. NeurIPS},
    year      = {2020}
}

@article{Xu2024VideoGigaGANTD,
    title   = {VideoGigaGAN: Towards Detail-rich Video Super-Resolution},
    author  = {Yiran Xu and Taesung Park and Richard Zhang and Yang Zhou and Eli Shechtman and Feng Liu and Jia-Bin Huang and Difan Liu},
    journal = {ArXiv},
    year    = {2024},
    volume  = {abs/2404.12388},
    url     = {https://api.semanticscholar.org/CorpusID:269214195}
}