PixArt-alpha Download - PixArt-alpha Source Code Download

PixArt + ControlNet

Dependencies and Installation

Python >= 3.9 (Anaconda or Miniconda recommended)
PyTorch >= 1.13.0+cu11.7

conda create -n pixart python=3.9
conda activate pixart
pip install torch==2.1.1 torchvision==0.16.1 torchaudio==2.1.1 --index-url https://download.pytorch.org/whl/cu118

git clone https://github.com/PixArt-alpha/PixArt-alpha.git
cd PixArt-alpha
pip install -r requirements.txt
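
After installation, it can help to confirm that the environment meets the stated minimums (Python >= 3.9, PyTorch >= 1.13.0). The following is a minimal sketch using only the standard library; the helper names `parse_version`, `meets_minimum` and `python_ok` are illustrative and are not part of the repository.

```python
import re
import sys


def parse_version(text):
    """Return the numeric release of a version string as a 3-tuple of ints.

    Local build tags such as "+cu118" are ignored, and missing components
    are padded with zeros so "1.13" compares equal to "1.13.0".
    """
    release = text.split("+", 1)[0]
    nums = [int(p) for p in re.findall(r"\d+", release)[:3]]
    nums += [0] * (3 - len(nums))
    return tuple(nums)


def meets_minimum(installed, minimum):
    """True if the `installed` version string is at least `minimum`."""
    return parse_version(installed) >= parse_version(minimum)


def python_ok(version_info=sys.version_info, minimum=(3, 9)):
    """True if the given interpreter version satisfies the Python requirement."""
    return tuple(version_info[:2]) >= minimum


# Typical use once PyTorch is installed (not executed here):
#   import torch
#   print(python_ok(), meets_minimum(torch.__version__, "1.13.0"))
```

The comparison is done on integer tuples rather than on strings, because a plain string comparison would rank "1.9" above "1.13".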

⏬ Download Models

All models are downloaded automatically. You can also download them manually from this URL.

Model	#Params	URL	Download in OpenXLab
T5	4.3B	T5	T5
VAE	80M	VAE	VAE
PixArt-α-SAM-256	0.6B	PixArt-XL-2-SAM-256x256.pth or diffusers version	256-SAM
PixArt-α-256	0.6B	PixArt-XL-2-256x256.pth or diffusers version	256
PixArt-α-256-MSCOCO-FID7.32	0.6B	PixArt-XL-2-256x256.pth	256
PixArt-α-512	0.6B	PixArt-XL-2-512x512.pth or diffusers version	512
PixArt-α-1024	0.6B	PixArt-XL-2-1024-MS.pth or diffusers version	1024
PixArt-δ-1024-LCM	0.6B	diffusers version
ControlNet-HED-Encoder	30M	ControlNetHED.pth
PixArt-δ-512-ControlNet	0.9B	PixArt-XL-2-512-ControlNet.pth	512
PixArt-δ-1024-ControlNet	0.9B	PixArt-XL-2-1024-ControlNet.pth	1024

You can also find all models in OpenXLab_PixArt-alpha.

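How to Train

1. PixArt Training

First of all.

Thanks to @kopyl, you can reproduce the full fine-tuning flow on the Pokemon dataset from Hugging Face with these notebooks:

1. Train with notebooks/train.ipynb.
2. Convert to Diffusers with notebooks/convert-checkpoint-to-diffusers.ipynb.
3. Run inference with notebooks/infer.ipynb, using the checkpoint converted in step 2.

Then, for more details.

Here we take the SAM dataset training config as an example; you can prepare your own dataset the same way.

You only need to change the config file in the configs directory and the data loader in the dataset code.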

python -m torch.distributed.launch --nproc_per_node=2 --master_port=12345 train_scripts/train.py configs/pixart_config/PixArt_xl2_img256_SAM.py --work-dir output/train_SAM_256

The directory structure of the SAM dataset is:

 cd ./data

SA1B
├──images/  (images are saved here)
│  ├──sa_xxxxx.jpg
│  ├──sa_xxxxx.jpg
│  ├──......
├──captions/    (corresponding captions are saved here, same name as images)
│  ├──sa_xxxxx.txt
│  ├──sa_xxxxx.txt
├──partition/   (all image names are stored in txt files, one image name per line)
│  ├──part0.txt
│  ├──part1.txt
│  ├──......
├──caption_feature_wmask/   (run tools/extract_caption_feature.py to generate caption T5 features, same name as images except .npz extension)
│  ├──sa_xxxxx.npz
│  ├──sa_xxxxx.npz
│  ├──......
├──img_vae_feature/  (run tools/extract_img_vae_feature.py to generate image VAE features, same name as images except .npy extension)
│  ├──train_vae_256/
│  │  ├──noflip/
│  │  │  ├──sa_xxxxx.npy
│  │  │  ├──sa_xxxxx.npy
│  │  │  ├──......
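
Before launching training, it may be worth checking that every image listed in the partition files has its caption and pre-extracted features in place. The sketch below walks the layout shown above. The function names are illustrative, the VAE sub-path `train_vae_256/noflip` is taken from the 256px example only, and it is an assumption that partition lines may carry a `.jpg` extension.

```python
from pathlib import Path

# Sub-directories and file extensions taken from the tree above.
# The VAE sub-path matches the 256px example; other resolutions are
# assumed to use a different folder name.
EXPECTED = {
    "images": ".jpg",
    "captions": ".txt",
    "caption_feature_wmask": ".npz",
    "img_vae_feature/train_vae_256/noflip": ".npy",
}


def read_partition_names(root):
    """Collect the image stems listed in every partition/*.txt file."""
    stems = []
    for part in sorted((Path(root) / "partition").glob("*.txt")):
        for line in part.read_text().splitlines():
            line = line.strip()
            if line:
                # Tolerate names written with or without ".jpg".
                stems.append(Path(line).stem)
    return stems


def find_missing(root):
    """Map each expected sub-directory to the stems that have no file in it."""
    root = Path(root)
    stems = read_partition_names(root)
    missing = {}
    for subdir, ext in EXPECTED.items():
        absent = [s for s in stems if not (root / subdir / f"{s}{ext}").is_file()]
        if absent:
            missing[subdir] = absent
    return missing
```

Calling `find_missing("data/SA1B")` returns an empty dictionary when the layout is complete, and otherwise lists the missing stems per folder.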

We provide data_toy here to make the layout easier to understand:

 cd ./data

git lfs install
git clone https://huggingface.co/datasets/PixArt-alpha/data_toy

Then, here is an example of the partition/part0.txt file.
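
For a custom dataset, partition files of this kind can be generated from the images folder. The following is a minimal sketch, assuming that each line holds an image file name including its `.jpg` extension (the exact convention should be checked against the data_toy example); `write_partitions` is a hypothetical helper, not a script shipped with the repository.

```python
from pathlib import Path


def write_partitions(root, num_parts=2):
    """Write partition/part{i}.txt files listing image names, one per line."""
    root = Path(root)
    names = sorted(p.name for p in (root / "images").glob("*.jpg"))
    out_dir = root / "partition"
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for i in range(num_parts):
        # Round-robin split keeps the parts balanced in size.
        chunk = names[i::num_parts]
        path = out_dir / f"part{i}.txt"
        path.write_text("".join(f"{name}\n" for name in chunk))
        written.append(path)
    return written
```

For example, `write_partitions("data/SA1B", num_parts=2)` would produce part0.txt and part1.txt next to the images folder.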

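In addition, for training guided by a json file, here is a toy json file for better understanding.

2. PixArt + DreamBooth Training

Follow the PixArt + DreamBooth training guidance.

3. PixArt + LCM / LCM-LoRA Training

Follow the PixArt + LCM training guidance.

4. PixArt + ControlNet Training

Follow the PixArt + ControlNet training guidance.

5. PixArt + LoRA Training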

pip install peft==0.6.2

accelerate launch --num_processes=1 --main_process_port=36667 train_scripts/train_pixart_lora_hf.py --mixed_precision="fp16" \
  --pretrained_model_name_or_path=PixArt-alpha/PixArt-XL-2-1024-MS \
  --dataset_name=lambdalabs/pokemon-blip-captions --caption_column="text" \
  --resolution=1024 --random_flip \
  --train_batch_size=16 \
  --num_train_epochs=200 --checkpointing_steps=100 \
  --learning_rate=1e-06 --lr_scheduler="constant" --lr_warmup_steps=0 \
  --seed=42 \
  --output_dir="pixart-pokemon-model" \
  --validation_prompt="cute dragon creature" --report_to="tensorboard" \
  --gradient_checkpointing --checkpoints_total_limit=10 --validation_epochs=5 \
  --rank=16
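
To get a feel for what the flags above imply, the rough arithmetic can be sketched as follows. This is an approximation of how epoch-based training scripts usually count optimizer steps, not the exact logic of train_pixart_lora_hf.py; the dataset size of 800 samples and the gradient accumulation value of 1 are hypothetical inputs chosen for illustration.

```python
import math


def training_schedule(num_samples, train_batch_size=16, num_processes=1,
                      gradient_accumulation_steps=1, num_train_epochs=200,
                      checkpointing_steps=100, checkpoints_total_limit=10):
    """Estimate step and checkpoint counts from the command-line flags."""
    # Samples consumed by one optimizer step across all processes.
    samples_per_step = train_batch_size * num_processes * gradient_accumulation_steps
    steps_per_epoch = math.ceil(num_samples / samples_per_step)
    total_steps = steps_per_epoch * num_train_epochs
    # One checkpoint every `checkpointing_steps` optimizer steps.
    checkpoints_written = total_steps // checkpointing_steps
    # Older checkpoints are assumed to be pruned beyond the total limit.
    checkpoints_kept = min(checkpoints_written, checkpoints_total_limit)
    return {
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "checkpoints_written": checkpoints_written,
        "checkpoints_kept": checkpoints_kept,
    }


# Hypothetical dataset of 800 image-caption pairs.
print(training_schedule(800))
```

With these assumed inputs the run takes 50 steps per epoch and 10,000 steps in total, writes 100 checkpoints, and keeps the last 10 on disk.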

How to Test

Inference with this repository requires at least 23 GB of GPU memory, while the Diffusers integration needs 11 GB or 8 GB.
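
The memory figures above can be turned into a simple decision helper. This is only a sketch: the thresholds come from the numbers quoted here, while the function name and the returned labels are illustrative.

```python
def choose_inference_path(vram_gb):
    """Suggest an inference route for a GPU with `vram_gb` gigabytes of memory."""
    if vram_gb >= 23:
        return "repository scripts"
    if vram_gb >= 11:
        return "diffusers pipeline"
    if vram_gb >= 8:
        return "diffusers pipeline with the 8 GB setup"
    return "below the documented minimum"


# With PyTorch installed, the available memory could be read with
# torch.cuda.get_device_properties(0).total_memory / 1024**3 (not run here).
print(choose_inference_path(12))
```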

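Currently supported samplers:

IDDPM
DPM-Solver
SA-Solver
DPM-Solver-v3

1. Quick Start with Gradio

To get started, first install the required dependencies. Make sure you have downloaded the models to the output/pretrained_models folder, then run on your local machine: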

DEMO_PORT=12345 python app/app.py

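As an alternative, a sample Dockerfile is provided to build a runtime container that starts the Gradio app.

docker build . -t pixart
docker run --gpus all -it -p 12345:12345 -v <path_to_huggingface_cache>:/root/.cache/huggingface pixart

Or use docker-compose. Note that to switch the app from the 1024 version to the 512 or LCM version, you only need to change the APP_CONTEXT environment variable in the docker-compose.yml file. The default is 1024.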

docker compose build
docker compose up
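
The two environment variables mentioned in this section, DEMO_PORT and APP_CONTEXT, can be read in Python as sketched below. This is not the repository's actual startup code: the accepted spellings in `ALLOWED_CONTEXTS` are inferred from the text above, and treating 12345 as the default port is an assumption based on the examples.

```python
import os

# Spellings inferred from the description above; treat them as assumptions.
ALLOWED_CONTEXTS = ("1024", "512", "LCM")


def resolve_settings(env=os.environ):
    """Return (app_context, port) from environment variables with defaults."""
    context = env.get("APP_CONTEXT", "1024")
    if context not in ALLOWED_CONTEXTS:
        raise ValueError(f"unsupported APP_CONTEXT: {context!r}")
    port = int(env.get("DEMO_PORT", "12345"))
    return context, port


print(resolve_settings({"APP_CONTEXT": "512", "DEMO_PORT": "12345"}))
```

Passing a plain dictionary instead of `os.environ` keeps the helper easy to test without touching the real environment.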

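Then open http://your-server-ip:12345 to try a simple example.

2. Integration in Diffusers

1). Using in Diffusers

Make sure you have up-to-date versions of the following libraries:

pip install -U transformers accelerate diffusers SentencePiece ftfy beautifulsoup4

And then: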

import torch
from diffusers import PixArtAlphaPipeline, ConsistencyDecoderVAE, AutoencoderKL
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# You can replace the checkpoint id with "PixArt-alpha/PixArt-XL-2-512x512" too.
pipe = PixArtAlphaPipeline.from_pretrained("PixArt-alpha/PixArt-XL-2-1024-MS", torch_dtype=torch.float16, use_safetensors=True)

# If use DALL-E 3 Consistency Decoder
# pipe.vae = ConsistencyDecoderVAE.from_pretrained("openai/consistency-decoder", torch_dtype=torch.float16)

# If use SA-Solver sampler
# from diffusion.sa_solver_diffusers import SASolverScheduler
# pipe.scheduler = SASolverScheduler.from_config(pipe.scheduler.config, algorithm_type='data_prediction')

# If loading a LoRA model
# from diffusers import Transformer2DModel
# from peft import PeftModel
# transformer = Transformer2DModel.from_pretrained("PixArt-alpha/PixArt-LCM-XL-2-1024-MS", subfolder="transformer", torch_dtype=torch.float16)
# transformer = PeftModel.from_pretrained(transformer, "Your-LoRA-Model-Path")
# pipe = PixArtAlphaPipeline.from_pretrained("PixArt-alpha/PixArt-LCM-XL-2-1024-MS", transformer=transformer, torch_dtype=torch.float16, use_safetensors=True)
# del transformer

# Enable memory optimizations.
# pipe.enable_model_cpu_offload()

pipe.to(device)

prompt = "A small cactus with a happy face in the Sahara desert."
image = pipe(prompt).images[0]
image.save("./catcus.png")

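Check out the documentation for more information about the SA-Solver sampler.

This integration allows running the pipeline with a batch size of 4 in under 11 GB of GPU VRAM. Check out the documentation to learn more.

2). Running the `PixArtAlphaPipeline` in under 8 GB of GPU VRAM

GPU VRAM consumption under 8 GB is now supported; please refer to the documentation for more information.

3). Gradio with Diffusers (faster)

To get started, first install the required dependencies, then run on your local machine: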

 # diffusers version
DEMO_PORT=12345 python app/app.py

Then open http://your-server-ip:12345 to try a simple example.

You can also click here to try it for free on Google Colab.

4). Convert a .pth checkpoint into the Diffusers version

python tools/convert_pixart_alpha_to_diffusers.py --image_size your_img_size --multi_scale_train (True if you use PixArtMS else False) --orig_ckpt_path path/to/pth --dump_path path/to/diffusers --only_transformer=True
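
When converting several checkpoints, the command line above can be assembled programmatically. The sketch below guesses `--image_size` and `--multi_scale_train` from the checkpoint file name, which is a heuristic based on the names in the model table (for example, a name ending in `-MS` is treated as a multi-scale model) and is not something the conversion script does itself; `build_convert_command` is a hypothetical helper.

```python
import re
from pathlib import Path


def build_convert_command(orig_ckpt_path, dump_path):
    """Build the argument list for the conversion script from a checkpoint path."""
    name = Path(orig_ckpt_path).stem  # e.g. "PixArt-XL-2-1024-MS"
    # Heuristic: the first standalone 3- or 4-digit number is the image size.
    match = re.search(r"(?<!\d)(\d{3,4})(?!\d)", name)
    if match is None:
        raise ValueError(f"cannot infer image size from {name!r}")
    image_size = match.group(1)
    # Heuristic: multi-scale checkpoints carry an "-MS" suffix.
    multi_scale = name.endswith("-MS")
    return [
        "python", "tools/convert_pixart_alpha_to_diffusers.py",
        "--image_size", image_size,
        "--multi_scale_train", str(multi_scale),
        "--orig_ckpt_path", str(orig_ckpt_path),
        "--dump_path", str(dump_path),
        "--only_transformer=True",
    ]


print(" ".join(build_convert_command("PixArt-XL-2-1024-MS.pth", "output/diffusers")))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root; inspect the inferred values before running, since the name-based guess may not hold for custom checkpoints.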

3. Online Demo

Online demo sample

✏️ How to Caption with LLaVA

Thanks to the code base of LLaVA-Lightning-MPT, we can caption the LAION and SAM datasets with the following launch command:

python tools/VLM_caption_lightning.py --output output/dir/ --data-root data/root/path --index path/to/data.json

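We present auto-labeling with custom prompts for LAION (left) and SAM (right). The words highlighted in green are the original LAION captions, while the words marked in red are the detailed captions produced by LLaVA.

Dialog with LLaVA.

✏️ How to Extract T5 and VAE Features

Preparing T5 text features and VAE image features in advance speeds up training and saves GPU memory.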

python tools/extract_features.py --img_size=1024 \
    --json_path "data/data_info.json" \
    --t5_save_root "data/SA1B/caption_feature_wmask" \
    --vae_save_root "data/SA1B/img_vae_features" \
    --pretrained_models_dir "output/pretrained_models" \
    --dataset_root "data/SA1B/Images/"

To-Do List (Congratulations)

Other Sources

We made a video comparing PixArt with the most powerful current text-to-image models.

BibTeX

@misc{chen2023pixartalpha,
      title={PixArt-$\alpha$: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis}, 
      author={Junsong Chen and Jincheng Yu and Chongjian Ge and Lewei Yao and Enze Xie and Yue Wu and Zhongdao Wang and James Kwok and Ping Luo and Huchuan Lu and Zhenguo Li},
      year={2023},
      eprint={2310.00426},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
@misc{chen2024pixartdelta,
      title={PIXART-{$\delta$}: Fast and Controllable Image Generation with Latent Consistency Models}, 
      author={Junsong Chen and Yue Wu and Simian Luo and Enze Xie and Sayak Paul and Ping Luo and Hang Zhao and Zhenguo Li},
      year={2024},
      eprint={2401.05252},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

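Acknowledgements

Thanks to Diffusers for their wonderful technical support and awesome collaboration!
Thanks to Hugging Face for sponsoring the demo!
Thanks to DiT for their wonderful work and codebase!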
