Datasets for Controllable Text-to-Image Generation
Noah-Wukong Dataset
Zero: Fine-Tuning Text-to-Image Diffusion Models for Subject-Driven Generation
Flickr 30k Dataset
Visual Genome Dataset
Conceptual Captions(CC) Dataset
YFCC100M Dataset
ALT200M Dataset
LAION-400M Dataset
LAION-5B Dataset
Wikipedia-based Image Text (WIT) Dataset
TaiSu (太素): a large-scale, hundred-million-level Chinese vision-language pre-training dataset
COYO-700M: Large-Scale Image-Text Pair Dataset
DiffusionDB
# Get the cocostuff repo
git clone https://github.com/nightrome/cocostuff.git
cd cocostuff
# Download everything
wget --directory-prefix=downloads http://images.cocodataset.org/zips/train2017.zip
wget --directory-prefix=downloads http://images.cocodataset.org/zips/val2017.zip
wget --directory-prefix=downloads http://calvin.inf.ed.ac.uk/wp-content/uploads/data/cocostuffdataset/stuffthingmaps_trainval2017.zip
# Unpack everything
mkdir -p dataset/images
mkdir -p dataset/annotations
unzip downloads/train2017.zip -d dataset/images/
unzip downloads/val2017.zip -d dataset/images/
unzip downloads/stuffthingmaps_trainval2017.zip -d dataset/annotations/
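A quick sanity check after unpacking can catch truncated downloads. The sketch below counts files per split; it assumes the archives unpack into `train2017` and `val2017` subdirectories under `dataset/images/` and `dataset/annotations/`, and the helper name `count_files` is hypothetical.

```shell
# Count regular files with a given extension directly under a directory.
# $1: directory, $2: extension without the leading dot.
count_files() {
  find "$1" -maxdepth 1 -type f -name "*.$2" | wc -l | tr -d ' '
}

# Report per-split counts; the subdirectory names are an assumption
# about how the archives unpack, so missing directories are skipped.
for split in train2017 val2017; do
  if [ -d "dataset/images/$split" ]; then
    echo "$split images: $(count_files "dataset/images/$split" jpg)"
  fi
  if [ -d "dataset/annotations/$split" ]; then
    echo "$split annotation maps: $(count_files "dataset/annotations/$split" png)"
  fi
done
```

For reference, the COCO 2017 splits contain 118,287 training and 5,000 validation images, so other counts usually point to an interrupted download or unzip.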
1. Download hfd
wget https://hf-mirror.com/hfd/hfd.sh
chmod a+x hfd.sh
2. Set the environment variable
export HF_ENDPOINT=https://hf-mirror.com
3.1 Download a model
./hfd.sh gpt2 --tool aria2c -x 4
3.2 Download a dataset
./hfd.sh yuvalkirstain/pickapic_v1 --dataset --tool aria2c -x 4
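The steps above depend on `wget` (to fetch `hfd.sh`) and on `aria2c` (selected with `--tool aria2c`). A minimal pre-flight check might look like the sketch below; the helper name `missing_tools` is hypothetical, and the tool list covers only what the commands above use.

```shell
# Print every given command that is not found on PATH, one per line.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# wget and aria2c are the only tools named in the steps above;
# hfd.sh itself may need others, which this sketch does not check.
missing=$(missing_tools wget aria2c)
if [ -n "$missing" ]; then
  echo "Install before running hfd.sh:" $missing >&2
fi
```

Running this before step 3 avoids a download that fails partway because a tool is absent.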
DeepFashion-MultiModal
DeepFashion
COCO (COCO Captions) Dataset
CUB-200-2011 (Caltech-UCSD Birds-200-2011) Dataset
102 Category Flower Dataset
Flickr8k Dataset
Flickr8k_Dataset.zip https://github.com/jbrownlee/Datasets/releases/download/Flickr8k/Flickr8k_Dataset.zip
Flickr8k_text.zip https://github.com/jbrownlee/Datasets/releases/download/Flickr8k/Flickr8k_text.zip
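The two Flickr8k archives can be fetched in the same style as the COCO-Stuff commands earlier. This is a sketch only: the function name `fetch_flickr8k` and the `downloads/` and `dataset/flickr8k/` directories are illustrative assumptions, and the function is defined but not called so that nothing is downloaded until it is invoked explicitly.

```shell
# The two archive URLs listed above, one per line.
FLICKR8K_URLS="
https://github.com/jbrownlee/Datasets/releases/download/Flickr8k/Flickr8k_Dataset.zip
https://github.com/jbrownlee/Datasets/releases/download/Flickr8k/Flickr8k_text.zip
"

# Download each archive into downloads/ and unpack it under dataset/flickr8k/
# (both directory names are assumptions made for this example).
fetch_flickr8k() {
  mkdir -p downloads dataset/flickr8k
  for url in $FLICKR8K_URLS; do
    wget --directory-prefix=downloads "$url"
    # basename recovers the archive file name from the URL.
    unzip -o "downloads/$(basename "$url")" -d dataset/flickr8k/
  done
}

# Invoke explicitly when ready to download:
# fetch_flickr8k
```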
Nouns Dataset: auto-captioned Nouns dataset card
OxfordTVG-HIC Dataset: Large-Scale Humorous Image-Text Dataset
Multi-Modal-CelebA-HQ: Large-Scale Face Image-Text Dataset