
IT3D text to 3D


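Official repository of IT3D (AAAI 2024)

IT3D: Improved Text-to-3D Generation with Explicit View Synthesis (AAAI 2024).

Yiwen Chen, Chi Zhang, Xiaofeng Yang, Zhongang Cai, Gang Yu, Lei Yang, Guosheng Lin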

Arxiv

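Abstract

Recent advances in text-to-3D have been driven by distilling knowledge from powerful large text-to-image diffusion models (LDMs). Nonetheless, existing text-to-3D methods often suffer from over-saturation, insufficient detail, and unrealistic outputs. This study presents a novel strategy that uses explicitly synthesized multi-view images to address these issues. Our approach uses LDM-powered image-to-image pipelines to generate high-quality images from renderings of a coarse 3D model. Although the generated images largely alleviate the issues above, challenges such as view inconsistency and significant content variance remain because of the inherently generative nature of large diffusion models, which makes these images difficult to use effectively. To overcome this obstacle, we propose integrating a discriminator together with a novel Diffusion-GAN dual training strategy to guide the training of the 3D model. For the incorporated discriminator, the synthesized multi-view images are treated as real data, while renderings of the 3D model being optimized are treated as fake data. We conduct a comprehensive set of experiments demonstrating the effectiveness of our method over baseline approaches.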
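The real/fake assignment described in the abstract can be made concrete with a minimal sketch. It assumes the non-saturating logistic GAN loss common to EG3D/StyleGAN2-style discriminators and operates on raw discriminator logits. All function names are hypothetical, and the way the GAN term is combined with the SDS term (a sum weighted by --g_loss_weight) is an assumption; the repository's actual loss may differ.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def discriminator_loss(real_logits, fake_logits):
    """Hypothetical D loss: synthesized multi-view images are 'real',
    renderings of the 3D model being optimized are 'fake'."""
    real_term = sum(softplus(-z) for z in real_logits) / len(real_logits)
    fake_term = sum(softplus(z) for z in fake_logits) / len(fake_logits)
    return real_term + fake_term


def generator_loss(fake_logits):
    """Hypothetical non-saturating loss for the 3D model: it is minimized
    when D scores the model's renderings as real."""
    return sum(softplus(-z) for z in fake_logits) / len(fake_logits)


def refine_loss(sds_loss, fake_logits, g_loss_weight):
    """Assumed combination: SDS term plus the weighted GAN term."""
    return sds_loss + g_loss_weight * generator_loss(fake_logits)
```

A discriminator that separates the two sets well drives real logits up and fake logits down, which lowers `discriminator_loss`; the generator term then pulls the renderings toward the distribution of the synthesized images.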

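IT3D_Demo.mp4

Demo

More videos

Left: coarse model (baseline). Right: refined model (ours). File name: prompt

A bunch of white jasmine.mp4

3D model of Deadpool.mp4

A bunch of yellow chrysanthemums.mp4

3D model of Darth Vader.mp4

Hulk.mp4

A bunch of pink chrysanthemums.mp4

3D model of Batman.mp4

3D model of Iron Man.mp4

A bunch of yellow roses.mp4

A marble bust of Thanos.mp4

A plate of fresh broccoli.mp4

3D model of Red Hulk.mp4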

Installation

git clone https://github.com/buaacyw/IT3D-text-to-3D.git
cd IT3D-text-to-3D
conda create -n it3d python==3.8
conda activate it3d
pip install -r requirements.txt
pip install ./raymarching
pip install ./shencoder
pip install ./freqencoder
pip install ./gridencoder
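After installation, a quick sanity check can confirm that the four local packages are importable and that PyTorch sees a GPU. This is a sketch, not part of the repository: it assumes the installed packages are importable under the same names as their folders (raymarching, shencoder, freqencoder, gridencoder), which may not hold if the built extensions register different module names. Run it from outside the repository root; otherwise the source folders themselves may be found instead of the installed packages.

```python
import importlib
import importlib.util

# Assumed module names, taken from the folder names in the pip commands.
EXTENSION_MODULES = ["raymarching", "shencoder", "freqencoder", "gridencoder"]


def check_environment(modules=EXTENSION_MODULES):
    """Report which modules can be found and whether CUDA is usable."""
    report = {name: importlib.util.find_spec(name) is not None for name in modules}
    if importlib.util.find_spec("torch") is not None:
        torch = importlib.import_module("torch")
        report["torch_version"] = torch.__version__
        report["cuda_available"] = bool(torch.cuda.is_available())
    else:
        report["torch_version"] = None
        report["cuda_available"] = False
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

On the tested setup listed further down (torch 2.0.1, CUDA 11.7), one would expect every extension entry to be True and `cuda_available` to be True.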

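Download image-to-image models (optional)

For the image-to-image pipeline, we implement Stable Diffusion Image2Image and ControlNet v1.1.

In our experiments, ControlNet always gives better results. To use ControlNet as the image-to-image pipeline, download the models from here, following the instructions of ControlNet v1.1.

For example, to use ControlNet conditioned on soft edges (softedge), download control_v11p_sd15_softedge.yaml and control_v11p_sd15_softedge.pth and put them in the folder ctn_models. You also need to download the Stable Diffusion 1.5 model v1-5-pruned.ckpt and put it in the folder ctn_models.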
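Before running with --ctn, it can help to confirm that ctn_models holds the three files named for the softedge setup. A minimal sketch; the helper name is hypothetical and only the file names come from this README.

```python
from pathlib import Path

# File names from the softedge example in this README. Other ControlNet
# conditions presumably need their own .yaml/.pth pair.
REQUIRED_CTN_FILES = [
    "control_v11p_sd15_softedge.yaml",
    "control_v11p_sd15_softedge.pth",
    "v1-5-pruned.ckpt",
]


def missing_ctn_files(model_dir="ctn_models", required=REQUIRED_CTN_FILES):
    """Return the required files that are not present in model_dir."""
    folder = Path(model_dir)
    return [name for name in required if not (folder / name).is_file()]


if __name__ == "__main__":
    missing = missing_ctn_files()
    if missing:
        print("Missing from ctn_models:", ", ".join(missing))
    else:
        print("All ControlNet softedge files found.")
```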

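Tested environment

Ubuntu 22 on an A6000, with torch 2.0.1 and CUDA 11.7.

OOM tips

All our demos (coarse and refined models) are trained at 512 resolution. At 512 resolution, training the coarse model (vanilla Stable Dreamfusion) takes about 30 GB of GPU memory, and refining it with IT3D takes about 35 GB. You can reduce memory consumption in the following ways:

Lower the training resolution by setting --h and --w. This reduces memory usage significantly but also causes a large drop in quality. IT3D at 64 resolution needs about 10 GB.
Use a lightweight NeRF by setting --nerf l1. Our default is --nerf l2.
Lower the number of sampling steps per ray by setting --max_steps. Our default is --max_steps 384.
If OOM occurs during ControlNet data generation, lower --ctn_sample_batch_size.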
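The four OOM options can be collected into extra command-line arguments for main.py. This hypothetical helper only assembles the flags named in the tips (--h, --w, --nerf, --max_steps, --ctn_sample_batch_size); the example values and the workspace name jas_lowmem are illustrative, not tested recommendations.

```python
import shlex


def low_memory_args(resolution=None, light_nerf=False, max_steps=None,
                    ctn_batch_size=None):
    """Build extra main.py flags that trade quality for lower GPU memory."""
    args = []
    if resolution is not None:
        # Lower training resolution (demos use 512); large quality drop.
        args += ["--h", str(resolution), "--w", str(resolution)]
    if light_nerf:
        # Lightweight NeRF instead of the default --nerf l2.
        args += ["--nerf", "l1"]
    if max_steps is not None:
        # Fewer samples per ray than the default 384.
        args += ["--max_steps", str(max_steps)]
    if ctn_batch_size is not None:
        # Only relevant when OOM happens during ControlNet data generation.
        args += ["--ctn_sample_batch_size", str(ctn_batch_size)]
    return args


base = ["python", "main.py", "-O", "--text", "a bunch of white jasmine",
        "--workspace", "jas_lowmem"]
print(shlex.join(base + low_memory_args(resolution=256, light_nerf=True)))
```

Keeping the overrides separate from the base command makes it easy to apply only the cheapest change first (for example --nerf l1) before lowering the resolution, which the tips say costs the most quality.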

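Performance tips

Change the prompt and seed by setting --text and --seed. Unfortunately, training a coarse model free of the Janus problem often takes multiple attempts.
Render the NeRF as latent features in the early stage of coarse model training by setting --latent_iter_ratio 0.1.
Change the discrimination loss weight --g_loss_weight. Lower --g_loss_weight when the generated dataset is too diverse; you can increase it when the generated dataset is of high quality.
Tuning with the GAN for longer improves quality. Change --g_loss_decay_begin_step and --g_loss_decay_step. In our default setting, we tune with the GAN for 7500 steps and then discard it.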
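One way to picture the GAN schedule is a weight that stays at --g_loss_weight until --g_loss_decay_begin_step and then falls to zero over --g_loss_decay_step steps, after which the GAN term is effectively discarded. The linear shape is an assumption for illustration only; the repository may decay the weight differently.

```python
def gan_weight(step, g_loss_weight, decay_begin_step, decay_step):
    """Assumed linear decay of the discriminator-loss weight."""
    if step < decay_begin_step:
        return g_loss_weight
    if decay_step <= 0:
        # No decay window: drop the GAN term immediately.
        return 0.0
    progress = min((step - decay_begin_step) / decay_step, 1.0)
    return g_loss_weight * (1.0 - progress)


# Hypothetical values: decay begins at step 25000 and lasts 7500 steps.
for step in (20000, 25000, 28750, 32500, 40000):
    print(step, gan_weight(step, 1.0, 25000, 7500))
```

Under this reading, moving --g_loss_decay_begin_step later (or lengthening --g_loss_decay_step) keeps the GAN active for more steps, which the tip above says tends to improve quality.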

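Download coarse model checkpoints

We release our coarse model checkpoints. Unzip them into the folder ckpts. All of these checkpoints were trained with our default coarse model settings.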
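Extracting the released checkpoints can be scripted with the standard library. A minimal sketch, assuming the download is a .zip archive whose entries are the .pth files used in the commands (such as jas_df_ep0200.pth); the helper names are hypothetical.

```python
import zipfile
from pathlib import Path


def list_checkpoints(dest="ckpts"):
    """Return the names of all .pth files under dest (empty if missing)."""
    dest_dir = Path(dest)
    if not dest_dir.is_dir():
        return []
    return sorted(p.name for p in dest_dir.rglob("*.pth"))


def extract_checkpoints(archive_path, dest="ckpts"):
    """Unzip archive_path into dest and return the checkpoints found."""
    dest_dir = Path(dest)
    dest_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as zf:
        zf.extractall(dest_dir)
    return list_checkpoints(dest_dir)
```

For example, `extract_checkpoints("it3d_coarse_ckpts.zip")`, where the archive name is a placeholder for whatever file was downloaded. If the archive contains a top-level folder, the files land one level deeper than the ckpts/<name>.pth paths used in the commands; `list_checkpoints` searches recursively, so they would still be listed and can then be moved up.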

Usage

On our A6000, generating a dataset of 640 images takes 6 minutes with SD-I2I and 25 minutes with ControlNet.
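The timings translate into rough per-image costs. This is plain arithmetic on the two figures quoted (640 images in 6 and 25 minutes on an A6000); actual times depend on hardware and settings.

```python
def seconds_per_image(minutes, num_images):
    """Average generation time per image, in seconds."""
    return minutes * 60 / num_images


sd_i2i = seconds_per_image(6, 640)        # Stable Diffusion img2img
controlnet = seconds_per_image(25, 640)   # ControlNet
print(f"SD-I2I: {sd_i2i:.3f} s/image")
print(f"ControlNet: {controlnet:.3f} s/image")
print(f"ControlNet is about {controlnet / sd_i2i:.1f}x slower")
```

That works out to roughly 0.56 s per image with SD-I2I versus about 2.34 s with ControlNet, so ControlNet is a little over four times slower per image.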

# Refine a coarse NeRF
# --no_cam_D: camera free discriminator, camera pose won't be input to discriminator
# --g_loss_decay_begin_step: when to decay the weight of discrimination loss
# --real_save_path: path to generated dataset

# Jasmine
python main.py -O --text " a bunch of white jasmine " --workspace jas_ctn --ckpt ckpts/jas_df_ep0200.pth --no_cam_D --gan --ctn --g_loss_decay_begin_step 25000 --real_save_path generated_dataset/jas_ctn

# Use stable diffusion img2img pipeline instead of Controlnet
python main.py -O --text " a bunch of white jasmine " --workspace jas_sd --ckpt ckpts/jas_df_ep0200.pth --no_cam_D --gan  --g_loss_decay_begin_step 25000 --real_save_path generated_dataset/jas_sd

# Iron Man
python main.py -O --text " a 3D model of an iron man, highly detailed, full body " --workspace iron_ctn --ckpt ckpts/iron_man_df_ep0400.pth --no_cam_D --gan --ctn --g_loss_decay_begin_step 45000 --real_save_path generated_dataset/iron_ctn

# Darth Vader
python main.py -O --text " Full-body 3D model of Darth Vader, highly detailed " --workspace darth_ctn --ckpt ckpts/darth_df_ep0200.pth --no_cam_D --gan --ctn --g_loss_decay_begin_step 25000 --real_save_path generated_dataset/darth_ctn

# Hulk
python main.py -O --text " 3D model of hulk, highly detailed " --workspace hulk_ctn --ckpt ckpts/hulk_df_ep0200.pth --no_cam_D --gan --ctn  --g_loss_decay_begin_step 25000 --real_save_path generated_dataset/hulk_ctn

# Ablation Experiment in Paper
# Note: our default setting is SDS loss + decayed GAN loss. The GAN loss weight is decayed to zero after 7500 steps (depending on g_loss_decay_begin_step)
# only l2 loss
python main.py -O --text " 3D model of hulk, highly detailed " --workspace hulk_ctn_l2 --ckpt ckpts/hulk_df_ep0200.pth --no_cam_D --gan --ctn --l2_weight 100.0 --l2_decay_begin_step 25000 --l2_decay_step 2500 --l2_weight_end 0.0 --sds_weight_end 0.0 --g_loss_decay_begin_step 0 --real_save_path generated_dataset/hulk_ctn

# l2 loss + sds loss
python main.py -O --text " 3D model of hulk, highly detailed " --workspace hulk_ctn_l2_sds --ckpt ckpts/hulk_df_ep0200.pth --no_cam_D --gan --ctn --l2_weight 100.0 --l2_decay_begin_step 25000 --l2_decay_step 2500 --l2_weight_end 0.0  --g_loss_decay_begin_step 0 --real_save_path generated_dataset/hulk_ctn

# only GAN
python main.py -O --text " 3D model of hulk, highly detailed " --workspace hulk_ctn_only_gan --ckpt ckpts/hulk_df_ep0200.pth --no_cam_D --gan --ctn --sds_weight_end 0.0 --real_save_path generated_dataset/hulk_ctn

# Edit to red Hulk, change --text
python main.py -O --text " a red hulk, red skin, highly detailed " --workspace hulk_red_ctn --ckpt ckpts/hulk_df_ep0200.pth --no_cam_D --gan --ctn  --g_loss_decay_begin_step 25000 --real_save_path generated_dataset/hulk_ctn

# Generate Dataset and DMTET Mesh
# generate dataset
python main.py -O --text " a bunch of blue rose, highly detailed " --workspace rose_blue_ctn --ckpt ckpts/rose_df_ep0200.pth  --gan --ctn --no_cam_D --iters 0 --real_save_path generated_dataset/rose_blue_ctn 
# DMTET Mesh
python main.py -O --text " a bunch of blue rose, highly detailed " --workspace rose_blue_ctn_dm  --gan --ctn --no_cam_D  --g_loss_decay_begin_step 5000 --g_loss_decay_step 5000  --init_with ckpts/rose_df_ep0200.pth --dmtet --init_color --real_save_path generated_dataset/rose_blue_ctn


# Train your own coarse NeRF
python main.py -O --text " a bunch of white jasmine " --workspace jas
# Refine it
python main.py -O --text " a bunch of white jasmine " --workspace jas_ctn --ckpt jas/checkpoints/df_ep0200.pth --no_cam_D --gan --ctn --g_loss_decay_begin_step 25000 --real_save_path generated_dataset/jas_ctn
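The two-stage workflow (train a coarse NeRF, then refine it) can be scripted. This hypothetical helper rebuilds the command lines from the jasmine example with the same flags; it assumes the coarse checkpoint ends up at <workspace>/checkpoints/df_ep0200.pth, as in that example, and it only prints the commands.

```python
import shlex


def coarse_cmd(prompt, workspace):
    """Command for training a coarse NeRF (vanilla Stable Dreamfusion)."""
    return ["python", "main.py", "-O", "--text", prompt, "--workspace", workspace]


def refine_cmd(prompt, workspace, ckpt, real_save_path,
               use_controlnet=True, decay_begin_step=25000):
    """Command for refining a coarse NeRF with IT3D (camera-free D)."""
    cmd = ["python", "main.py", "-O", "--text", prompt,
           "--workspace", workspace, "--ckpt", ckpt, "--no_cam_D", "--gan"]
    if use_controlnet:
        cmd.append("--ctn")  # otherwise the SD img2img pipeline is used
    cmd += ["--g_loss_decay_begin_step", str(decay_begin_step),
            "--real_save_path", real_save_path]
    return cmd


prompt = "a bunch of white jasmine"
steps = [
    coarse_cmd(prompt, "jas"),
    refine_cmd(prompt, "jas_ctn", "jas/checkpoints/df_ep0200.pth",
               "generated_dataset/jas_ctn"),
]
for cmd in steps:
    print(shlex.join(cmd))
```

To actually launch the stages in order, each list could be passed to `subprocess.run(cmd, check=True)`, so that a failed coarse run stops the pipeline before refinement.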

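Hyperparameters you may want to change:

--real_overwrite: turn it on to overwrite the real dataset directory.
--per_view_gt: how many images are generated for each camera view. Default: 5.
--img2img_view_num: how many camera views img2img generates images for. Default: 64.
--gan: incorporate the discriminator (IT3D).
--ctn: use ControlNet conditioned on soft edges. If false, the Stable Diffusion image-to-image pipeline is used instead. SD I2I is faster but gives lower quality.
--depth: depth-conditioned ControlNet.
--normal: normal-conditioned ControlNet.
--strength: strength of the ControlNet adjustment.
--init_color: whether to initialize the color of DMTET. Sometimes you must turn this on to avoid this error.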
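Read together, --img2img_view_num and --per_view_gt presumably determine the size of the generated dataset as views times images per view. A small sketch of that arithmetic; the product is an inference from the two flag descriptions, not something stated in this README, and the helper name is hypothetical.

```python
def dataset_size(img2img_view_num=64, per_view_gt=5):
    """Images in the generated dataset, assuming per_view_gt images are
    produced for each of img2img_view_num camera views."""
    return img2img_view_num * per_view_gt


print(dataset_size())        # listed defaults
print(dataset_size(128, 5))  # hypothetical: twice as many views
```

With the listed defaults this gives 64 × 5 = 320 images; doubling either flag doubles the dataset and, roughly, the generation time.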

Acknowledgements

Our code is based on these wonderful repositories:

Stable-Dreamfusion

 @misc{stable-dreamfusion,
    Author = {Jiaxiang Tang},
    Year = {2022},
    Note = {https://github.com/ashawkey/stable-dreamfusion},
    Title = {Stable-dreamfusion: Text-to-3D with Stable-diffusion}
}

EG3D

 @inproceedings{Chan2022,
  author = {Eric R. Chan and Connor Z. Lin and Matthew A. Chan and Koki Nagano and Boxiao Pan and Shalini De Mello and Orazio Gallo and Leonidas Guibas and Jonathan Tremblay and Sameh Khamis and Tero Karras and Gordon Wetzstein},
  title = {Efficient Geometry-aware {3D} Generative Adversarial Networks},
  booktitle = {CVPR},
  year = {2022}
}

ControlNet

 @misc{zhang2023adding,
  title={Adding Conditional Control to Text-to-Image Diffusion Models}, 
  author={Lvmin Zhang and Maneesh Agrawala},
  year={2023},
  eprint={2302.05543},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

Citation

If you find this work useful, we would appreciate a citation:

  @misc{chen2023it3d,
        title={IT3D: Improved Text-to-3D Generation with Explicit View Synthesis}, 
        author={Yiwen Chen and Chi Zhang and Xiaofeng Yang and Zhongang Cai and Gang Yu and Lei Yang and Guosheng Lin},
        year={2023},
        eprint={2308.11473},
        archivePrefix={arXiv},
        primaryClass={cs.CV}
  }
