IT3D text to 3D

Official repository of IT3D (AAAI 2024)

IT3D: Improved Text-to-3D Generation with Explicit View Synthesis (AAAI 2024).

Yiwen Chen, Chi Zhang, Xiaofeng Yang, Zhongang Cai, Gang Yu, Lei Yang, Guosheng Lin

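arXiv

Abstract

Recent advances in text-to-3D techniques have been driven by distilling knowledge from powerful large text-to-image diffusion models (LDMs). Nonetheless, existing text-to-3D approaches often struggle with over-saturation, inadequate detail, and unrealistic outputs. This study presents a novel strategy that leverages explicitly synthesized multi-view images to address these issues. Our approach uses image-to-image pipelines, empowered by LDMs, to generate posed high-quality images based on the renderings of coarse 3D models. Although the generated images mostly alleviate the aforementioned issues, view inconsistency and significant content variance persist because of the inherent generative nature of large diffusion models, which makes these images difficult to use effectively. To overcome this hurdle, we advocate integrating a discriminator alongside a novel Diffusion-GAN dual training strategy to guide the training of 3D models. For the incorporated discriminator, the synthesized multi-view images are treated as real data, while the renderings of the optimized 3D models serve as fake data. We conduct a comprehensive set of experiments that demonstrate the effectiveness of our method over baseline approaches.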

IT3D_demo.mp4

More Videos

Left: Coarse Model (Baseline). Right: Refined Model (Ours). File Name: Prompt

a.bunch.of.white.jasmin.mp4

3D.model.of.Deadpool.mp4

a.bunch.of.yellow.Chrysanthemum.mp4

3D.model.of.Darth.Vader.mp4

Hulk.mp4

a.bunch.of.pink.Chrysanthemum.mp4

3D.model.of.Batman.mp4

a.3D.model.of.an.iron.man.mp4

a.bunch.of.yellow.rose.mp4

a.marble.bust.of.Thanos.mp4

a.plate.of.fresh.broccoli.mp4

3D.model.of.red.hulk.mp4

Install

git clone https://github.com/buaacyw/IT3D-text-to-3D.git
cd IT3D-text-to-3D
conda create -n it3d python==3.8
conda activate it3d
pip install -r requirements.txt
pip install ./raymarching
pip install ./shencoder
pip install ./freqencoder
pip install ./gridencoder
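
After installation, a quick sanity check can confirm that the four extensions are visible to Python. The sketch below assumes each extension is importable under the same name as its folder (raymarching, shencoder, freqencoder, gridencoder); that mapping and the helper name `check_extensions` are assumptions, not something stated in this README.

```python
import importlib.util

# Assumed import names: identical to the folder names passed to pip above.
EXTENSIONS = ("raymarching", "shencoder", "freqencoder", "gridencoder")


def check_extensions(names=EXTENSIONS):
    """Return {name: bool} telling whether Python can locate each module."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_extensions().items():
        print(f"{name}: {'ok' if found else 'missing'}")
```

If a module is reported as missing, re-running the corresponding `pip install ./<folder>` step inside the activated it3d environment is the first thing to try.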

Wandb Login

If you do not have a wandb account, you need to register one.

wandb login

Download image-to-image models (optional)

For the image-to-image pipeline, we have implemented Stable Diffusion Image2Image and ControlNet v1.1.

In our experiments, ControlNet always gives better results. To use ControlNet as the image-to-image pipeline, download its models by following the ControlNet v1.1 instructions.

For example, to use ControlNet conditioned on softedge, download control_v11p_sd15_softedge.yaml and control_v11p_sd15_softedge.pth and put them in the ctn_models folder. You also need to download the Stable Diffusion 1.5 model v1-5-pruned.ckpt and put it in the ctn_models folder.
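
Before starting a run with ControlNet, it can help to confirm that the three files named above are in place. This is a minimal sketch; the helper name `missing_model_files` is hypothetical, and it only checks that the files exist, not that they are complete or valid.

```python
from pathlib import Path

# File names taken from the softedge example above.
REQUIRED_FILES = (
    "control_v11p_sd15_softedge.yaml",
    "control_v11p_sd15_softedge.pth",
    "v1-5-pruned.ckpt",
)


def missing_model_files(model_dir="ctn_models", required=REQUIRED_FILES):
    """Return the required file names that are not present in model_dir."""
    root = Path(model_dir)
    return [name for name in required if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_model_files()
    if missing:
        print("Missing from ctn_models:", ", ".join(missing))
    else:
        print("All ControlNet softedge files found.")
```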

Tested environments

Ubuntu 22 with torch 2.0.1 and CUDA 11.7 on an A6000.

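Tips for OOM

All our demos (coarse and fine models) are trained at 512 resolution. At 512 resolution, training a coarse model (vanilla Stable Dreamfusion) takes about 30G of memory, and refining it with IT3D takes about 35G. You can lower memory consumption in the following ways:

Lower the training resolution by setting --h and --w. This greatly reduces memory usage but also degrades quality considerably. At 64 resolution, IT3D needs about 10G.
Use a lightweight NeRF by setting --nerf l1. The default is --nerf l2.
Lower the number of sampling steps per ray by setting --max_steps. The default is --max_steps 384.
If you run out of memory during ControlNet data generation, lower --ctn_sample_batch_size.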
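
The memory-saving flags above can be combined in a single command. The sketch below assembles such a command as an argument list; the helper `low_memory_flags` is hypothetical, and the values chosen for --max_steps and --ctn_sample_batch_size are illustrative only, not recommendations from the authors.

```python
import shlex


def low_memory_flags(resolution=64, nerf="l1", max_steps=None, ctn_batch_size=None):
    """Build the memory-saving flags described above as an argument list."""
    flags = ["--h", str(resolution), "--w", str(resolution), "--nerf", nerf]
    if max_steps is not None:
        flags += ["--max_steps", str(max_steps)]
    if ctn_batch_size is not None:
        flags += ["--ctn_sample_batch_size", str(ctn_batch_size)]
    return flags


base = ["python", "main.py", "-O", "--text", "a bunch of white jasmine",
        "--workspace", "jas"]
# max_steps=256 and ctn_batch_size=1 are illustrative values only.
command = base + low_memory_flags(max_steps=256, ctn_batch_size=1)
print(shlex.join(command))
```

Since lowering the resolution costs the most quality, it is reasonable to try the other three flags first and reduce --h and --w only if memory is still insufficient.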

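Tips for Performance

Change the prompt and seed by setting --text and --seed. Unfortunately, training a coarse model free of the Janus problem often takes several attempts.
Render the NeRF as latent features during the early stage of coarse model training by setting --latent_iter_ratio 0.1.
Change the discrimination loss weight with --g_loss_weight. Lower --g_loss_weight when the generated dataset is too varied; you can raise it for a high-quality dataset.
Tuning the GAN for longer improves quality. Change --g_loss_decay_begin_step and --g_loss_decay_step to do so. In our default setting, the GAN is tuned for 7500 steps and then discarded.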
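
To make the roles of --g_loss_decay_begin_step and --g_loss_decay_step concrete, the sketch below assumes the discrimination loss weight stays constant until the begin step and then falls linearly to zero over the decay window. The linear shape, the function name `g_loss_weight_at`, and the numbers used are all assumptions for illustration; the training code defines the actual schedule.

```python
def g_loss_weight_at(step, base_weight, decay_begin_step, decay_step):
    """Hypothetical schedule: constant weight, then linear decay to zero."""
    if step <= decay_begin_step:
        return base_weight
    if decay_step <= 0 or step >= decay_begin_step + decay_step:
        return 0.0
    progress = (step - decay_begin_step) / decay_step
    return base_weight * (1.0 - progress)


# Illustrative values only: weight 1.0, decay begins at step 25000 and lasts 2500 steps.
for step in (0, 25000, 26250, 27500, 30000):
    print(step, g_loss_weight_at(step, 1.0, 25000, 2500))
```

Under this assumption, raising --g_loss_decay_begin_step keeps the discriminator active at full weight for longer, while raising --g_loss_decay_step makes the hand-off away from the GAN loss more gradual.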

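Download Coarse Model Checkpoints

We release our coarse model checkpoints. Unzip them into the ckpts folder. All of these checkpoints were trained with the default coarse model settings.

Usage

On an A6000, generating a dataset of 640 images takes about 6 minutes with SD-I2I and about 25 minutes with ControlNet.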

## Refine a coarse NeRF
# --no_cam_D: camera free discriminator, camera pose won't be input to discriminator
# --g_loss_decay_begin_step: when to decay the weight of discrimination loss
# --real_save_path: path to generated dataset

# Jasmine
python main.py -O --text "a bunch of white jasmine" --workspace jas_ctn --ckpt ckpts/jas_df_ep0200.pth --no_cam_D --gan --ctn --g_loss_decay_begin_step 25000 --real_save_path generated_dataset/jas_ctn

# Use stable diffusion img2img pipeline instead of Controlnet
python main.py -O --text "a bunch of white jasmine" --workspace jas_sd --ckpt ckpts/jas_df_ep0200.pth --no_cam_D --gan --g_loss_decay_begin_step 25000 --real_save_path generated_dataset/jas_sd

# Iron Man
python main.py -O --text "a 3D model of an iron man, highly detailed, full body" --workspace iron_ctn --ckpt ckpts/iron_man_df_ep0400.pth --no_cam_D --gan --ctn --g_loss_decay_begin_step 45000 --real_save_path generated_dataset/iron_ctn

# Darth Vader
python main.py -O --text "Full-body 3D model of Darth Vader, highly detailed" --workspace darth_ctn --ckpt ckpts/darth_df_ep0200.pth --no_cam_D --gan --ctn --g_loss_decay_begin_step 25000 --real_save_path generated_dataset/darth_ctn

# Hulk
python main.py -O --text "3D model of hulk, highly detailed" --workspace hulk_ctn --ckpt ckpts/hulk_df_ep0200.pth --no_cam_D --gan --ctn --g_loss_decay_begin_step 25000 --real_save_path generated_dataset/hulk_ctn

# Ablation Experiment in Paper
# Note: our default setting is sds loss + decayed gan loss. gan loss weight will be decayed to zero after 7500 steps (depending on g_loss_decay_begin_step)
# only l2 loss
python main.py -O --text "3D model of hulk, highly detailed" --workspace hulk_ctn_l2 --ckpt ckpts/hulk_df_ep0200.pth --no_cam_D --gan --ctn --l2_weight 100.0 --l2_decay_begin_step 25000 --l2_decay_step 2500 --l2_weight_end 0.0 --sds_weight_end 0.0 --g_loss_decay_begin_step 0 --real_save_path generated_dataset/hulk_ctn

# l2 loss + sds loss
python main.py -O --text "3D model of hulk, highly detailed" --workspace hulk_ctn_l2_sds --ckpt ckpts/hulk_df_ep0200.pth --no_cam_D --gan --ctn --l2_weight 100.0 --l2_decay_begin_step 25000 --l2_decay_step 2500 --l2_weight_end 0.0 --g_loss_decay_begin_step 0 --real_save_path generated_dataset/hulk_ctn

# only GAN
python main.py -O --text "3D model of hulk, highly detailed" --workspace hulk_ctn_only_gan --ckpt ckpts/hulk_df_ep0200.pth --no_cam_D --gan --ctn --sds_weight_end 0.0 --real_save_path generated_dataset/hulk_ctn

# Edit to red Hulk, change --text
python main.py -O --text "a red hulk, red skin, highly detailed" --workspace hulk_red_ctn --ckpt ckpts/hulk_df_ep0200.pth --no_cam_D --gan --ctn --g_loss_decay_begin_step 25000 --real_save_path generated_dataset/hulk_ctn

## Generate Dataset and DMTET Mesh
# generate dataset
python main.py -O --text "a bunch of blue rose, highly detailed" --workspace rose_blue_ctn --ckpt ckpts/rose_df_ep0200.pth --gan --ctn --no_cam_D --iters 0 --real_save_path generated_dataset/rose_blue_ctn
# DMTET Mesh
python main.py -O --text "a bunch of blue rose, highly detailed" --workspace rose_blue_ctn_dm --gan --ctn --no_cam_D --g_loss_decay_begin_step 5000 --g_loss_decay_step 5000 --init_with ckpts/rose_df_ep0200.pth --dmtet --init_color --real_save_path generated_dataset/rose_blue_ctn


## Train your own coarse NeRF
python main.py -O --text "a bunch of white jasmine" --workspace jas
# Refine it
python main.py -O --text "a bunch of white jasmine" --workspace jas_ctn --ckpt jas/checkpoints/df_ep0200.pth --no_cam_D --gan --ctn --g_loss_decay_begin_step 25000 --real_save_path generated_dataset/jas_ctn
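
The two-stage workflow shown in the commands above (train a coarse NeRF, then refine it) can also be scripted. The sketch below only assembles the command lines from the flags used in those examples; the helper names `coarse_command` and `refine_command` are hypothetical, and the checkpoint file name is copied from the example, so its epoch suffix may differ for your own run.

```python
import shlex


def coarse_command(prompt, workspace):
    """Command for training a coarse NeRF, mirroring the example above."""
    return ["python", "main.py", "-O", "--text", prompt, "--workspace", workspace]


def refine_command(prompt, workspace, ckpt, dataset_dir, use_controlnet=True,
                   g_loss_decay_begin_step=25000):
    """Command for refining a coarse checkpoint with IT3D."""
    cmd = ["python", "main.py", "-O", "--text", prompt, "--workspace", workspace,
           "--ckpt", ckpt, "--no_cam_D", "--gan"]
    if use_controlnet:
        # Without --ctn the Stable Diffusion img2img pipeline is used instead.
        cmd.append("--ctn")
    cmd += ["--g_loss_decay_begin_step", str(g_loss_decay_begin_step),
            "--real_save_path", dataset_dir]
    return cmd


prompt = "a bunch of white jasmine"
print(shlex.join(coarse_command(prompt, "jas")))
# Checkpoint name follows the example above; the epoch suffix depends on your run.
print(shlex.join(refine_command(prompt, "jas_ctn", "jas/checkpoints/df_ep0200.pth",
                                "generated_dataset/jas_ctn")))
```

Keeping the arguments in a list and quoting them with `shlex.join` avoids stray spaces or broken quotes in prompts that contain commas or multiple words.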

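Hyperparameters you may need to change:

--real_overwrite: enable to overwrite the existing real (generated) dataset directory.
--per_view_gt: number of images generated for each camera view. Default: 5.
--img2img_view_num: number of camera views used for img2img generation. Default: 64.
--gan: incorporate the discriminator (IT3D).
--ctn: use ControlNet conditioned on softedge. If false, the Stable Diffusion Image-to-Image pipeline is used. SD I2I is much faster but gives lower quality.
--depth: ControlNet conditioned on depth.
--normal: ControlNet conditioned on normal.
--strength: strength of the ControlNet conditioning.
--init_color: whether to initialize the color for DMTET. You sometimes need to enable this option to avoid a bug.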

Acknowledgement

Our code is based on these wonderful repositories:

Stable-Dreamfusion

 @misc{stable-dreamfusion,
    Author = {Jiaxiang Tang},
    Year = {2022},
    Note = {https://github.com/ashawkey/stable-dreamfusion},
    Title = {Stable-dreamfusion: Text-to-3D with Stable-diffusion}
}

EG3D

 @inproceedings{Chan2022,
  author = {Eric R. Chan and Connor Z. Lin and Matthew A. Chan and Koki Nagano and Boxiao Pan and Shalini De Mello and Orazio Gallo and Leonidas Guibas and Jonathan Tremblay and Sameh Khamis and Tero Karras and Gordon Wetzstein},
  title = {Efficient Geometry-aware {3D} Generative Adversarial Networks},
  booktitle = {CVPR},
  year = {2022}
}

ControlNet

 @misc{zhang2023adding,
  title={Adding Conditional Control to Text-to-Image Diffusion Models}, 
  author={Lvmin Zhang and Maneesh Agrawala},
  year={2023},
  eprint={2302.05543},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

Citation

If you find this work useful, we would appreciate a citation:

  @misc{chen2023it3d,
        title={IT3D: Improved Text-to-3D Generation with Explicit View Synthesis}, 
        author={Yiwen Chen and Chi Zhang and Xiaofeng Yang and Zhongang Cai and Gang Yu and Lei Yang and Guosheng Lin},
        year={2023},
        eprint={2308.11473},
        archivePrefix={arXiv},
        primaryClass={cs.CV}
  }
