The Scene Language: Representing Scenes with Programs, Words, and Embeddings
arXiv | Project Page
This repository implements the Scene Language, the scene representation introduced in the paper "The Scene Language: Representing Scenes with Programs, Words, and Embeddings", and supports text-conditioned and image-conditioned 3D scene generation.
Installation
Environment
```bash
conda create --name sclg python=3.11
conda activate sclg

pip install mitsuba
# If you run into a segmentation fault, you may need a specific Mitsuba version,
# e.g., `pip install --force-reinstall mitsuba==3.5.1` on macOS.

pip install unidecode Pillow anthropic transforms3d astor ipdb scipy jaxtyping imageio

# Required for the Minecraft renderer:
pip install spacy
python -m spacy download en_core_web_md

pip install --force-reinstall numpy==1.26.4  # to be compatible with transforms3d

git clone https://github.com/zzyunzhi/scene-language.git
cd scene-language
pip install -e .
```
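A quick sanity check (not part of the repository) can confirm the environment is usable; it assumes the packages above installed cleanly and the spaCy model was downloaded:

```python
# Environment sanity check (illustrative only, not part of the repository).
import mitsuba as mi
import numpy as np
import spacy
import transforms3d  # noqa: F401  (import check only)

mi.set_variant("scalar_rgb")          # CPU variant shipped with pip builds
print("mitsuba:", mi.__version__)
print("numpy:", np.__version__)       # should be 1.26.x for transforms3d
nlp = spacy.load("en_core_web_md")    # required by the Minecraft renderer
print("spaCy pipeline:", nlp.meta["name"])
```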
Language Model API
1. Get Your API Key: Obtain your Anthropic API key following the official documentation.
2. Add Key to engine/key.py:
```python
ANTHROPIC_API_KEY = 'YOUR_ANTHROPIC_API_KEY'
OPENAI_API_KEY = 'YOUR_OPENAI_API_KEY'  # optional, required for LLM_PROVIDER='gpt'
```
3. Switch Language Models (Optional): You can switch to a different language model by changing the LLM_PROVIDER setting in engine/constants.py; the default is Claude 3.5 Sonnet (see the sketch below).
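For reference, switching providers is a one-line edit in engine/constants.py. The exact value strings accepted (other than 'gpt', mentioned above) are assumptions here, so check the file itself:

```python
# engine/constants.py (illustrative sketch; verify the accepted values in the file)
LLM_PROVIDER = 'anthropic'   # assumed default value, selecting Claude 3.5 Sonnet
# LLM_PROVIDER = 'gpt'       # switches to OpenAI models; needs OPENAI_API_KEY in engine/key.py
```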
Text-Conditioned 3D Generation
Renderer: Mitsuba
```bash
python scripts/run.py --tasks "a chessboard with a full set of chess pieces"
```
Renderings will be saved to `${PROJ_ROOT}/scripts/outputs/run_${timestep}_${uuid}/${scene_name}_${uuid}/${sample_index}/renderings/*.gif`.
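A small, hypothetical helper can list the most recent renderings; the glob pattern simply mirrors the output layout above and is not provided by the repository:

```python
# List the most recently written rendering GIFs (illustrative helper).
from pathlib import Path

out_root = Path("scripts/outputs")  # relative to ${PROJ_ROOT}
gifs = sorted(out_root.glob("run_*/**/renderings/*.gif"),
              key=lambda p: p.stat().st_mtime)
for p in gifs[-5:]:
    print(p)
```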
Example Results: Raw Outputs
Renderer: Minecraft
```bash
ENGINE_MODE=minecraft python scripts/run.py --tasks "a detailed cylindrical medieval tower"
```
Generated scenes are saved as JSON files in `${PROJ_ROOT}/scripts/outputs/run_${timestep}_${uuid}/${scene_name}_${uuid}/${sample_index}/renderings/*.json`.
Visualization:
1. Run the following command:
```bash
python viewers/minecraft/run.py
```
2. Open http://127.0.0.1:5001 in your browser.
3. Drag generated JSON files to the web page.
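Before loading a file into the viewer, you can peek at its contents. The JSON schema is repository-specific, so this sketch only prints the top-level structure:

```python
# Inspect a generated Minecraft scene file (illustrative; schema not assumed).
import json
from pathlib import Path

scene_files = sorted(Path("scripts/outputs").rglob("renderings/*.json"))
if scene_files:
    path = scene_files[-1]
    with open(path) as f:
        scene = json.load(f)
    print(path)
    if isinstance(scene, dict):
        print("top-level keys:", list(scene))
    elif isinstance(scene, list):
        print("number of entries:", len(scene))
```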
Example Results: Raw Outputs
Image-Conditioned 3D Generation
```bash
python scripts/run.py --tasks ./resources/examples/* --cond image --temperature 0.8
```
Codebase Details
The following table maps helper functions defined in this repository to the corresponding expressions of the domain-specific language (DSL) (Tables 2 and 5 of the paper):
| Function | DSL Expression |
|-----------------|----------------|
| ... | ... |
| ... | ... |
Codebase Improvements
The codebase currently supports:
1. Text-Conditioned Generation: generate 3D scenes from textual descriptions.
2. Image-Conditioned Generation: generate 3D scenes from input images.
Future updates: support for additional tasks and renderers is planned.
Contributions
Feel free to submit a Pull Request or contact us via email if you have any feature requests, suggestions, or would like to share your results.
Citation
```bibtex
@article{zhang2024scenelanguage,
  title={The Scene Language: Representing Scenes with Programs, Words, and Embeddings},
  author={Yunzhi Zhang and Zizhang Li and Matt Zhou and Shangzhe Wu and Jiajun Wu},
  year={2024},
  journal={arXiv preprint arXiv:2410.16770},
}
```
License: Apache-2.0