A Python project to create VR environments using Generative AI. You can run it as a TCP server to interface it with a Unity client, to get the fully-fledged AI/VR application.
This is a public archive, development continues at HugoFara/speech-to-world-server!
This is a use case of generative AI to build a complete VR scenery. It was developed at the Fondation Campus Biotech Geneva, in collaboration with the Laboratory of Cognitive Science, by Hugo FARAJALLAH.
You need to get Python 3.10 and CUDA 12.1 (other versions are untested). Once the requirements are installed, the project should work.
Here is a detailed installation procedure:
On Linux:
cd VR-Environment-GenAI-Server
# From https://packaging.python.org/en/latest/guides/installing-using-pip-and-virtual-environments/#create-and-use-virtual-environments
python -m venv .venv # Creates the virtual environment under .venv
source .venv/bin/activate # Activates it
On Windows:
cd VR-Environment-GenAI-Server
# From https://packaging.python.org/en/latest/guides/installing-using-pip-and-virtual-environments/#create-and-use-virtual-environments
py -m venv .venv # Creates the virtual environment under .venv
.venv\Scripts\activate # Activates it
pip install -r requirements.txt
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
Please have a look at https://pytorch.org/get-started/locally/ for details.
From here on, the project should be functional. The next section is optional, but it can save you a lot of time.
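A quick way to confirm the installation is to check that PyTorch is importable and can see a CUDA device. The snippet below is a minimal sketch and is not part of the project: the helper name `check_environment` is hypothetical, and it relies only on the standard library plus `torch.__version__`, `torch.version.cuda` and `torch.cuda.is_available()`.

```python
import importlib.util
import sys


def check_environment():
    """Collect basic facts about the Python/PyTorch setup (hypothetical helper)."""
    report = {
        "python": f"{sys.version_info.major}.{sys.version_info.minor}",
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "torch_version": None,
        "cuda_build": None,
        "cuda_available": False,
    }
    if report["torch_installed"]:
        import torch  # Imported lazily so the check also runs without PyTorch

        report["torch_version"] = torch.__version__
        report["cuda_build"] = torch.version.cuda  # None for CPU-only builds
        report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

If `cuda_available` is reported as `False`, the usual suspects are a CPU-only PyTorch build or a GPU driver problem; the PyTorch installation page linked above covers both cases.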
(optional) You can speed up image generation using accelerate. Download it with:
pip install accelerate
The .idea folder is included so that the folder can be opened as a project.
sudo apt install ffmpeg portaudio19-dev python3-pyaudio
pip install -r requirements-optional.txt # Installs PyAudio
Each file can be executed independently, so there are as many entry points as files.
The most common use cases are the following:
python -m skybox.diffusion
python -m utils.download_models # Downloads the models in advance
If you don't run the download step, the models will be downloaded at run time, which may be very slow.
python -m server.run # Starts the TCP server
Next is the detail for special files.
For skybox features, go to the skybox folder.
skybox/legacy may not be useful. I keep it there for personal purposes.
3D features are in the environment folder. It is still in active development at the time of writing (June 2024), hence the following is subject to change.
For speech-to-text features, go to the asr (automatic speech recognition) folder.
If you want to use a graphical interface instead of Python code, you can use the provided ComfyUI workflows in the ComfyUI folder.
The explanation for each workflow is detailed in ComfyUI/README.md.
The server features are in the server folder. See "Start as a TCP server" for the details on usage.
The sound folder has some experiments with sound generation.
The utils folder contains useful functions for the user, such as the model downloader (utils/download_models.py).
The main server configuration is in api.json.
The most significant settings are "serverIp" and "serverPort", as they set the address of the server.
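For illustration, a script can read the same file to find out where the server listens. This is a minimal sketch that assumes "serverIp" and "serverPort" are top-level keys of api.json; the helper name `load_server_address` is hypothetical, and the project may load its configuration differently.

```python
import json
from pathlib import Path


def load_server_address(config_path="api.json"):
    """Return (ip, port) read from the configuration file.

    Assumes "serverIp" and "serverPort" are top-level keys
    (an assumption, the real file layout may differ).
    """
    with Path(config_path).open(encoding="utf-8") as config_file:
        config = json.load(config_file)
    return str(config["serverIp"]), int(config["serverPort"])
```

Called from the folder that contains api.json, `load_server_address()` returns a tuple that can be passed directly to `socket.create_connection`.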
A TCP server can be started in order to offload the AI part from the application thread.
Just launch python -m server.run. The server configuration is defined in api.json.
Messages are exchanged in JSON format, in a style strongly inspired by HTTP.
To connect to the server from another computer on the same network, you need to open a port.
On Windows, you simply need to go to the control panel and add a new firewall rule for port 9000 (with the default configuration).
This How-To Geek tutorial seems to be a sufficient guide.
On Linux, opening ports is a bit more fun; I personally recommend using nginx with a port redirection.
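Once the port is open, reachability can be checked from the other computer before starting the Unity client. The sketch below is not part of the project: `is_port_open` is a hypothetical helper that only attempts a TCP connection with the standard `socket` module, and the address in the usage comment is a placeholder. It does not speak the server's JSON protocol.

```python
import socket


def is_port_open(host, port=9000, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # Covers refused connections, timeouts and unreachable hosts
        return False


# Example (placeholder address, replace it with the server's IP):
# print(is_port_open("192.168.1.10", 9000))
```

A `False` result while the server is running usually means the firewall rule or the port redirection is not yet in effect.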
Current status of the project, from a very high-level perspective.
skybox/panorama_creator.py
environment/renderer.py
not suitable for production now.
This project includes several artificial neural network models. If you want to replace a model with another one, you should have a good knowledge of what you are doing, otherwise the quality of the end product may decrease.
Please have a look at utils/download_models.py to see where those models are loaded from.
You can download the official Unity client from VR-Environment-GenAI-Unity (GitHub). If you are looking for the active public repository of this project, go to HugoFara/speech-to-world-server.