Note: I do not plan on actively working on improvements/enhancements for this project. This fork is mainly meant to keep the repo in a working state in case the original git.ecker goes down or necessary package changes need to be made.
That being said, some enhancements have been added compared to the original repo:
✔️ Possible to train in other languages
✔️ HiFi-GAN added, allowing faster inference at the cost of quality
✔️ whisper-v3 added as a selectable option for whisperx
✔️ Output conversion using RVC
This is a fork of the repo originally located here: https://git.ecker.tech/mrq/ai-voice-cloning. All of the work that was put into it to incorporate training with DLAS and inference with Tortoise belongs to mrq, the author of the original ai-voice-cloning repo.
This repo works on Windows with NVIDIA GPUs and Linux running Docker with NVIDIA GPUs.
start.bat
If you are installing this manually, you will need:
git clone https://github.com/JarodMica/ai-voice-cloning.git
Run the setup-cuda.bat file and it will start running through all of the Python packages needed.
Run start.bat and this will start downloading most of the models you'll need.
models folder of the root.
setup-whipserx.bat
Make sure the latest NVIDIA drivers are installed: sudo ubuntu-drivers install
Install Docker your preferred way. One way to do it is to follow the official documentation here.
If you get an error message saying that the GPU cannot be used when launching the voice cloning Docker container, you might have to install the NVIDIA Container Toolkit.
Install with the "apt" method
Run the docker configuration command
sudo nvidia-ctk runtime configure --runtime=docker
Restart docker
Make sure your Nvidia drivers are up to date: https://www.nvidia.com/download/index.aspx
Run wsl --install and restart.
Run ubuntu. It should now load you into WSL2.
sudo apt-key del 7fa2af80
wget https://developer.download.nvidia.com/compute/cuda/repos/wsl-ubuntu/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt-get update
sudo apt-get -y install cuda-toolkit-12-4
Run ubuntu, then follow the steps below:
git clone https://github.com/JarodMica/ai-voice-cloning.git && cd ai-voice-cloning
./setup-docker.sh
./start-docker.sh
Open http://localhost:7860 locally, or remotely with http://<ip>:7860
If the remote server cannot be reached, check out this thread
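Before digging into network settings, it can help to confirm whether anything is listening on the web UI port at all. The following is a minimal sketch, not part of this repo: the helper name ui_reachable is made up for illustration, and it only tests that a TCP connection to port 7860 can be opened, not that the UI itself is healthy.

```python
import socket


def ui_reachable(host="localhost", port=7860, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection raises OSError (refused, timeout, DNS failure) on error.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Replace "localhost" with the server's IP to test remote access.
    print("web UI port reachable:", ui_reachable())
```

If this reports True locally but False from another machine, the container is probably running and the issue is more likely a firewall or port-forwarding setting.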
You might also need to remap your local folders to the Docker folders. To do this, open the "start-docker.sh" script and update some lines. For instance, if you want to find your generated audio files easily, create a "results" folder in the root directory, and then in "start-docker.sh" add the line:
-v "your/custom/path:/home/user/ai-voice-cloning/results"
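Docker's -v flag expects an absolute host path for a bind mount (a bare name is treated as a named volume instead), so it is worth resolving your folder before pasting it into the script. Here is a small sketch that builds the line above from any local folder. The function name volume_flag is hypothetical, and the container path is simply the one shown in this README.

```python
from pathlib import Path

# Container-side path taken from the example line in this README.
CONTAINER_RESULTS = "/home/user/ai-voice-cloning/results"


def volume_flag(host_dir, container_dir=CONTAINER_RESULTS):
    """Build a `-v "host:container"` argument with an absolute host path."""
    # resolve() turns a relative path such as "results" into an absolute one.
    host = Path(host_dir).expanduser().resolve()
    return f'-v "{host}:{container_dir}"'


if __name__ == "__main__":
    # Prints the line to add to start-docker.sh for a local "results" folder.
    print(volume_flag("results"))
```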
Check out the YouTube videos:
Watch First: https://youtu.be/WWhNqJEmF9M?si=RhUZhYersAvSZ4wf
Watch Second (RVC update): https://www.youtube.com/watch?v=7tpWH8_S8es&t=504s
Everything is pretty much the same as before if you've used this repository in the past. However, there is a new option to convert the generated output using RVC. Before you can use it, you will need a trained RVC .pth file that you get from RVC or online, and then you will need to place it in models/rvc_models/. Both .index and .pth files can be placed in here and they'll show up correctly in their respective dropdown menus.
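If a model does not show up in the dropdowns, a quick way to rule out a misplaced file is to list what is in the folder. This is only a sketch run from the repo root: list_rvc_files is a made-up helper, and it does not reproduce how the web UI itself discovers models.

```python
from pathlib import Path


def list_rvc_files(model_dir="models/rvc_models"):
    """Group the .pth and .index files found directly inside model_dir."""
    found = {".pth": [], ".index": []}
    folder = Path(model_dir)
    if not folder.is_dir():
        # Folder missing: return empty lists rather than raising.
        return found
    for path in sorted(folder.iterdir()):
        suffix = path.suffix.lower()
        if path.is_file() and suffix in found:
            found[suffix].append(path.name)
    return found


if __name__ == "__main__":
    for suffix, names in list_rvc_files().items():
        print(suffix, names)
```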
To enable RVC:
Enable Show Experimental Settings to reveal more options.
Enable Run the outputter audio through RVC. You will now have access to parameters you can adjust in RVC for the RVC voice model you're using.
Below is how you can update the package to get the latest updates.
NOTE: If there are major feature changes, check the latest release to see if update_package.bat will work. If not, you will need to re-download and re-extract the package from Hugging Face.
Run the update_package.bat file.
You should be able to navigate into the folder and then pull the repo to update it.
cd ai-voice-cloning
git pull
If large features have been added, you may need to delete the venv and then re-run the setup-cuda script to make sure there are no package issues.
You should be able to navigate into the folder and then pull the repo to update it, then rebuild your Docker image.
cd ai-voice-cloning
git pull
./setup-docker.sh
The terminal is your friend. Any errors or issues will show up in the terminal when you try to run the program, and you can start debugging from there.
.\venv\Scripts\activate.bat
pip uninstall torch
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
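After reinstalling torch as above, a quick check like the following can confirm that the installed build actually sees the GPU. It is a sketch meant to be run inside the activated venv. The function name torch_cuda_report is hypothetical, and it relies only on torch.__version__, torch.version.cuda, and torch.cuda.is_available().

```python
import importlib


def torch_cuda_report():
    """Summarize whether torch is importable and whether it can use CUDA."""
    try:
        torch = importlib.import_module("torch")
    except (ImportError, OSError):
        # torch is missing or its native libraries failed to load.
        return {"installed": False}
    return {
        "installed": True,
        "torch_version": torch.__version__,
        "built_for_cuda": torch.version.cuda,  # None on CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    for key, value in torch_cuda_report().items():
        print(f"{key}: {value}")
```

If built_for_cuda is None, the installed wheel is a CPU-only build, and the reinstall command with the cu118 index URL is the likely fix.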
If you run into any problems, please open up a new issue on the issues tab.
setup-cuda.bat should have everything that you need for the packages to be installed. All of the different requirements files make it quite a mess in the script, but each repo has its requirements installed, and then at the end, the requirements.txt in the root is needed to change the versions back to ones compatible with this repo.