epub2tts is a free and open source Python app that easily creates a full-featured audiobook from an epub or text file, using realistic text-to-speech from Coqui AI TTS, OpenAI, or MS Edge.
NOTE: NEW MULTIPROCESSING FEATURE ADDED! You can now use `--threads N` to specify the number of threads to run, so that chapters are processed in parallel. If you're using Edge or OpenAI, you can set threads to as many chapters as you've got and they can all be processed at the same time. When using TTS/XTTS, you'll need to experiment to see what your system can handle.
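Chapter-level parallelism of this kind can be sketched with Python's standard library. This is only an illustration of the idea, not epub2tts's actual implementation; the `synthesize` stand-in here is hypothetical (a network engine like Edge would do its I/O inside it, which is why threads overlap well):

```python
from concurrent.futures import ThreadPoolExecutor

def synthesize(chapter: str) -> str:
    # Stand-in for a real TTS call; returns the name of the audio
    # file that would be produced for this chapter.
    return chapter.lower().replace(" ", "-") + ".flac"

def process_book(chapters, threads=4):
    # Render chapters concurrently; map() preserves input order,
    # so the audiobook can still be assembled sequentially afterwards.
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(synthesize, chapters))

print(process_book(["Chapter One", "Chapter Two"], threads=2))
# → ['chapter-one.flac', 'chapter-two.flac']
```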
NOTE: Check out epub2tts-edge for a VERY fast, lightweight alternative that only works with MS Edge. That version reads multiple sentences in parallel and goes much quicker!
To work from a text file, first export the epub to text for editing:

```shell
epub2tts mybook.epub --export txt
```

Edit the resulting text file, marking chapters with lines like:

```
# Part 1
```

etc. with desired chapter names, and removing front matter like the table of contents and anything else you do not want read. Note: the first two lines can be `Title:` and `Author:` to use that in the audiobook metadata. ALSO NOTE: after Author/Title, the book copy MUST start with a chapter or section marked by a line with a hash mark at the beginning (like `# Introduction`).

To use a different voice for a given chapter, add `% <speaker>` after the chapter name, for instance `# Chapter One % en-US-AvaMultilingualNeural`. See the file `multi-speaker-sample-edge.txt` for an example. Note: this only works with the Coqui TTS multi-speaker engine (default) or `--engine edge`.
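The chapter-marker format above can be parsed in a few lines of Python. This is a minimal sketch of the convention, not the project's actual parser; the default speaker shown is the Edge default mentioned later in this README:

```python
def parse_chapter_line(line: str, default_speaker: str = "en-US-AndrewNeural"):
    """Parse a '# Chapter Name % speaker' marker line.

    Returns (chapter_name, speaker); the speaker falls back to the
    default when no '%' override is present.
    """
    if not line.startswith("#"):
        raise ValueError("not a chapter marker: " + line)
    body = line.lstrip("#").strip()
    if "%" in body:
        name, speaker = (part.strip() for part in body.split("%", 1))
        return name, speaker
    return body, default_speaker

print(parse_chapter_line("# Chapter One % en-US-AvaMultilingualNeural"))
# → ('Chapter One', 'en-US-AvaMultilingualNeural')
```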
Using the VITS model, all defaults, no GPU required:

```shell
epub2tts mybook.epub
```

(To change the speaker (e.g. `p307` for a good male voice with Coqui TTS), add: `--speaker p307`.)
Using Microsoft Edge TTS in the cloud: FREE, only minimal CPU required, and it's pretty fast (100 minutes for a 7-hour book, for instance). There are many voices and languages to choose from, and the quality is really good (listen to `sample-en-US-AvaNeural-edge.m4b` for an example). List the available voices with `edge-tts --list-voices`; the default speaker is `en-US-AndrewNeural` if `--speaker` is not specified.

```shell
epub2tts mybook.txt --engine edge --speaker en-US-AvaNeural --cover cover-image.jpg --sayparts
```

Using the XTTS model with one of its built-in speakers:

```shell
epub2tts mybook.txt --engine xtts --speaker "Damien Black" --cover cover-image.jpg --sayparts
```
To skip the table of contents and other unwanted sections, first scan the book:

```shell
epub2tts mybook.epub --scan
```

then determine which part to start and end on. Using XTTS with your own voice samples:

```shell
epub2tts my-book.epub --start 4 --end 20 --xtts voice-1.wav,voice-2.wav,voice-3.wav --cover cover-image.jpg
```
When exporting to text with `--export txt`, a `%P%` marker is inserted at each paragraph break. Then, when creating audio with `--engine edge`, a 1.2 second pause is inserted any time `%P%` is found in the copy.

Thank you in advance for reporting any bugs/issues you encounter! If you are having issues, first please search existing issues to see if anyone else has run into something similar previously.
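The `%P%` pause mechanism can be illustrated in a few lines. This is a hedged sketch that mirrors the description above (marker string and 1.2 s pause), not the project's actual code:

```python
PAUSE_MARKER = "%P%"
PAUSE_SECONDS = 1.2

def mark_paragraphs(text: str) -> str:
    # On export, blank-line paragraph breaks become explicit markers.
    return text.replace("\n\n", f"\n{PAUSE_MARKER}\n")

def to_segments(marked: str):
    # On synthesis, the marked copy becomes alternating speech and pauses.
    segments = []
    for chunk in marked.split(PAUSE_MARKER):
        chunk = chunk.strip()
        if chunk:
            if segments:
                segments.append(("pause", PAUSE_SECONDS))
            segments.append(("speak", chunk))
    return segments

print(to_segments(mark_paragraphs("First paragraph.\n\nSecond paragraph.")))
# → [('speak', 'First paragraph.'), ('pause', 1.2), ('speak', 'Second paragraph.')]
```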
If you've found something new, please open an issue and be sure to include the full command you ran, with `--debug --minratio 0` added on, to get more information.

Recent changes include the `--threads N` feature for multiprocessing, support for NCX files (which improves detection of how text is separated in an epub), and a `--skip-cleanup` option to skip the replacement of special characters with ",".

Typical inference times for xtts_v2, averaged over 4 processing chunks (about 4 sentences each), that can be expected:
| Hardware | Inference Time |
|-------------------------------------|----------------|
| 20x CPU Xeon E5-2630 (without AVX) | 3.7x realtime |
| 20x CPU Xeon Silver 4214 (with AVX) | 1.7x realtime |
| 8x CPU Xeon Silver 4214 (with AVX) | 2.0x realtime |
| 2x CPU Xeon Silver 4214 (with AVX) | 2.9x realtime |
| Intel N4100 Atom (NAS) | 4.7x realtime |
| GPU RTX A2000 4GB (w/o deepspeed) | 0.4x realtime |
| GPU RTX A2000 4GB (w deepspeed) | 0.15x realtime |
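To read the table: the multiplier is processing time relative to audio length, so lower is faster. A quick back-of-the-envelope helper, assuming that interpretation:

```python
def processing_hours(audio_hours: float, realtime_factor: float) -> float:
    # e.g. a 7-hour book at 0.15x realtime (RTX A2000 with deepspeed)
    # takes roughly 7 * 0.15 ≈ 1.05 hours to render, while the same
    # book at 3.7x realtime takes about 26 hours.
    return audio_hours * realtime_factor

print(round(processing_hours(7, 0.15), 2))  # ≈ 1.05
print(round(processing_hours(7, 3.7), 1))   # ≈ 25.9
```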
## Mac installation

Required Python version is 3.11. This installation requires Python < 3.12 and Homebrew (Homebrew is used to install espeak, pyenv and ffmpeg). Per this bug, mecab should also be installed via Homebrew.
Voice models will be saved locally in `~/.local/share/tts`.

```shell
#install dependencies
brew install espeak pyenv ffmpeg mecab
#install epub2tts
git clone https://github.com/aedocw/epub2tts
cd epub2tts
pyenv install 3.11
pyenv local 3.11
#OPTIONAL but recommended - do this in a virtual environment
python -m venv .venv && source .venv/bin/activate
pip install coqui-tts --only-binary spacy
pip install .
```
## Linux installation

These instructions are for Ubuntu 22.04 (20.04 showed some dependency issues), but should work (with appropriate package-manager modifications) for just about any distro. Ensure you have `ffmpeg` installed before use. If you have an NVIDIA GPU, you should also install the CUDA toolkit to make use of deepspeed.

Voice models will be saved locally in `~/.local/share/tts`.
```shell
#install dependencies
sudo apt install espeak-ng ffmpeg
#If you have a CUDA-compatible GPU, run:
sudo apt install nvidia-cuda-toolkit
#clone the repo
git clone https://github.com/aedocw/epub2tts
cd epub2tts
pip install coqui-tts --only-binary spacy
pip install .
```
NOTE: If you have deepspeed installed, it may be detected but not work properly, causing errors. Try installing the CUDA toolkit to see if that resolves the issue. If that does not fix it, add `--no-deepspeed` and it will not be used. Also, in that case, open an issue with your details and we will look into it.
## Windows installation

Running epub2tts in WSL2 with Ubuntu 22 is the easiest approach, but these steps should work for running directly in Windows.
1. Install Microsoft C++ Build Tools. Download the installer from https://visualstudio.microsoft.com/visual-cpp-build-tools/, then run the downloaded file `vs_BuildTools.exe` and select the "C++ Build tools" checkbox, leaving all options at their default values. Note: this will require about 7 GB of space on the C drive.
2. Install espeak-ng from https://github.com/espeak-ng/espeak-ng/releases/latest
3. Install Chocolatey.
4. Install ffmpeg with the command `choco install ffmpeg`; make sure you are in an elevated PowerShell session.
5. Install Python 3.11 with the command `choco install python311`.
6. Install git with the command `choco install git`.
7. Decide where you want your epub2tts project to live; Documents is a common place. Once you've found a directory you're happy with, clone the project with `git clone https://github.com/aedocw/epub2tts` and `cd epub2tts` so you're now in your working directory.
8. There are probably a few different ways you can go here; a venv keeps everything organized. Create one with the command `python -m venv .venv`.
9. Activate the venv; on Windows the command is slightly different, as you issue `.venv\Scripts\activate`.
10. Install epub2tts along with its requirements with the commands `pip install coqui-tts --only-binary spacy && pip install .`

If all goes well, you should be able to call epub2tts from within your venv, and update it from this directory going forward. To update, use `git pull` and then `pip install . --upgrade`.
Some errors you may encounter:

* If lxml fails to install, run `pip install lxml` to install the latest version manually, then re-run `pip install .`
* If NLTK data is missing, run `python -c "import nltk"` and then `python -m nltk.downloader punkt`
* If torch does not detect your CUDA GPU, run `pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121`
* If deepspeed causes errors, add `--no-deepspeed` and it will not be used.

## Docker

NOTE: The Docker image has not been recently updated or tested; it may be working but is out of date.
Voice models will be saved locally in `~/.local/share/tts`.

Docker usage does not reliably utilize the GPU; if someone wants to work on improving this, your PR will be very welcome!

For Linux and MacOS:

```shell
alias epub2tts='docker run -e COQUI_TOS_AGREED=1 -v "$PWD:$PWD" -v ~/.local/share/tts:/root/.local/share/tts -w "$PWD" ghcr.io/aedocw/epub2tts:release'
```
For Windows:

```shell
#Example for running scan of "mybook.epub"
docker run -e COQUI_TOS_AGREED=1 -v ${PWD}/.local/share/tts:/root/.local/share/tts -v ${PWD}:/root -w /root ghcr.io/aedocw/epub2tts:release mybook.epub --scan

#Example for reading parts 3 through 15 of "mybook.epub"
docker run -e COQUI_TOS_AGREED=1 -v ${PWD}/.local/share/tts:/root/.local/share/tts -v ${PWD}:/root -w /root ghcr.io/aedocw/epub2tts:release mybook.epub --start 3 --end 15
```
## Development setup

```shell
#clone the repo
git clone https://github.com/aedocw/epub2tts
cd epub2tts
#create a virtual environment
python -m venv .venv
#activate the virtual environment
source .venv/bin/activate
#install dependencies
sudo apt install espeak-ng ffmpeg
pip install coqui-tts --only-binary spacy
pip install -r requirements.txt
```
To update epub2tts later:

```shell
git pull
pip install . --upgrade
```
Author: Christopher Aedo

Contributors

Contributions, issues and feature requests are welcome!
Feel free to check the issues page or discussions page.

Give a ⭐️ if this project helped you!