Talk with an AI in real time, completely locally on your PC, with a customizable AI personality and voice.
Hint: If you are interested in state-of-the-art voice solutions, please also have a look at Linguflex. It lets you control your environment by speaking and is one of the most capable and sophisticated open-source assistants currently available.
Note: If you run into the error 'General synthesis error: isin() received an invalid combination of arguments', this is caused by a newer transformers library introducing an incompatibility with Coqui TTS (see here). Please downgrade to an older transformers version:
pip install transformers==4.38.2
or upgrade RealtimeTTS to the latest version:
pip install realtimetts==0.4.1
Integrates the powerful Zephyr 7B language model with real-time speech-to-text and text-to-speech libraries to create a fast and engaging voice-based local chatbot.
Hint: If you run into problems installing llama.cpp, please also have a look at my LocalEmotionalAIVoiceChat project. It includes emotion-aware real-time text-to-speech output, offers multiple LLM provider options, and can be used with different AI models.
This software is in an experimental alpha state and does not provide production-ready stability. The current XTTS model used for synthesis still has glitches, and Zephyr, while really good for a 7B model, of course cannot compete with the answer quality of GPT-4, Claude, or Perplexity.
Please take this as a first attempt to provide an early version of a local real-time chatbot.
You will need a GPU with around 8 GB VRAM to run this in real-time.
NVIDIA CUDA Toolkit 11.8:
NVIDIA cuDNN 8.7.0 for CUDA 11.x:
Install ROCm v5.7.1 (for AMD GPUs)
FFmpeg:
Install FFmpeg according to your operating system:
Ubuntu/Debian:
sudo apt update && sudo apt install ffmpeg
Arch Linux:
sudo pacman -S ffmpeg
macOS (Homebrew):
brew install ffmpeg
Windows (Chocolatey):
choco install ffmpeg
Windows (Scoop):
scoop install ffmpeg
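To confirm FFmpeg is available on your PATH after installation, you can run:
ffmpeg -version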
Clone the repository or download the source code package.
Install llama.cpp
(for AMD users) Before the next step, set the environment variable LLAMA_HIPBLAS to "on", as shown below.
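For example, in a bash shell (Linux/macOS):
export LLAMA_HIPBLAS=on
or in a Windows command prompt:
set LLAMA_HIPBLAS=on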
Official way:
pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir --verbose
Install realtime libraries
pip install RealtimeSTT==0.1.7
pip install RealtimeTTS==0.2.7
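As an optional sanity check, you can verify that both libraries import without errors:
python -c "import RealtimeSTT, RealtimeTTS"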
Download zephyr-7b-beta.Q5_K_M.gguf from here.
Enter the path to the downloaded model file into model_path in the configuration.
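For illustration only, here is a minimal sketch of how a GGUF model path is typically loaded with llama-cpp-python. This is not the project's actual loading code; the parameter values are assumptions you may need to adjust for your hardware.
from llama_cpp import Llama

# Load the downloaded GGUF model; n_gpu_layers=-1 offloads all layers to the GPU
llm = Llama(model_path="zephyr-7b-beta.Q5_K_M.gguf", n_gpu_layers=-1, n_ctx=2048)

# Simple completion call to verify the model responds
result = llm("Hello, how are you?", max_tokens=32)
print(result["choices"][0]["text"])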
If dependency conflicts occur, install specific versions of the conflicting libraries:
pip install networkx==2.8.8
pip install typing_extensions==4.8.0
pip install fsspec==2023.6.0
pip install imageio==2.31.6
pip install numpy==1.24.3
pip install requests==2.31.0
python ai_voicetalk_local.py
Open chat_params.json to change the talk scenario.
If the first sentence is transcribed before you get to the second one, raise post_speech_silence_duration on AudioToTextRecorder:
AudioToTextRecorder(model="tiny.en", language="en", spinner=False, post_speech_silence_duration=1.5)
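For context, here is a minimal standalone sketch of how such a recorder can be used with RealtimeSTT (assuming the library's AudioToTextRecorder API); the longer silence duration gives you more time to finish a sentence before transcription starts.
from RealtimeSTT import AudioToTextRecorder

# Wait 1.5 s of silence before treating speech as finished
recorder = AudioToTextRecorder(model="tiny.en", language="en", spinner=False, post_speech_silence_duration=1.5)

# Blocks until a phrase has been spoken, then returns the transcription
print(recorder.text())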
Contributions to enhance or improve the project are warmly welcomed. Feel free to open a pull request with your proposed changes or fixes.
The project is under Coqui Public Model License 1.0.0.
This license allows only non-commercial use of a machine learning model and its outputs.
Kolja Beigel
Feel free to reach out for any queries or support related to this project.