ChatGPT Voice Assistant
- The ChatGPT Voice Assistant uses a Raspberry Pi (or desktop) to enable spoken conversation with OpenAI large language models. This implementation listens to speech, processes the conversation through the OpenAI service, and responds back. Like Apple Siri, Amazon Alex, Google Nest Home, Mi XiaoAi etc.
- This project is written in python which supports Linux/Raspbian, macOS, and Windows.
Features
- Supports real-time voice dialogue. After ChatGPT returns a sentence, you can hear the voice instead of waiting for all ChatGPT replies before starting the voice synthesis.
- Support continuous dialogue, save the history of all ChatGPT current conversations. When the ChatGPT conversation is larger than 4096 tokens (gpt-3.5-turbo), the early conversation history will be discarded.
- Support local wake word, use it just like Siri.
Voice Assistant Speaker
- Hardware
- $ for Raspberry PI 3/3B/4/4B
- $ for USB Micro Phone
- $ for Aux Speaker
- $ for an SD card (>= 8GB ) (to setup the Raspberry Pi OS)
- Software
- Azure Cognitive Speech Services
-
Free tier: 5 audio hours per month and 1 concurrent request.
-
Free $200 credit: With a new Azure account that can be used during the first 30 days.
- OpenAI
-
$0.002 / 1K tokens / ~750 words: ChatGPT (gpt-3.5-turbo)
-
Free $18 credit: With a new OpenAI account that can be used during your first 90 days.
Setup
- You will need an instance of Azure Cognitive Services and an OpenAI account. You can run the software on nearly any platform, but let's start with a Raspberry Pi.
Raspberry Pi
- If you are new to Raspberry Pis, check out this getting started guide.
1. OS
- Insert an SD card into your PC.
- Go to https://www.raspberrypi.com/software/ then download and run the Raspberry Pi Imager.
- Click
Choose OS
and select the Raspberry Pi OS (64-bit) or Ubuntu 22.04.2 LTS (64-bit) .
- Click
Choose Storage
, select the SD card.
- Click
Write
and wait for the imaging to complete.
- Put the SD card into your Raspberry Pi and connect a keyboard, mouse, and monitor.
- Complete the initial setup, making sure to configure Wi-Fi.
2. USB Speaker/Microphone
- Plug in the USB speaker/microphone if you have not already.
- On the Raspberry PI OS desktop, right-click on the volume icon in the top-right of the screen and make sure the USB device is selected.
- Right-click on the microphone icon in the top-right of the screen and make sure the USB device is selected.
Azure
The conversational speaker uses Azure Cognitive Service for speech-to-text and text-to-speech. Below are the steps to create an Azure account and an instance of Azure Cognitive Services.
1. Azure Account
- In a web browser, navigate to https://aka.ms/friendbot/azure and click on
Try Azure for Free
.
- Click on
Start Free
to start creating a free Azure account.
- Sign in with your Microsoft or GitHub account.
- After signing in, you will be prompted to enter some information.
NOTE: Even though this is a free account, Azure still requires credit card information. You will not be charged unless you change settings later.
- After your account setup is complete, navigate to https://aka.ms/friendbot/azureportal.
2. Azure Cognitive Services
- Sign into your account at https://aka.ms/friendbot/azureportal.
- In the search bar at the top, enter
Cognitive Services
. Under Marketplace
select Cognitive Services
. (It may take a few seconds to populate.)
- Verify the correct subscription is selected. Under
Resource Group
select Create New
. Enter a resource group name (e.g. conv-speak-rg
).
- Select a region and a name for your instance of Azure Cognitive Services (e.g.
my-conv-speak-cog-001
).
NOTE: EastUS, WestEurope, or SoutheastAsia are recommended, as those regions tend to support the greatest number of features.
- Click on
Review + Create
. After validation passes, click Create
.
- When deployment has completed you can click
Go to resource
to view your Azure Cognitive Services resource.
- On the left side navigation bar, under
Resourse Management
, select Keys and Endpoint
.
- Copy either of the two Cognitive Services keys. Save this key in a secure location for later.
Windows 11 users: If the application is stalling when calling the text-to-speech API, make sure you have applied all current security updates (link).
OpenAI
The conversational speaker uses OpenAI's models to hold a friendly conversation. Below are the steps to create a new account and access the AI models. Supports OpenAI official API or Azure OpenAI API, just choose one.
1. OpenAI Account
- In a web browser, navigate to https://aka.ms/maker/openai. Click
Sign up
.
NOTE: can use a Google account, Microsoft account, or email to create a new account.
- Complete the sign-up process (e.g., create a password, verify your email, etc.).
NOTE: If you are new to OpenAI, please review the usage guidelines (https://beta.openai.com/docs/usage-guidelines).
- In the top-right corner click on your account. Click on
View API keys
.
- Click
+ Create new secret key
. Copy the generated key and save it in a secure location for later.
If you are curious to play with the large language models directly, check out the https://platform.openai.com/playground?mode=chat at the top of the page after logging in to https://aka.ms/maker/openai.
2. Azure OpenAI Account
Choose between OpenAI official account or Azure OpenAI account
- Create an Azure Account
- If you don't have an Azure account, go to the Azure official website to sign up for an account. Azure offers a free account option, and new users can get a certain amount of free credits for testing and learning.
- Apply for Access
- On the Azure OpenAI service page, click the "Apply for Access" button. This will take you to the application page where you need to fill in some necessary information, including your company name, use case, etc.
- Configure and Use
- Once you have access, you can create a new OpenAI service resource in the Azure portal. After creation, you can get the API key and start using the Azure OpenAI service following the official documentation.
The Code
1. Code Configuration
- The Python Speech SDK package is available for Windows (x64 and x86), Mac x64 (macOS X version 10.14 or later), Mac arm64 (macOS version 11.0 or later), and Linux
- On the Raspberry Pi or your PC, open a command-line terminal.
- On Ubuntu or Debian, run the following commands for the installation of required packages:
sudo apt-get update
sudo apt-get install libssl-dev libasound2
- On Ubuntu 22.04 LTS it is also required to download and install the latest libssl1.1 package e.g. from http://security.ubuntu.com/ubuntu/pool/main/o/openssl/.
- Clone the repo.
git clone https://github.com/jackwuwei/gptspeaker.git
- Set your API keys: Replace config.json
{AzureCognitiveServices.Key}
and {AzureCognitiveServices.Region}
with your OpenAI API key and {OpenAI.Key}
with your OpenAI API key.
{
"AzureCognitiveServices": {
"Key": "AzureCognitiveServicesKey",
"Region": "AzureCognitiveServicesRegion",
},
"OpenAI": {
"Key": "OpenAIKey",
},
// Just choose one of the two OpenAI above
"AzureOpenAI":
{
"Key": "", // Key 1 or Key 2
"api_version": "2024-02-01",
"Endpoint": "", // Endpoint
"Model": "" // Azure AI Studio deployment name
}
}
- Install requirements
pip3 -r install requirements.txt
- Run the code
2. (Optional) Create a custom wake phrase
The code base has a default wake phrase ("Hey GPT"
) already, which I suggest you use first. If you want to create your own (free!) custom wake word, then follow the steps below.
- Create a custom keyword model using the directions here: https://aka.ms/hackster/microsoft/wakeword.
- Download the model, extract the
.table
file and copy it to source root directory.
- Update
config.json
file to include your wake phrase file in the build.
"AzureCognitiveServices": {
"WakePhraseModel": "xxx.table",
"WakeWord": "xxx",
}
- Rebuild and run the project to use your custom wake word.