This is a FastAPI-based server that acts as an interface between your application and cloud-based AI services. It focuses on three main tasks:
Transcription (Speech-to-Text)
Text-to-Speech
Speech-to-Speech
Currently, it uses OpenAI's API for these services, but it is designed so that other providers can be added in the future.
.
├── cloud_providers/
│ ├── base.py
│ └── openai_api_handler.py
├── server/
│ ├── main.py
│ ├── routers/
│ │ ├── transcribe.py
│ │ ├── tts.py
│ │ └── speech_to_speech.py
│ └── utils/
│ └── logger.py
├── requirements.txt
└── README.md
Clone the repository
Create a virtual environment:
python -m venv venv
source venv/bin/activate
Install dependencies:
pip install -r requirements.txt
Set up environment variables:
export OPENAI_API_KEY=your_openai_api_key
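The server expects the key to be available in the environment at startup. As a minimal sketch, a handler might read it and fail fast when it is missing; the helper name load_api_key is illustrative and not part of this project:

```python
import os

def load_api_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Read an API key from the environment, failing fast if unset.

    Illustrative helper; the project may read the variable differently.
    """
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError(
            f"{var_name} is not set; export it before starting the server"
        )
    return key
```

Failing fast here gives a clear startup error instead of an opaque authentication failure on the first API call.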
To start the server, navigate to the project directory and run:
python server/main.py
This will start the FastAPI server, typically on http://localhost:8000.
API docs
Once the server is running, FastAPI's auto-generated interactive API documentation is available at http://localhost:8000/docs (Swagger UI) and http://localhost:8000/redoc (ReDoc).
The application uses rotating file handlers for logging, with separate log files for different components:
logs/main.log: Main application logs
logs/transcription.log: Transcription-specific logs
logs/tts.log: Text-to-speech logs
logs/speech_to_speech.log: Speech-to-speech logs
The application includes error handling for various scenarios, including API errors and WebSocket disconnections. Errors are logged, and appropriate HTTP exceptions are raised.
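A per-component rotating logger like the ones described can be built with the standard library's RotatingFileHandler. This is a hedged sketch, not the project's actual logger.py; the size and backup-count values are illustrative:

```python
import logging
from logging.handlers import RotatingFileHandler
from pathlib import Path

def make_rotating_logger(name: str, log_dir: str = "logs",
                         max_bytes: int = 1_000_000,
                         backups: int = 3) -> logging.Logger:
    """Create a logger that writes to <log_dir>/<name>.log with rotation.

    Sketch of the per-component loggers described above; rotation
    parameters are assumptions, not taken from the project.
    """
    Path(log_dir).mkdir(exist_ok=True)
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = RotatingFileHandler(f"{log_dir}/{name}.log",
                                  maxBytes=max_bytes, backupCount=backups)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger
```

Rotation caps disk usage: once a log file reaches max_bytes, it is renamed and a fresh file is started, keeping at most backups old copies.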
The project is designed with extensibility in mind. The CloudProviderBase abstract base class in base.py allows for easy integration of additional cloud providers beyond OpenAI.
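The abstract-base-class pattern might look roughly like this. The method names transcribe and synthesize are assumptions made for illustration and may differ from the real interface in base.py:

```python
from abc import ABC, abstractmethod

class CloudProviderBase(ABC):
    """Interface that every cloud provider handler implements.

    Sketch based on the description above; the actual method names
    and signatures in the project's base.py may differ.
    """

    @abstractmethod
    def transcribe(self, audio: bytes) -> str:
        """Convert speech audio into text."""

    @abstractmethod
    def synthesize(self, text: str) -> bytes:
        """Convert text into speech audio."""

class EchoProvider(CloudProviderBase):
    """Toy provider used here only to show how a subclass plugs in."""

    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8", errors="replace")

    def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")
```

Because the routers depend only on the abstract interface, adding a new provider means writing one subclass rather than touching the endpoint code.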
Contributions are welcome! Please feel free to submit a Pull Request.
[Specify your license here]