Transcribe, summarize, and create smart clips from video and audio content.
Transcription: Transcribe audio using WhisperX
Smart Summarization: Generate concise summaries of video content, tailored to different purposes:
Meeting Minutes
Podcast Summaries
Lecture Notes
Interview Highlights
General Content Summaries
Intelligent Clip Creation: Automatically create clips of key moments and topics discussed in the video.
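Clip extraction of this kind is typically done with FFmpeg stream copy, which cuts a segment without re-encoding. The helper below is an illustrative sketch, not the project's actual API:

```python
import subprocess  # only needed if you actually run the command


def build_clip_command(src, start, end, dest):
    """Build an FFmpeg command that copies the [start, end] segment of src
    into dest without re-encoding. Illustrative sketch only."""
    return [
        "ffmpeg", "-y",
        "-ss", str(start),   # seek to clip start (seconds)
        "-to", str(end),     # stop at clip end (seconds)
        "-i", src,
        "-c", "copy",        # stream copy: fast, no quality loss
        dest,
    ]


# To actually cut the clip:
# subprocess.run(build_clip_command("talk.mp4", 30.0, 90.0, "clip.mp4"), check=True)
```

Stream copy (`-c copy`) snaps to the nearest keyframe, which is usually acceptable for highlight clips; re-encoding would give frame-accurate cuts at the cost of speed.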
Multi-format Support: Process various video and audio file formats.
Cloud Integration: Utilizes AWS S3 for efficient file handling and processing.
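For the S3 step, uploads generally need a collision-resistant object key. The key layout and function name below are assumptions for illustration, not the project's actual scheme:

```python
import uuid
from pathlib import Path


def make_s3_key(local_path, prefix="uploads"):
    """Derive a unique S3 object key for a media file.
    Key layout is illustrative, not the project's actual scheme."""
    p = Path(local_path)
    return f"{prefix}/{uuid.uuid4().hex}/{p.name}"


# Uploading with boto3 would then look roughly like:
# import boto3
# boto3.client("s3").upload_file(local_path, "my-bucket", make_s3_key(local_path))
```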
Python 3.8+
AWS CLI configured with appropriate permissions
FFmpeg installed on your system
Node.js and npm (for running the frontend GUI)
Clone the repository:
git clone https://github.com/sidedwards/ai-video-summarizer.git
cd ai-video-summarizer
Set up the backend:
Create and activate a virtual environment:
python -m venv .venv
source .venv/bin/activate  # On Windows, use `.venv\Scripts\activate`
Install the required dependencies:
pip install -r requirements.txt
Set up your configuration:
Copy config/config-example.yaml to config/config.yaml
Edit config/config.yaml with your API keys and preferences
Set up the frontend (optional, for GUI usage):
Navigate to the frontend directory:
cd frontend
Install the required dependencies:
npm install
Run the CLI script:
python backend/cli.py
Follow the prompts to select a video file and choose the type of summary you want to generate.
The generated summary files will be saved in a directory named after the input video file.
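Deriving that output directory from the input path can be sketched with pathlib; the exact layout (a directory named after the video's stem, next to the source file) is an assumption, not confirmed by the project:

```python
from pathlib import Path


def output_dir_for(video_path):
    """Return the directory summaries would be written to: the video's
    stem, alongside the source file. Layout is an assumption."""
    p = Path(video_path)
    return p.with_name(p.stem)


# output_dir_for("talks/standup-2024.mp4") -> talks/standup-2024
```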
Start the backend server:
Run the backend server:
python backend/server.py
Start the frontend development server:
In a new terminal window, navigate to the frontend directory:
cd frontend
Run the frontend development server:
npm run dev
Open your web browser and navigate to http://localhost:5173 to access the AI Video Summarizer GUI.
Use the web interface to upload a video file, select the desired summary type, and start the processing.
Once the processing is complete, you can download the generated summary files as a zip archive.
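Bundling the generated summaries into a zip archive is straightforward with the standard library; this sketch assumes a flat or nested directory of summary files and stores paths relative to that directory:

```python
import zipfile
from pathlib import Path


def zip_summaries(summary_dir, archive_path):
    """Bundle every file under summary_dir into a zip archive.
    Entries are stored relative to summary_dir."""
    summary_dir = Path(summary_dir)
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for f in sorted(summary_dir.rglob("*")):
            if f.is_file():
                zf.write(f, f.relative_to(summary_dir))
    return archive_path
```

Note the archive should be created outside `summary_dir`, otherwise the partially written zip would be swept up by `rglob` and added to itself.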
Edit config/config.yaml to set:
AWS CLI path and S3 bucket name
Replicate API key and model version
Anthropic API key and model choice
Other customizable parameters
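A configuration covering those parameters might look like the fragment below. The key names and model identifiers are illustrative only; the authoritative schema is config/config-example.yaml in the repository:

```yaml
# Illustrative sketch -- see config/config-example.yaml for the real key names
aws:
  cli_path: /usr/local/bin/aws
  s3_bucket: my-summarizer-bucket
replicate:
  api_key: r8_your_key_here
  model_version: your_model_version_here
anthropic:
  api_key: sk-ant-your_key_here
  model: claude-3-5-sonnet-20241022
```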
Web-based GUI
Basic CLI
More LLM options
Export options for various document formats (PDF, DOCX, etc.)
Contributions are welcome! Please feel free to submit a Pull Request.
MIT License
This project uses WhisperX, an advanced version of OpenAI's Whisper model, for transcription. WhisperX offers:
Accelerated transcription
Advanced speaker diarization
Improved accuracy in speaker segmentation
The WhisperX model is run via the Replicate API, based on https://github.com/sidedwards/whisperx.
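A Replicate run takes the model identifier plus an input payload. The field names below are assumptions for illustration; check the model's schema on Replicate before relying on them:

```python
def build_whisperx_input(audio_url, diarize=True, language=None):
    """Assemble an input payload for a WhisperX run on Replicate.
    Field names are assumptions -- check the model's schema on Replicate."""
    payload = {"audio": audio_url, "diarize": diarize}
    if language:
        payload["language"] = language  # omit to let the model auto-detect
    return payload


# The actual call would look roughly like:
# import replicate
# output = replicate.run("sidedwards/whisperx:<version>",
#                        input=build_whisperx_input("https://example.com/a.mp3"))
```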