GPT-4V Screenshot Analyzer
Description
The GPT-4V Screenshot Analyzer is a tool that integrates the capabilities of OpenAI's GPT-4 Vision API into an interactive way to analyze and understand your screenshots. Screenshots are analyzed by GPT-4V to provide detailed descriptions. Additionally, this tool supports interactive dialogue, enabling users to ask follow-up questions about the screenshots for more in-depth information.
Features
-
Image Analysis: Utilize GPT-4 Vision API to analyze and describe screenshots.
-
Interactive Dialogue: Engage in a chat with the AI about the screenshot for detailed insights and follow-up questions.
-
User-Friendly Interface: Simple GUI for viewing screenshots and interacting with the AI.
Installation (Tested on Ubuntu 20.04)
-
Clone the Repository
git clone https://github.com/jeremy-collins/gpt4v-screenshot-analyzer.git
-
Install Dependencies
- Ensure Python 3 is installed.
- Install required Python libraries:
pip install -r requirements.txt
-
Set Up OpenAI API Key
- Obtain an API key from OpenAI.
- Set your OpenAI API key as an environment variable:
echo 'export OPENAI_API_KEY=<put your key here>' >> ~/.bashrc
- Alternatively, you can set the api_key variable inside gpt4v_screenshot_analyzer.py to your OpenAI key, but this is a security risk.
-
Systemd Service Setup (Optional)
- First, make the gpt4_screenshot_analyzer.py file executable:
sudo chmod +x gpt4_screenshot_analyzer.py
- Then, customize the gpt4-screenshot.service file to your needs.
- You will need to change the path to the gpt4_screenshot_analyzer.py file inside the ExecStart line.
- You may also need to change the display number in the Environment line.
- Lastly, you may want to change the User line.
- To run the application as a service to be started on boot, follow these steps:
sudo cp gpt4-screenshot.service /etc/systemd/system/
sudo systemctl enable gpt4-screenshot
sudo systemctl start gpt4-screenshot
- If this doesn't work, you can debug the service by running:
sudo systemctl status gpt4-screenshot
- These commands may also be useful:
sudo systemctl daemon-reload
sudo systemctl stop gpt4-screenshot
sudo systemctl restart gpt4-screenshot
sudo systemctl disable gpt4-screenshot
-
Enabling Display Access on Startup (optional)
- To enable display access on startup, open Startup Applications (Ubuntu) and add a Startup Program with the following command:
path/to/repo/gpt4v-screenshot-analyzer/enable_xhost.sh
Usage
- Start the application (you can skip this step if you followed steps 4 and 5):
python3 gpt4_screenshot_analyzer.py
- Use the
Ctrl+Alt+S
hotkey to start a screenshot capture.
- Drag to select the area you want to capture.
- GPT-4V will analyze the screenshot and display the results in a GUI window.
- Use the text box in the GUI to ask follow-up questions.
Contributing
Contributions are welcome! If you'd like to contribute, please fork the repository and use a feature branch. Pull requests are welcome.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Credits
Developed by Jeremy A. Collins. Special thanks to OpenAI for providing the GPT-4 Vision API.