This tool allows you to test the accuracy of various AI detectors. It is a command line tool designed to make it easy to test a large number of detectors at the same time using the same data.
The tool takes a set of text files and runs them through a number of AI detectors. It then outputs the results to a CSV file. The tool also generates a confusion matrix to show the accuracy of the detectors. But what is a confusion matrix? A confusion matrix is a table that is used to describe the performance of a classification model. It shows the number of correct and incorrect predictions made by the classification model compared to the actual outcomes. This table is extremely useful for comparing the performance of different detectors as it will show the true positives, false positives, true negatives and false negatives for each detector. This allows you to see which detectors are the most accurate.
pip install -r requirements.txt
python main.py
Sample workflow:
python main.py
Type Y/N to select Originality.ai API: y
Enter your Originality.ai API key: YOUR_API_KEY
Enter the directory path for AI text files: data/ai/
Enter the directory path for human text files: data/human/
Enter the input CSV file path: data/input.csv
Enter the output CSV file name: output.csv
Tool will process the data. This may take a while.
Would you like to generate a confusion matrix? (y/n): y
Press enter to exit...
python matrix.py
to generate the confusion matrixThe tool expects data to be in .txt files in a folder which is passed to the tool when it is run. Or if you are trying to process a csv file it expects the columns to be in the following order:
text,dataset,label
sample text,gpt-3,ai
The dataset column can simply be 'ai' or 'human' this column is used to name the rows on the output
To add a detector you need to do the following:
detectors.py
file in the following format: "post_parameters": {
# The endpoint URL for the API.
"endpoint": "YOUR_API_ENDPOINT_URL",
# The body of the POST request. This usually contains the text to be analyzed.
# The actual contents will depend on what the API expects.
# Add or remove parameters as needed depending on the API requirements.
"body": {"PARAMETER_NAME": "PARAMETER_VALUE"},
# The headers for the POST request. This usually includes the API key and content type.
# Add or remove headers as needed depending on the API requirements.
"headers": {"HEADER_NAME": "HEADER_VALUE"},
# Information about where the API key is included in the request.
"API_KEY_POINTER": {
# The location that the API key will end up (usually 'headers' or 'body').
"location": "headers_or_body",
# The actual API key. This is usually read from an environment variable or input by the user.
"value": "YOUR_API_KEY",
# The name of the key or field where the API key is included. e.g 'x-api-key' or 'api_key'.
"key_name": "API_KEY_HEADER_OR_PARAMETER_NAME",
},
# The key in the body of the POST request where the text to be analyzed is included. e.g 'text' or 'content'.
"text_key": "KEY_NAME_FOR_TEXT",
},
"response": {
# The expected response from the API. The actual structure will depend on what the API returns.
# This should include mappings for how to interpret the API's response.
# Add or remove mappings as needed.
# e.g if the API returns a JSON object with a key called 'result' and the value of 'result' is a list of objects
# with a key called 'score' then the mapping would be:
# "result": {
# "score": "score"
# }
"200": {
"result": {
"MAPPING_FOR_DESIRED_OUTPUT": "RESPONSE_KEY_PATH",
}
}
},
}
api_endpoints.py
fileapi_endpoints.py
fileWe welcome contributions to this project! Here are some ways you can help:
If you find a bug, please report it by opening a GitHub issue. Be sure to include:
This information will help us diagnose and fix the bug faster.
We're always looking for ways to improve the tool! If you have an idea for an enhancement, open a GitHub issue and describe:
If you want to directly contribute code:
Ensure your PR adheres to the following:
We want to hear about your experience using the tool - good and bad. Let us know what worked and what didn't. Share stories of how this tool has helped your research. The more we hear from you, the better we can make the tool for everyone!
Thanks for contributing!
MIT License
Copyright (c) [2023] [Originality.AI]
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.