GenAI LaTeX Proofreader is an automated tool that uses generative AI to proofread and suggest improvements to scientific papers written in LaTeX. The suggestions are inserted into the original LaTeX source, creating a proofreading report. This tool is primarily intended for authors working on a scientific paper.
In more detail, the generated proofreading report contains the original paper under review, with a list of suggestions attached to the beginning of each section. For each section, feedback is created from the perspective of different proofreading personas. For example, when writing a paper, these could be "Domain expert", "English language expert" and "Book editor". Depending on the topic of the paper, further personas such as "Statistical reviewer", "LaTeX specialist" or "Inclusive language expert" could also be included.
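As a rough illustration of the report structure described above, the sketch below splits a LaTeX source at each \section{..} command and places a list of per-persona comments at the beginning of every section. The names `split_sections`, `attach_feedback` and the `feedback_for` callback are made up for this example, and the itemize layout is an assumption; the tool's actual implementation may work differently.

```python
import re
from typing import Callable, Dict, List, Tuple

# Matches \section{Title}. Starred commands (\section*{..}) and
# \subsection{..} are not matched by this pattern.
SECTION_RE = re.compile(r"\\section\{([^}]*)\}")


def split_sections(latex: str) -> Tuple[str, List[Tuple[str, str]]]:
    """Return (text before the first section, [(title, body), ...])."""
    matches = list(SECTION_RE.finditer(latex))
    if not matches:
        return latex, []
    preamble = latex[: matches[0].start()]
    sections = []
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(latex)
        sections.append((m.group(1), latex[m.end():end]))
    return preamble, sections


def attach_feedback(
    latex: str, feedback_for: Callable[[str, str], Dict[str, str]]
) -> str:
    """Insert per-persona comments at the beginning of each section.

    feedback_for(title, body) is a hypothetical callback returning
    a mapping {persona name: comment}; in practice it would call an LLM.
    """
    preamble, sections = split_sections(latex)
    parts = [preamble]
    for title, body in sections:
        parts.append(f"\\section{{{title}}}\n")
        feedback = feedback_for(title, body)
        if feedback:  # an empty itemize environment would not compile
            parts.append("\\begin{itemize}\n")
            for persona, comment in feedback.items():
                parts.append(f"  \\item \\textbf{{{persona}}}: {comment}\n")
            parts.append("\\end{itemize}\n")
        parts.append(body.lstrip("\n"))
    return "".join(parts)
```

Note that this sketch does not escape LaTeX special characters in the comments, which a real report generator would need to handle.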
Here "proofreading" should be interpreted broadly. While current large language models (LLMs) have some understanding of logic, mathematics and physics, they should not be trusted for serious proofreading of scientific results. Thus, any suggestions should be evaluated critically. On the other hand, for authors familiar with a topic, the generated report can be used to gauge how deeply an LLM can reason about the paper under review.
GenAI LaTeX Proofreader requires a subscription to the Anthropic API.
For development and testing, GenAI LaTeX Proofreader is regularly evaluated by proofreading two test papers:
In more detail, these papers are proofread for all manually triggered CI runs in this repo.
Thus you can inspect the generated proofreading reports (report.pdf) from recent CI pipeline runs on GitHub:
Completely automated proofreading of LaTeX documents.
In addition to the above, one can add other proofreading personas. However, this currently requires editing the Python source code.
The idea of using different AI personas for proofreading is inspired by Ethan Mollick's book Co-Intelligence: Living and working with AI published 4/2024.
\section{..} will not be proofread. \section*{..}.
Note that this work is an early proof of concept, so some familiarity with the development tools (git, Python, Docker, Anthropic API access) may be needed to get this working.
The steps below (for Mac/Linux-based systems) describe how to proofread a paper:
Step 1: Clone the repo
git clone git@github.com:genai-latex-proofreader/genai-latex-proofreader.git
cd genai-latex-proofreader
Step 2: Build the Docker container (with Python and LaTeX)
(cd .devcontainer/latex; make build)
Step 3: Set up a secret token for the Anthropic API, see https://docs.anthropic.com/en/docs/quickstart
export ANTHROPIC_API_KEY='your-secret-api-key-here'
(Note: do not share your ANTHROPIC_API_KEY.)
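Before starting a longer run, it can help to confirm that the key is actually set in the environment. The following is a small sketch using only the Python standard library; the helpers `require_api_key` and `mask` are made up for this example and are not part of the tool.

```python
import os


def require_api_key(env=os.environ, name="ANTHROPIC_API_KEY"):
    """Return the API key from the environment, or raise a helpful error."""
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; run: export {name}='...'")
    return key


def mask(key):
    """Hide all but the last four characters, e.g. for log output."""
    hidden = max(len(key) - 4, 0)
    return "*" * hidden + key[hidden:]
```

For example, `print(mask(require_api_key()))` confirms that a key is present without printing it in full.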
Step 4: Copy the files required to build your paper into the 'paper-to-proofread' subdirectory in the repo.
mkdir paper-to-proofread
cp -R /path/to/your/paper/. paper-to-proofread
For testing, you can instead use the dummy paper tests/integration/assets/empty_paper.tex provided in the repo:
mkdir paper-to-proofread
cp -R tests/integration/assets/. paper-to-proofread/
(Note: Please always have a backup of your paper.)
Step 5: Run genai-latex-proofreader
(cd .devcontainer/latex; docker compose run --rm --entrypoint "python3" genai-latex-proofreader-service -m genai_latex_proofreader.cli --input_latex_path paper-to-proofread/empty_paper.tex --output_report_filepath output/report.tex)
For a medium-sized paper, this will take a few minutes.
If everything worked, the proofreading report can be found in output/report.pdf.
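If you proofread papers repeatedly, the command from Step 5 can be wrapped in a small script. The sketch below only rebuilds the same docker compose invocation shown above, using the same service name and command-line flags; the function names `build_command` and `run_proofreader` are made up for this example.

```python
import subprocess
from pathlib import Path
from typing import List


def build_command(input_tex: str, output_tex: str = "output/report.tex") -> List[str]:
    """Build the argument list for the docker compose call from Step 5."""
    return [
        "docker", "compose", "run", "--rm",
        "--entrypoint", "python3",
        "genai-latex-proofreader-service",
        "-m", "genai_latex_proofreader.cli",
        "--input_latex_path", input_tex,
        "--output_report_filepath", output_tex,
    ]


def run_proofreader(repo_root: str, input_tex: str) -> int:
    """Run the proofreader from .devcontainer/latex and return the exit code."""
    compose_dir = Path(repo_root) / ".devcontainer" / "latex"
    return subprocess.run(build_command(input_tex), cwd=compose_dir).returncode
```

Calling `run_proofreader(".", "paper-to-proofread/empty_paper.tex")` from the repo root would then correspond to the command in Step 5; a non-zero return value indicates that the run failed.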
Depending on the topic of your paper, you may want to adjust the prompts that define the proofreading personas. Currently the prompts need to be edited directly in the Python source code.
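To give a sense of what such an adjustment might look like, here is a hypothetical sketch in which each persona is a name mapped to a short instruction, and a helper builds the prompt for one section. The dictionary `PERSONAS`, the function `build_prompt` and the prompt wording are all assumptions made for illustration; the prompts in the actual source code may be organized differently.

```python
from typing import Dict

# Hypothetical persona definitions: name -> instruction given to the LLM.
PERSONAS: Dict[str, str] = {
    "Domain expert": "Check the scientific content for errors, gaps and unclear claims.",
    "English language expert": "Check grammar, spelling and clarity of the English.",
    "Book editor": "Check structure, flow and readability for the intended audience.",
}


def build_prompt(persona: str, section_latex: str,
                 personas: Dict[str, str] = PERSONAS) -> str:
    """Combine a persona's instruction with the LaTeX source of one section."""
    instruction = personas[persona]  # raises KeyError for unknown personas
    return (
        f"You are proofreading a scientific paper in the role of: {persona}.\n"
        f"{instruction}\n"
        "Reply with a short list of concrete suggestions.\n\n"
        "Section to review (LaTeX source):\n"
        f"{section_latex}"
    )


# Adding a persona for a statistics-heavy paper without changing the defaults.
custom_personas = {
    **PERSONAS,
    "Statistical reviewer": "Check the statistical methods and how results are reported.",
}
```

With a structure like this, adapting the tool to a new topic would amount to editing or extending the dictionary entries.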
GenAI LaTeX Proofreader uses GenAI (Generative AI) and large language models (LLMs) to automate proofreading of scientific papers. As of 2024, GenAI is a rapidly evolving technology.
The list below contains some references and related work on this topic, and more broadly on using AI to make scientific discoveries:
12/2023, Microsoft Research, The Impact of Large Language Models on Scientific Discovery: a Preliminary Study using GPT-4
10/2023, W. Liang et al., Can large language models provide useful feedback on research papers? A large-scale empirical analysis
6/2023, AI to Assist Mathematical Reasoning: A Workshop organized by National Academies of Sciences.
Contributions, feedback or ideas are welcome!
Feel free to contact me or raise an issue in this repo.
(This question is outside my area of expertise.)
The guidelines and practices around using AI-generated content are still evolving. However, for publishing work in an academic setting, please first consult your advisor, department, journal and/or university.
Please also note that:
"GenAI LaTeX Proofreader" is copyright 2024 Matias Dahl (and contributors), and distributed under the terms of the MIT open source license.
Portions of this work have been developed using AI-powered tools.
For details, please see the LICENSE file.