arXivRAG is a tool for retrieving academic papers from the arXiv database and generating content about them. It uses Retrieval-Augmented Generation (RAG) to help researchers, students, and enthusiasts discover arXiv papers and produce summaries, insights, and analyses of them efficiently.
Retrieval-Augmented Generation: Combines a retrieval system with a generative model, so that responses are grounded in the retrieved papers and are more accurate and relevant.
arXiv Integration: Directly queries the arXiv repository to fetch and summarize academic papers.
User-Friendly Interface: Provides an easy-to-use interface for querying and obtaining summaries of scientific papers.
Customizable: Allows users to customize the retrieval and generation parameters to suit their specific needs.
Enhanced Search: Advanced search capabilities to quickly find relevant papers.
Summarization: Automatic generation of concise summaries for arXiv papers.
Custom Queries: Tailored query support to retrieve specific information from academic papers.
Real-Time Access: Seamless integration with the arXiv API for real-time data access.
Citation and Trend Analysis: Analyze citation networks, visualize the impact of papers, and identify emerging research trends based on recent publications and citation patterns.
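As an illustration of the real-time access feature, the following minimal sketch queries the public arXiv API and parses the Atom feed it returns. It is not taken from the arXivRAG code base: the function names `build_query_url`, `parse_feed`, and `search_arxiv` are hypothetical, and how arXivRAG itself talks to the API may differ.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Public arXiv API endpoint; responses are Atom XML feeds.
ARXIV_API = "http://export.arxiv.org/api/query"
ATOM = {"atom": "http://www.w3.org/2005/Atom"}


def build_query_url(terms: str, max_results: int = 5) -> str:
    """Build a search URL that matches `terms` against all fields."""
    params = {
        "search_query": f"all:{terms}",
        "start": 0,
        "max_results": max_results,
    }
    return f"{ARXIV_API}?{urllib.parse.urlencode(params)}"


def _clean(text: str) -> str:
    """Collapse the line breaks and repeated spaces found in feed text."""
    return " ".join(text.split())


def parse_feed(xml_text: str) -> list[dict]:
    """Extract id, title, and abstract from each entry of an Atom feed."""
    root = ET.fromstring(xml_text)
    papers = []
    for entry in root.findall("atom:entry", ATOM):
        papers.append({
            "id": _clean(entry.findtext("atom:id", default="", namespaces=ATOM)),
            "title": _clean(entry.findtext("atom:title", default="", namespaces=ATOM)),
            "summary": _clean(entry.findtext("atom:summary", default="", namespaces=ATOM)),
        })
    return papers


def search_arxiv(terms: str, max_results: int = 5) -> list[dict]:
    """Fetch and parse search results (performs a network request)."""
    url = build_query_url(terms, max_results)
    with urllib.request.urlopen(url, timeout=30) as response:
        return parse_feed(response.read().decode("utf-8"))
```

Calling `search_arxiv("retrieval augmented generation", max_results=3)` would return a list of dictionaries, one per paper, whose `summary` field holds the abstract that a summarizer could take as input.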
To get started with arXivRAG, follow these steps:
Clone the repository:
git clone https://github.com/phitrann/arXivRAG.git
cd arXivRAG
Create a virtual environment (we recommend using conda):
conda create -n arxiv-rag python=3.10
conda activate arxiv-rag
Install the required dependencies:
pip install -r requirements.txt
To use arXivRAG, follow these steps:
Run the main script:
python main.py
Query the system:
Enter your query related to a scientific paper.
The system will retrieve relevant papers from arXiv and generate a summary.
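The retrieve-then-generate flow described above can be sketched as follows. This is a toy illustration under stated assumptions, not arXivRAG's implementation: retrieval is done here with a simple bag-of-words cosine similarity (a real system would more likely use a learned embedding model), and instead of calling a generative model the sketch stops at building the prompt that such a model would receive. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, papers: list[dict], k: int = 2) -> list[dict]:
    """Return the k papers whose abstracts are most similar to the query."""
    query_vec = Counter(tokenize(query))
    ranked = sorted(
        papers,
        key=lambda p: cosine(query_vec, Counter(tokenize(p["summary"]))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query: str, retrieved: list[dict]) -> str:
    """Assemble the retrieved abstracts into a prompt for a generator."""
    context = "\n".join(
        f"[{i + 1}] {p['title']}: {p['summary']}"
        for i, p in enumerate(retrieved)
    )
    return (
        "Summarize the papers below as they relate to the question.\n"
        f"{context}\n"
        f"Question: {query}"
    )
```

The key design point is that the generator only sees the retrieved abstracts, so the quality of the summary depends directly on how well the retrieval step ranks the papers.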
You can customize the behavior of arXivRAG by modifying the configuration file config.yaml. Key parameters include:
retrieval_model: The model used for retrieving relevant papers.
generation_model: The model used for generating summaries.
num_retrievals: The number of papers to retrieve for each query.
max_summary_length: The maximum length of the generated summary.
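To make the shape of these parameters concrete, here is a minimal sketch of how the four keys could be validated once config.yaml has been parsed into a dictionary (for example with PyYAML's `yaml.safe_load`; whether arXivRAG uses PyYAML is an assumption). The `RagConfig` class, the `config_from_mapping` helper, the default values, and the model names in the usage note are all hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class RagConfig:
    """Typed view of the four parameters listed above."""

    retrieval_model: str
    generation_model: str
    num_retrievals: int = 5         # assumed default, not from the project
    max_summary_length: int = 200   # assumed default, not from the project

    def __post_init__(self) -> None:
        # Reject values that could never produce a useful result.
        if self.num_retrievals < 1:
            raise ValueError("num_retrievals must be at least 1")
        if self.max_summary_length < 1:
            raise ValueError("max_summary_length must be at least 1")


def config_from_mapping(raw: dict) -> RagConfig:
    """Build a RagConfig from a parsed config mapping, rejecting typos."""
    known = {f.name for f in fields(RagConfig)}
    unknown = set(raw) - known
    if unknown:
        raise KeyError(f"unknown config keys: {sorted(unknown)}")
    return RagConfig(**raw)
```

For instance, `config_from_mapping({"retrieval_model": "example-retriever", "generation_model": "example-generator", "num_retrievals": 3})` yields a config that retrieves three papers per query and falls back to the assumed default for `max_summary_length`. Rejecting unknown keys catches misspelled parameter names early instead of silently ignoring them.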
We welcome contributions from the community! If you have ideas for new features or improvements, feel free to open an issue or submit a pull request.
To submit a pull request, follow these steps:
Fork the repository.
Create a new branch:
git checkout -b feature/your-feature-name
Make your changes, then stage and commit them:
git add -A
git commit -m "Add your commit message"
Push to the branch:
git push origin feature/your-feature-name
Create a pull request.
This project is released under the Apache 2.0 license. See the LICENSE file for details.
Thanks to the contributors of the arXivRAG project.
Special thanks to the developers of the retrieval and generation models used in this project.