GenAI System Evaluation
Version 1.0.0
This repository contains sample notebooks to demonstrate how to evaluate an LLM-augmented system. It provides tools and methods for local evaluation.
These notebooks were tested with Python 3.12; if you're running locally, ensure you're using Python 3.12. Also ensure that you have the AWS CLI set up with the credentials you want to use configured as the default profile. These credentials need access to Amazon Bedrock models.
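Before starting, you can sanity-check these prerequisites with a small script. This is a sketch; the function name and the specific checks are illustrative, not part of the repository:

```python
import shutil
import sys

def check_prerequisites(required=(3, 12)):
    """Check local prerequisites for running these notebooks."""
    return {
        # The notebooks were tested with Python 3.12
        "python_ok": sys.version_info[:2] >= required,
        # The AWS CLI must be installed and configured (default profile
        # needs access to Amazon Bedrock models)
        "aws_cli_installed": shutil.which("aws") is not None,
    }

print(check_prerequisites())
```

This only verifies that the tools are present; it does not confirm that the default profile actually has Bedrock access.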
LLM-System-Validation/
├── data/               # RAG context and validation datasets
├── example-notebooks/  # Notebooks for evaluating various components
├── script/             # Various scripts for setting up the environment
└── .github/            # Example GitHub Actions workflows
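The `.github/` directory contains example GitHub Actions. A minimal workflow along these lines might look like the following; the file layout and step names here are illustrative, not the repository's actual workflow:

```yaml
name: evaluate
on: [push]
jobs:
  evaluate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install -r requirements.txt
      # Credentials with Amazon Bedrock access would be supplied via repo secrets
      - run: jupyter nbconvert --to notebook --execute example-notebooks/*.ipynb
```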
data/
: Contains the datasets used for Retrieval-Augmented Generation (RAG) context and validation.

example-notebooks/
: Jupyter notebooks demonstrating the evaluation of:
Clone the repository:
git clone [email protected]:aws-samples/genai-system-evaluation.git
cd genai-system-evaluation
Set up a virtual environment:
python -m venv venv
source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
Install the required dependencies:
pip install -r requirements.txt
Download the OpenSearch docs for RAG context:
cd data && mkdir opensearch-docs && cd opensearch-docs
git clone https://github.com/opensearch-project/documentation-website.git
Go to the example notebooks and start Jupyter:
cd ../../example-notebooks
jupyter notebook
Start at notebook 1 and work your way through them!
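To give a flavor of the retrieval evaluation techniques the notebooks cover, here is a minimal recall@k sketch. The helper is illustrative and not taken from the notebooks themselves:

```python
def recall_at_k(retrieved_ids, relevant_ids, k=5):
    """Fraction of relevant documents that appear in the top-k retrieved results."""
    top_k = set(retrieved_ids[:k])
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    return len(top_k & relevant) / len(relevant)

# If 1 of 2 relevant docs appears in the top 2, recall@2 is 0.5
print(recall_at_k(["a", "b", "c"], ["a", "c"], k=2))
```

Metrics like this score the retrieval component in isolation, separately from the quality of the LLM's generated answer.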
Explore the example-notebooks/ directory to understand different evaluation techniques.

See CONTRIBUTING for more information.
This library is licensed under the MIT-0 License. See the LICENSE file.