Unified inference for different RAG Reranker models | Fine-tuning the full RAG retrieval pipeline | Experimental results | License
RAG-Retrieval provides end-to-end code for fine-tuning (train), inference (infer) and distillation (distill) of the RAG retrieval pipeline.
Join our WeChat group chat
10/21/2024: RAG-Retrieval released two different methods for performing Reranker tasks with LLMs, as well as a method for distilling them into BERT. What are the best practices for LLM on Reranker tasks? A simple experiment report (with code)
6/5/2024: RAG-Retrieval's Embedding model now implements MRL loss. RAG-Retrieval: Make MRL loss the standard configuration of training vector (embedding) models
6/2/2024: RAG-Retrieval implements supervised RAG retriever fine-tuning based on LLM preferences. RAG-Retrieval implements fine-tuning of RAG retriever based on LLM preference supervision
5/5/2024: Released the lightweight python library. RAG-Retrieval: Your RAG application deserves a better reranking inference framework
3/18/2024: Released RAG-Retrieval. RAG-Retrieval Zhihu introduction
The reranking model is an important part of any retrieval architecture and of RAG, but the current situation is that different reranking models are called in different ways, each with its own inference code and dependencies.
Therefore, RAG-Retrieval provides a lightweight python library, rag-retrieval, which offers a unified way to call any RAG reranking model. It has the following features.
Supports multiple reranking models: common open-source rerankers (Cross Encoder Reranker, Decoder-Only LLM Reranker).
Long-document friendly: supports two different handling strategies for long documents (truncation to the maximum length, or splitting into segments and taking the maximum score).
Easy to extend: to add a new reranking model, users only need to inherit BaseReranker and implement the rank and compute_score functions.
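A rough sketch of that extension point is shown below; the base-class import path and the exact method signatures are assumptions for illustration rather than the library's confirmed API.

```python
# Minimal sketch of extending rag-retrieval with a custom reranking model.
# The base class import path and the exact method signatures are assumptions
# for illustration; check the package source / Tutorial for the real interface.
from typing import List, Tuple


class MyReranker:  # in the real library this would inherit BaseReranker
    def __init__(self, model_name_or_path: str):
        self.model_name_or_path = model_name_or_path
        # load the model and tokenizer here

    def compute_score(self, pairs: List[Tuple[str, str]]) -> List[float]:
        # Return one relevance score per (query, doc) pair (toy word-overlap score here).
        return [float(len(set(q.split()) & set(d.split()))) for q, d in pairs]

    def rank(self, query: str, docs: List[str]) -> List[int]:
        # Score every doc against the query and return indices sorted by relevance.
        scores = self.compute_score([(query, doc) for doc in docs])
        return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
```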
# To avoid the automatically installed torch being incompatible with your local CUDA, it is recommended to manually install a torch build that matches your local CUDA version before the next step.
pip install rag-retrieval
For Cross Encoder Rerankers, any model built on transformers' AutoModelForSequenceClassification can be used with rag_retrieval's Reranker for inference, for example the models below (a usage sketch follows the list).
Cross Encoder models of the bge series, such as BAAI/bge-reranker-base, BAAI/bge-reranker-large, BAAI/bge-reranker-v2-m3
The bce Cross Encoder model, such as maidalun1020/bce-reranker-base_v1
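A minimal usage sketch, assuming the Reranker entry point and the compute_score method described above (exact constructor arguments may differ; see the Tutorial):

```python
# Usage sketch for a Cross Encoder reranker through the unified Reranker entry
# point; the constructor arguments are assumptions, see the Tutorial for details.
from rag_retrieval import Reranker

ranker = Reranker("BAAI/bge-reranker-base")  # any AutoModelForSequenceClassification reranker

query = "What is the capital of France?"
docs = [
    "Paris is the capital and most populous city of France.",
    "The Great Wall of China is visible in satellite images.",
]

# Score each (query, doc) pair; a higher score means higher relevance.
scores = ranker.compute_score([[query, doc] for doc in docs])
print(scores)
```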
For LLM Rerankers, rag_retrieval's Reranker supports a variety of powerful LLM-based ranking models and also supports zero-shot ranking with any LLM chat model, for example the models below (a usage sketch follows the list).
LLM Reranker models of the bge series, such as BAAI/bge-reranker-v2-gemma, BAAI/bge-reranker-v2-minicpm-layerwise
Any LLM chat model, used for zero-shot ranking
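The same assumed interface should apply to LLM-based ranking models, e.g.:

```python
# Sketch: the same (assumed) Reranker interface with a Decoder-Only LLM reranker.
from rag_retrieval import Reranker

llm_ranker = Reranker("BAAI/bge-reranker-v2-gemma")  # LLM-based ranking model
scores = llm_ranker.compute_score([
    ["what is a panda?", "The giant panda is a bear species endemic to China."],
    ["what is a panda?", "Paris is the capital of France."],
])
print(scores)
```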
For detailed usage and caveats of the rag-retrieval package, please refer to the Tutorial.
We have run extensive tests to align rag_retrieval with the original inference frameworks below; see Tests for details. Each of them requires a different module to run, whereas rag_retrieval exposes a single unified interface.
Such as FlagEmbedding's FlagReranker, FlagLLMReranker, LayerWiseFlagLLMReranker.
Such as BCEmbedding's RerankerModel.
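For illustration, an alignment check might look like the sketch below; the FlagEmbedding calls follow its documented API, while the rag_retrieval side uses the assumed interface from the earlier examples.

```python
# Sketch of an alignment check: score the same pairs with FlagEmbedding's
# FlagReranker and with rag_retrieval's unified Reranker (assumed interface,
# see the earlier examples); the scores should match for the same model.
from FlagEmbedding import FlagReranker
from rag_retrieval import Reranker

pairs = [
    ["what is a panda?", "The giant panda is a bear species endemic to China."],
    ["what is a panda?", "Paris is the capital of France."],
]

flag_scores = FlagReranker("BAAI/bge-reranker-base", use_fp16=True).compute_score(pairs)
rag_scores = Reranker("BAAI/bge-reranker-base").compute_score(pairs)
print(flag_scores, rag_scores)
```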
conda create -n rag-retrieval python=3.8 && conda activate rag-retrieval
# To avoid the automatically installed torch being incompatible with your local CUDA, it is recommended to manually install a torch build that matches your local CUDA version before the next step.
pip install -r requirements.txt
Supports fine-tuning any open-source embedding model (bge, m3e, etc.)
Supports two types of fine-tuning data: queries paired with positive documents (negatives drawn randomly within the batch), and queries paired with positive documents plus hard negatives (see the hypothetical record sketched after the training command below).
To fine-tune the embedding model, refer to the Tutorial in the model directory for the detailed procedure, then run:
cd ./rag_retrieval/train/embedding
bash train_embedding.sh
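For illustration only, a training record might look like the hypothetical JSONL sketch below; the actual field names and file format are defined in the Tutorial.

```python
# Hypothetical JSONL training record; the field names follow the common
# query/pos/neg convention and are NOT confirmed by this repository --
# see the Tutorial in the model directory for the format the scripts expect.
import json

record = {
    "query": "how do I reset a forgotten password",
    "pos": ["Open Settings > Account > Reset password and follow the prompts."],
    "neg": ["The password history feature stores your last five passwords."],
}

with open("train_data.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(record, ensure_ascii=False) + "\n")
```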
To fine-tune the colbert model, refer to the Tutorial in the model directory for the detailed procedure, then run:
cd ./rag_retrieval/train/colbert
bash train_colbert.sh
To fine-tune the reranker model, refer to the Tutorial in the model directory for the detailed procedure, then run:
cd ./rag_retrieval/train/reranker
bash train_reranker.sh
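After training, the saved checkpoint can in principle be loaded back through the same assumed inference interface shown earlier; the output path below is hypothetical.

```python
# Sketch: loading a locally fine-tuned checkpoint through the (assumed) unified
# inference interface; "./output/my-finetuned-reranker" is a hypothetical path.
from rag_retrieval import Reranker

ranker = Reranker("./output/my-finetuned-reranker")
print(ranker.compute_score([["example query", "example candidate document"]]))
```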
Model | Model Size(GB) | T2Reranking | MMarcoReranking | CMedQAv1 | CMedQAv2 | Avg |
---|---|---|---|---|---|---|
bge-reranker-base | 1.11 | 67.28 | 35.46 | 81.27 | 84.10 | 67.03 |
bce-reranker-base_v1 | 1.11 | 70.25 | 34.13 | 79.64 | 81.31 | 66.33 |
rag-retrieval-reranker | 0.41 | 67.33 | 31.57 | 83.54 | 86.03 | 67.12 |
Here, rag-retrieval-reranker was trained with the RAG-Retrieval code on top of hfl/chinese-roberta-wwm-ext, using the training data of the bge-reranker model.
Model | Model Size(GB) | Dim | T2Reranking | MMarcoReranking | CMedQAv1 | CMedQAv2 | Avg |
---|---|---|---|---|---|---|---|
bge-m3-colbert | 2.24 | 1024 | 66.82 | 26.71 | 75.88 | 76.83 | 61.56 |
rag-retrieval-colbert | 0.41 | 1024 | 66.85 | 31.46 | 81.05 | 84.22 | 65.90 |
Here, rag-retrieval-colbert was trained with the RAG-Retrieval code on top of hfl/chinese-roberta-wwm-ext, using the training data of the bge-reranker model.
Model | T2Reranking | Δ |
---|---|---|
bge-v1.5-embedding | 66.49 | |
bge-v1.5-embedding finetune | 67.15 | +0.66 |
bge-m3-colbert | 66.82 | |
bge-m3-colbert finetune | 67.22 | +0.40 |
bge-reranker-base | 67.28 | |
bge-reranker-base finetune | 67.57 | +0.29 |
Models with the finetune suffix were further fine-tuned with RAG-Retrieval starting from the corresponding open-source model, using the T2-Reranking training set.
Note that the three open-source bge models already include T2-Reranking in their training data, and this data is fairly general, so continuing to fine-tune on it brings only a modest improvement. Fine-tuning the open-source models on a vertical-domain dataset instead yields a larger gain.
RAG-Retrieval is licensed under the MIT License.