Unified inference for different RAG Reranker models | Fine-tuning the full RAG retrieval pipeline | Experimental results | License
RAG-Retrieval provides end-to-end code for fine-tuning (train), inference (infer) and distillation (distill) of the RAG retrieval pipeline.
Join our WeChat group chat
10/21/2024: RAG-Retrieval released two different methods for performing Reranker tasks with LLMs, as well as a method for distilling them into BERT. What are the best practices for LLM on Reranker tasks? A simple experiment report (with code)
6/5/2024: RAG-Retrieval's Embedding model now implements MRL loss. RAG-Retrieval: Make MRL loss the standard configuration of training vector (embedding) models
6/2/2024: RAG-Retrieval implements supervised RAG retriever fine-tuning based on LLM preferences. RAG-Retrieval implements fine-tuning of RAG retriever based on LLM preference supervision
5/5/2024: Released the lightweight python library. RAG-Retrieval: Your RAG application deserves a better reranking inference framework
3/18/2024: Released RAG-Retrieval. RAG-Retrieval Zhihu introduction
The reranking model is an important part of any retrieval architecture and of RAG, but the current situation is that different reranking models are called in different ways, each with its own inference code and dependencies.
Therefore, RAG-Retrieval provides a lightweight python library, rag-retrieval, which offers a unified way to call any RAG reranking model. It has the following features.
Supports multiple reranking models: common open-source rerankers (Cross Encoder Reranker, Decoder-Only LLM Reranker).
Long-document friendly: supports two different handling strategies for long documents (truncation to the maximum length, or splitting into segments and taking the maximum score).
Easy to extend: to add a new reranking model, users only need to inherit BaseReranker and implement the rank and compute_score functions.
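A rough sketch of that extension point is shown below; the base-class import path and the exact method signatures are assumptions for illustration rather than the library's confirmed API.

```python
# Minimal sketch of extending rag-retrieval with a custom reranking model.
# The base class import path and the exact method signatures are assumptions
# for illustration; check the package source / Tutorial for the real interface.
from typing import List, Tuple


class MyReranker:  # in the real library this would inherit BaseReranker
    def __init__(self, model_name_or_path: str):
        self.model_name_or_path = model_name_or_path
        # load the model and tokenizer here

    def compute_score(self, pairs: List[Tuple[str, str]]) -> List[float]:
        # Return one relevance score per (query, doc) pair (toy word-overlap score here).
        return [float(len(set(q.split()) & set(d.split()))) for q, d in pairs]

    def rank(self, query: str, docs: List[str]) -> List[int]:
        # Score every doc against the query and return indices sorted by relevance.
        scores = self.compute_score([(query, doc) for doc in docs])
        return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
```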
# To avoid the automatically installed torch being incompatible with your local CUDA, it is recommended to manually install a torch build that matches your local CUDA version before the next step.
pip install rag-retrieval
For Cross Encoder Rerankers, any model built on transformers' AutoModelForSequenceClassification can be used with rag_retrieval's Reranker for inference, for example the models below (a usage sketch follows the list).
Cross Encoder models of the bge series, such as BAAI/bge-reranker-base, BAAI/bge-reranker-large, BAAI/bge-reranker-v2-m3
The bce Cross Encoder model, such as maidalun1020/bce-reranker-base_v1
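A minimal usage sketch, assuming the Reranker entry point and the compute_score method described above (exact constructor arguments may differ; see the Tutorial):

```python
# Usage sketch for a Cross Encoder reranker through the unified Reranker entry
# point; the constructor arguments are assumptions, see the Tutorial for details.
from rag_retrieval import Reranker

ranker = Reranker("BAAI/bge-reranker-base")  # any AutoModelForSequenceClassification reranker

query = "What is the capital of France?"
docs = [
    "Paris is the capital and most populous city of France.",
    "The Great Wall of China is visible in satellite images.",
]

# Score each (query, doc) pair; a higher score means higher relevance.
scores = ranker.compute_score([[query, doc] for doc in docs])
print(scores)
```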
For LLM Rerankers, rag_retrieval's Reranker supports a variety of powerful LLM-based ranking models and also supports zero-shot ranking with any LLM chat model, for example the models below (a usage sketch follows the list).
LLM Reranker models of the bge series, such as BAAI/bge-reranker-v2-gemma, BAAI/bge-reranker-v2-minicpm-layerwise
Any LLM chat model, used for zero-shot ranking
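The same assumed interface should apply to LLM-based ranking models, e.g.:

```python
# Sketch: the same (assumed) Reranker interface with a Decoder-Only LLM reranker.
from rag_retrieval import Reranker

llm_ranker = Reranker("BAAI/bge-reranker-v2-gemma")  # LLM-based ranking model
scores = llm_ranker.compute_score([
    ["what is a panda?", "The giant panda is a bear species endemic to China."],
    ["what is a panda?", "Paris is the capital of France."],
])
print(scores)
```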
For detailed usage and caveats of the rag-retrieval package, please refer to the Tutorial.
We have run extensive tests to align rag_retrieval with the original inference frameworks below; see Tests for details. Each of them requires a different module to run, whereas rag_retrieval exposes a single unified interface.
Such as FlagEmbedding's FlagReranker, FlagLLMReranker, LayerWiseFlagLLMReranker.
Such as BCEmbedding's RerankerModel.
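For illustration, an alignment check might look like the sketch below; the FlagEmbedding calls follow its documented API, while the rag_retrieval side uses the assumed interface from the earlier examples.

```python
# Sketch of an alignment check: score the same pairs with FlagEmbedding's
# FlagReranker and with rag_retrieval's unified Reranker (assumed interface,
# see the earlier examples); the scores should match for the same model.
from FlagEmbedding import FlagReranker
from rag_retrieval import Reranker

pairs = [
    ["what is a panda?", "The giant panda is a bear species endemic to China."],
    ["what is a panda?", "Paris is the capital of France."],
]

flag_scores = FlagReranker("BAAI/bge-reranker-base", use_fp16=True).compute_score(pairs)
rag_scores = Reranker("BAAI/bge-reranker-base").compute_score(pairs)
print(flag_scores, rag_scores)
```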
conda create -n rag-retrieval python=3.8 && conda activate rag-retrieval
# To avoid the automatically installed torch being incompatible with your local CUDA, it is recommended to manually install a torch build that matches your local CUDA version before the next step.
pip install -r requirements.txt
Supports fine-tuning any open-source embedding model (bge, m3e, etc.)
Supports two types of fine-tuning data: queries paired with positive documents (negatives drawn randomly within the batch), and queries paired with positive documents plus hard negatives (see the hypothetical record sketched after the training command below).
To fine-tune the embedding model, refer to the Tutorial in the model directory for the detailed procedure, then run:
cd ./rag_retrieval/train/embedding
bash train_embedding.sh
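For illustration only, a training record might look like the hypothetical JSONL sketch below; the actual field names and file format are defined in the Tutorial.

```python
# Hypothetical JSONL training record; the field names follow the common
# query/pos/neg convention and are NOT confirmed by this repository --
# see the Tutorial in the model directory for the format the scripts expect.
import json

record = {
    "query": "how do I reset a forgotten password",
    "pos": ["Open Settings > Account > Reset password and follow the prompts."],
    "neg": ["The password history feature stores your last five passwords."],
}

with open("train_data.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(record, ensure_ascii=False) + "\n")
```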
To fine-tune the colbert model, refer to the Tutorial in the model directory for the detailed procedure, then run:
cd ./rag_retrieval/train/colbert
bash train_colbert.sh
To fine-tune the reranker model, refer to the Tutorial in the model directory for the detailed procedure, then run:
cd ./rag_retrieval/train/reranker
bash train_reranker.sh
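After training, the saved checkpoint can in principle be loaded back through the same assumed inference interface shown earlier; the output path below is hypothetical.

```python
# Sketch: loading a locally fine-tuned checkpoint through the (assumed) unified
# inference interface; "./output/my-finetuned-reranker" is a hypothetical path.
from rag_retrieval import Reranker

ranker = Reranker("./output/my-finetuned-reranker")
print(ranker.compute_score([["example query", "example candidate document"]]))
```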
Model | Model Size(GB) | T2Reranking | MMarcoReranking | CMedQAv1 | CMedQAv2 | Avg |
---|---|---|---|---|---|---|
bge-reranker-base | 1.11 | 67.28 | 35.46 | 81.27 | 84.10 | 67.03 |
bce-reranker-base_v1 | 1.11 | 70.25 | 34.13 | 79.64 | 81.31 | 66.33 |
rag-retrieval-reranker | 0.41 | 67.33 | 31.57 | 83.54 | 86.03 | 67.12 |
Here, rag-retrieval-reranker was trained with the RAG-Retrieval code on top of hfl/chinese-roberta-wwm-ext, using the training data of the bge-reranker model.
Model | Model Size(GB) | Dim | T2Reranking | MMarcoReranking | CMedQAv1 | CMedQAv2 | Avg |
---|---|---|---|---|---|---|---|
bge-m3-colbert | 2.24 | 1024 | 66.82 | 26.71 | 75.88 | 76.83 | 61.56 |
rag-retrieval-colbert | 0.41 | 1024 | 66.85 | 31.46 | 81.05 | 84.22 | 65.90 |
Here, rag-retrieval-colbert was trained with the RAG-Retrieval code on top of hfl/chinese-roberta-wwm-ext, using the training data of the bge-reranker model.
Model | T2Reranking | Δ |
---|---|---|
bge-v1.5-embedding | 66.49 | |
bge-v1.5-embedding finetune | 67.15 | +0.66 |
bge-m3-colbert | 66.82 | |
bge-m3-colbert finetune | 67.22 | +0.40 |
bge-reranker-base | 67.28 | |
bge-reranker-base finetune | 67.57 | +0.29 |
Models with the finetune suffix were further fine-tuned with RAG-Retrieval starting from the corresponding open-source model, using the T2-Reranking training set.
Note that the three open-source bge models already include T2-Reranking in their training data, and this data is fairly general, so continuing to fine-tune on it brings only a modest improvement. Fine-tuning the open-source models on a vertical-domain dataset instead yields a larger gain.
RAG-Retrieval is licensed under the MIT License.