HuixiangDou is a professional knowledge assistant based on LLM.
Advantages:
chat_in_group copes with the group chat scenario, answering user questions without message flooding; see 2401.08772, 2405.02817, Hybrid Retrieval and Precision Report
chat_with_repo is for real-time streaming chat
Check out the scenes in which HuixiangDou is running and join the WeChat Group to try the AI assistant inside.
If this helps you, please give it a star
Our Web version has been released to OpenXLab, where you can create a knowledge base, update positive and negative examples, turn on web search, test chat, and integrate into Feishu/WeChat groups. See BiliBili and YouTube!
The Web version's API for Android also supports other devices. See Python sample code.
LoRA-Qwen1.5-14B | LoRA-Qwen1.5-32B | alpaca data | arXiv |
LLM | File Format | Retrieval Method | Integration | Preprocessing |
The following are the GPU memory requirements for different features. The configurations differ only in which options are turned on.
Configuration Example | GPU mem Requirements | Description | Verified on Linux |
---|---|---|---|
config-cpu.ini | - | Use siliconcloud API for text only | |
config-2G.ini | 2GB | Use openai API (such as kimi, deepseek and stepfun) to search for text only | |
config-multimodal.ini | 10GB | Use openai API for LLM, image and text retrieval | |
[Standard Edition] config.ini | 19GB | Local deployment of LLM, single modality | |
config-advanced.ini | 80GB | local LLM, anaphora resolution, single modality, practical for WeChat group | |
We take the standard edition (locally running LLM, text retrieval) as the introductory example. The other editions differ only in their configuration options.
Click to agree to the BCE model agreement, then log in to huggingface
huggingface-cli login
Install dependencies
# parsing `word` format requirements
apt update
apt install python-dev libxml2-dev libxslt1-dev antiword unrtf poppler-utils pstotext tesseract-ocr flac ffmpeg lame libmad0 libsox-fmt-mp3 sox libjpeg-dev swig libpulse-dev
# python requirements
pip install -r requirements.txt
# For python3.8, install faiss-gpu instead of faiss
Use mmpose documents to build the mmpose knowledge base and to filter questions. If you have your own documents, just put them under repodir.
Copy and execute all the following commands (including the '#' symbol).
# Download the knowledge base, we only take the documents of mmpose as an example. You can put any of your own documents under `repodir`
cd HuixiangDou
mkdir repodir
git clone https://github.com/open-mmlab/mmpose --depth=1 repodir/mmpose
# Save the features of repodir to workdir, and update the positive and negative example thresholds into `config.ini`
mkdir workdir
python3 -m huixiangdou.service.feature_store
After running, test with python3 -m huixiangdou.main --standalone. The assistant should now answer mmpose-related questions (which the knowledge base covers) and decline unrelated ones such as weather questions.
python3 -m huixiangdou.main --standalone
+---------------------------+---------+----------------------------+-----------------+
| Query | State | Reply | References |
+===========================+=========+============================+=================+
| How to install mmpose? | success | To install mmpose, plea.. | installation.md |
--------------------------------------------------------------------------------------
| How is the weather today? | unrelated.. | .. | |
+-----------------------+---------+--------------------------------+-----------------+
Input your question here, type `bye` for exit:
..
Note: you can also run a simple Web UI with gradio:
python3 -m huixiangdou.gradio_ui
Or run a server that listens on port 23333; the default pipeline is chat_with_repo:
python3 -m huixiangdou.server
# test async API
curl -X POST http://127.0.0.1:23333/huixiangdou_stream -H "Content-Type: application/json" -d '{"text": "how to install mmpose","image": ""}'
# cURL sync API
curl -X POST http://127.0.0.1:23333/huixiangdou_inference -H "Content-Type: application/json" -d '{"text": "how to install mmpose","image": ""}'
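The curl calls above can also be issued from Python. The following is a minimal sketch using only the standard library. It assumes the server is running on 127.0.0.1:23333 and accepts the JSON body shown in the curl examples. The helper names build_request and ask are hypothetical, and because the response schema is not described here, the body is returned as raw text.

```python
import json
import urllib.request

# Host, port and endpoint paths are copied from the curl examples.
BASE_URL = "http://127.0.0.1:23333"


def build_request(text, image="", path="/huixiangdou_inference"):
    """Build the same POST request as the curl examples (hypothetical helper)."""
    payload = json.dumps({"text": text, "image": image}).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + path,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(text, image="", path="/huixiangdou_inference", timeout=60.0):
    """Send a question and return the raw response body as text.

    The response format is an unknown here, so nothing is parsed.
    """
    request = build_request(text, image, path)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With the server running, ask("how to install mmpose") should return the body of the sync endpoint's reply. Passing path="/huixiangdou_stream" targets the other endpoint, in which case this sketch simply waits and reads the whole body at once.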
Please update the repodir documents, good_questions and bad_questions, and try your own domain knowledge (medical, financial, power, etc.).
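Updating good_questions and bad_questions can be scripted. The sketch below assumes that each file (for example resource/good_questions.json) is a JSON array of question strings; that layout is an assumption to check against your own files, and add_question is a hypothetical helper, not part of HuixiangDou.

```python
import json
from pathlib import Path


def add_question(path, question):
    """Append a question to a JSON array file unless it is already listed.

    Assumes the file holds a JSON array of strings (an assumption).
    Returns True if the question was added, False if it was a duplicate.
    """
    file = Path(path)
    questions = json.loads(file.read_text(encoding="utf-8")) if file.exists() else []
    if question in questions:
        return False
    questions.append(question)
    # ensure_ascii=False keeps non-English questions readable in the file.
    file.write_text(json.dumps(questions, ensure_ascii=False, indent=2), encoding="utf-8")
    return True
```

After editing either file, the thresholds are only refreshed once python3 -m huixiangdou.service.feature_store is run again.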
We provide typescript front-end and python back-end source code. It is the same as the OpenXLab APP; please read the web deployment document.
If there is no GPU available, model inference can be completed using the siliconcloud API.
Taking docker miniconda+Python3.11 as an example, install CPU dependencies and run:
# Start container
docker run -v /path/to/huixiangdou:/huixiangdou -p 7860:7860 -p 23333:23333 -it continuumio/miniconda3 /bin/bash
# Install dependencies
apt update
apt install python-dev libxml2-dev libxslt1-dev antiword unrtf poppler-utils pstotext tesseract-ocr flac ffmpeg lame libmad0 libsox-fmt-mp3 sox libjpeg-dev swig libpulse-dev
python3 -m pip install -r requirements-cpu.txt
# Establish knowledge base
python3 -m huixiangdou.service.feature_store --config_path config-cpu.ini
# Q&A test
python3 -m huixiangdou.main --standalone --config_path config-cpu.ini
# gradio UI
python3 -m huixiangdou.gradio_ui --config_path config-cpu.ini
If you find the installation too slow, a pre-installed image is provided on Docker Hub. Simply use it in place of the base image when starting the container.
If your GPU memory exceeds 1.8G, or you are looking for cost-effectiveness, use this configuration. It discards the local LLM and uses a remote LLM instead; everything else is the same as the standard edition.
Taking siliconcloud as an example, fill the API token obtained from the official website into config-2G.ini
# config-2G.ini
[llm]
enable_local = 0 # Turn off local LLM
enable_remote = 1 # Only use remote
..
remote_type = "siliconcloud" # Choose siliconcloud
remote_api_key = "YOUR-API-KEY-HERE" # Your API key
remote_llm_model = "alibaba/Qwen1.5-110B-Chat"
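Before starting the services, it can help to confirm that the file really is set to remote-only. The following is a rough sketch that reads an excerpt like the one above with the standard-library configparser. It assumes the keys sit directly under [llm] as shown, leaves out the elided ".." lines, and uses a hypothetical helper name, check_remote_only; the project's own loader may parse the file differently.

```python
import configparser

# Excerpt mirroring the keys shown above; the elided ".." lines are left out.
SAMPLE = """
[llm]
enable_local = 0 # Turn off local LLM
enable_remote = 1 # Only use remote
remote_type = "siliconcloud" # Choose siliconcloud
remote_api_key = "YOUR-API-KEY-HERE" # Your API key
remote_llm_model = "alibaba/Qwen1.5-110B-Chat"
"""


def check_remote_only(text):
    """Return a list of problems found in the [llm] section (empty if none)."""
    parser = configparser.ConfigParser(
        inline_comment_prefixes=("#",),  # strip the trailing "# ..." comments
        interpolation=None,              # treat "%" in values literally
    )
    parser.read_string(text)
    llm = parser["llm"]
    problems = []
    if llm.get("enable_local", "").strip() != "0":
        problems.append("enable_local should be 0")
    if llm.get("enable_remote", "").strip() != "1":
        problems.append("enable_remote should be 1")
    api_key = llm.get("remote_api_key", "").strip().strip('"')
    if not api_key or api_key == "YOUR-API-KEY-HERE":
        problems.append("remote_api_key is still a placeholder")
    return problems
```

Running check_remote_only on the unmodified excerpt reports only the placeholder API key, which is the one value that must be replaced by hand.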
Execute the following to get the Q&A results
python3 -m huixiangdou.main --standalone --config_path config-2G.ini # Start all services at once
If you have 10G of GPU memory, you can additionally support image and text retrieval. Just modify the models used in config.ini.
# config-multimodal.ini
# !!! Download `https://huggingface.co/BAAI/bge-visualized/blob/main/Visualized_m3.pth` to `bge-m3` folder !!!
embedding_model_path = "BAAI/bge-m3"
reranker_model_path = "BAAI/bge-reranker-v2-minicpm-layerwise"
Note:
bpe_simple_vocab_16e6.txt.gz
Run gradio to test, see the image and text retrieval result here.
python3 tests/test_query_gradio.py
The "HuiXiangDou" in the WeChat experience group has all features enabled.
Contributors have provided Android tools to interact with WeChat. The solution is based on system-level APIs, and in principle, it can control any UI (not limited to communication software).
What if the robot is too cold/too chatty?
Fill the questions that should be answered into resource/good_questions.json, and fill the ones that should be rejected into resource/bad_questions.json.
Adjust the content under repodir to ensure that the markdown documents in the main library do not contain irrelevant content.
Re-run feature_store to update thresholds and feature libraries.
You can also adjust reject_throttle in config.ini. Generally speaking, 0.5 is a high value; 0.2 is too low.
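To see why the value matters, here is a toy sketch of a rejection threshold. It assumes reject_throttle acts as a minimum relevance score below which a question is ignored; that reading, the function names and the scores are all illustrative assumptions, not HuixiangDou's actual implementation or measured data.

```python
def should_answer(relevance, reject_throttle):
    """Answer only when the relevance score reaches the threshold (assumed semantics)."""
    return relevance >= reject_throttle


def answered(scores, reject_throttle):
    """Return the questions that would be answered at a given threshold."""
    return [q for q, score in scores.items() if should_answer(score, reject_throttle)]


# Made-up relevance scores, for illustration only.
EXAMPLE_SCORES = {
    "How to install mmpose?": 0.62,
    "Is mmpose fast?": 0.35,
    "How is the weather today?": 0.18,
}
```

Under these assumptions, a threshold of 0.5 keeps only the installation question (a "cold" robot), while 0.2 also lets the borderline question through (a chattier one).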
Launch is normal, but out of memory during runtime?
Long-text inference with a transformers-based LLM requires more memory. In this case, apply kv cache quantization to the model (see the lmdeploy quantization description), then use docker to deploy the Hybrid LLM Service independently.
How to access other local LLM / After access, the effect is not ideal?
What if the response is too slow/request always fails?
What if the GPU memory is too low?
In this case a local LLM cannot be run; only a remote LLM can be used, together with text2vec, to execute the pipeline. Please make sure that config.ini uses only the remote LLM and that the local LLM is turned off.
@misc{kong2024huixiangdou,
title={HuiXiangDou: Overcoming Group Chat Scenarios with LLM-based Technical Assistance},
author={Huanjun Kong and Songyang Zhang and Jiaying Li and Min Xiao and Jun Xu and Kai Chen},
year={2024},
eprint={2401.08772},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
@misc{kong2024labelingsupervisedfinetuningdata,
title={Labeling supervised fine-tuning data with the scaling law},
author={Huanjun Kong},
year={2024},
eprint={2405.02817},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2405.02817},
}