HuixiangDou Download - HuixiangDou Source code download


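HuixiangDou is a professional knowledge assistant based on LLM.

Advantages:

Three-stage pipeline design: preprocess, rejection, and response
- chat_in_group handles group chat scenarios and answers user questions without flooding the chat; see 2401.08772, 2405.02817, Hybrid Retrieval and Precision Report
- chat_with_repo is for real-time streaming chat
No training required, with CPU-only, 2G, 10G, 20G, and 80G configurations
Offers a complete suite of Web, Android, and pipeline source code, industrial-grade and commercially viable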
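
As a rough illustration of the three-stage idea (not the project's actual code), the sketch below chains hypothetical `preprocess`, `should_reject`, and `respond` functions. The keyword-based relevance check, the `KNOWN_TOPICS` set, and every name here are assumptions made for this example only.

```python
from typing import Optional

# Hypothetical stand-in for a knowledge base: topics the assistant knows about.
KNOWN_TOPICS = {"mmpose", "install", "config"}


def preprocess(message: str) -> str:
    """Stage 1: normalize the raw chat message."""
    return " ".join(message.strip().lower().split())


def should_reject(query: str) -> bool:
    """Stage 2: reject queries unrelated to the knowledge base.

    A real system would score the query against stored features;
    a keyword overlap check is used here only to keep the sketch runnable.
    """
    words = {w.strip("?.,!") for w in query.split()}
    return not (words & KNOWN_TOPICS)


def respond(query: str) -> str:
    """Stage 3: produce a reply (an LLM call in a real deployment)."""
    return f"[answer based on knowledge base for: {query}]"


def pipeline(message: str) -> Optional[str]:
    query = preprocess(message)
    if should_reject(query):
        return None  # stay silent instead of flooding the group chat
    return respond(query)
```

Returning `None` for rejected messages is one simple way to model "no reply" in a group chat; the caller then sends nothing.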

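Check out the scenes in which HuixiangDou is running, and join the WeChat group to try the AI assistant inside.

If this helps you, please give it a star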

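New Features

Our Web version has been released to OpenXLab, where you can create a knowledge base, update positive and negative examples, turn on web search, test chat, and integrate into Feishu/WeChat groups. See BiliBili and YouTube!

The Web version's API for Android also supports other devices. See the Python sample code.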

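[2024/09] Inverted indexer makes the LLM prefer the knowledge base
[2024/09] Code retrieval
[2024/08] chat_with_readthedocs; see how to integrate
[2024/07] Image and text retrieval & removal of langchain
[2024/07] Hybrid knowledge graph and dense retrieval improve the F1 score by 1.7%
[2024/06] Evaluation of chunksize, splitter, and text2vec model
[2024/05] wkteam WeChat access, image & URL parsing, coreference resolution support
[2024/05] SFT LLM on NLP task, F1 increased by 29%
- LoRA-Qwen1.5-14B, LoRA-Qwen1.5-32B, alpaca data, arXiv
[2024/04] RAG annotation of SFT Q&A data and examples
[2024/04] Released Web front-end and back-end service source code
[2024/03] New personal WeChat integration and prebuilt APK
[2024/02] [Experimental feature] WeChat group integration of multimodal to achieve OCR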

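Support Status

LLM	File Format	Retrieval Method	Integration	Preprocessing
InternLM2/InternLM2.5, Qwen1.5~2.5, puyu, StepFun, KIMI, DeepSeek, GLM (ZHIPU), SiliconCloud, Xi-Api	pdf, word, excel, ppt, html, markdown, txt	Dense for Document, Sparse for Code, Knowledge Graph, Internet Search, SourceGraph, Image and Text	WeChat (android/wkteam), Lark, OpenXLab Web, Gradio Demo, HTTP Server, Read the Docs	Coreference Resolution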

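Hardware Requirements

The following are the GPU memory requirements for different features. The editions differ only in which options are turned on.

Configuration Example	GPU Memory Requirement	Description
config-cpu.ini	-	Uses the siliconcloud API; text only
config-2G.ini	2GB	Uses the openai API (such as kimi, deepseek, and stepfun); text retrieval only
config-multimodal.ini	10GB	Uses the openai API for the LLM; image and text retrieval
[Standard Edition] config.ini	19GB	Local LLM deployment, single modality
config-advanced.ini	80GB	Local LLM, anaphora resolution, single modality; practical for WeChat groups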
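
To make the table easier to apply, here is a small sketch that maps available GPU memory to the largest configuration file that fits. The `pick_config` helper is hypothetical and not part of HuixiangDou; the thresholds are copied from the table, and "largest that fits" is only one possible selection rule, since the editions also differ in features.

```python
# (config file, minimum GPU memory in GB) taken from the table above,
# ordered from the most to the least demanding.
CONFIGS = [
    ("config-advanced.ini", 80),
    ("config.ini", 19),
    ("config-multimodal.ini", 10),
    ("config-2G.ini", 2),
    ("config-cpu.ini", 0),
]


def pick_config(gpu_mem_gb: float) -> str:
    """Return the most demanding config whose memory requirement fits."""
    for name, required in CONFIGS:
        if gpu_mem_gb >= required:
            return name
    return "config-cpu.ini"  # fallback for negative or invalid input
```

For example, a 24 GB card maps to the standard `config.ini`, while a machine without a GPU maps to `config-cpu.ini`.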

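Running the Standard Edition

We use the standard edition (LLM running locally, text retrieval) as the example. The other editions differ only in configuration options.

I. Download and install dependencies

Click to agree to the BCE model agreement, then log in to huggingface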

huggingface-cli login

Install dependencies

# parsing `word` format requirements
apt update
apt install python-dev libxml2-dev libxslt1-dev antiword unrtf poppler-utils pstotext tesseract-ocr flac ffmpeg lame libmad0 libsox-fmt-mp3 sox libjpeg-dev swig libpulse-dev
# python requirements
pip install -r requirements.txt
# For python3.8, install faiss-gpu instead of faiss

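II. Create a knowledge base and ask questions

Use the mmpose documents to build an mmpose knowledge base and filter questions. If you have your own documents, just put them under repodir.

Copy and execute all of the following commands (including the "#" symbol).

# Download the knowledge base, we only take the documents of mmpose as an example. You can put any of your own documents under `repodir`
cd HuixiangDou
mkdir repodir
git clone https://github.com/open-mmlab/mmpose --depth=1 repodir/mmpose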

# Save the features of repodir to workdir, and update the positive and negative example thresholds into `config.ini`
mkdir workdir
python3 -m huixiangdou.service.feature_store

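After it finishes, test with python3 -m huixiangdou.main --standalone. The assistant should now reply to mmpose questions (which are related to the knowledge base) and stay silent on weather questions.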

python3 -m huixiangdou.main --standalone

+---------------------------+---------+----------------------------+-----------------+
|         Query             |  State  |         Reply              |   References    |
+===========================+=========+============================+=================+
| How to install mmpose ?    | success | To install mmpose, plea..  | installation.md |
--------------------------------------------------------------------------------------
| How is the weather today ? | unrelated.. | ..                     |                 |
+-----------------------+---------+--------------------------------+-----------------+
Input your question here, type `bye` for exit:
..
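
The accept/reject split in the output above can be pictured as a similarity score compared against a threshold (the FAQ mentions a `reject_throttle` value in `config.ini`). The sketch below is only an illustration with made-up scores: it picks the threshold that best separates example good and bad questions, then applies it. The function names and the selection rule are assumptions and do not describe how `feature_store` actually computes its thresholds.

```python
from typing import Sequence


def choose_threshold(good: Sequence[float], bad: Sequence[float]) -> float:
    """Pick the candidate threshold that classifies the most examples correctly.

    `good` holds scores of questions that should be answered,
    `bad` holds scores of questions that should be rejected.
    Both sequences are assumed to be non-empty.
    """
    candidates = sorted(set(good) | set(bad))
    best, best_correct = candidates[0], -1
    for t in candidates:
        correct = sum(s >= t for s in good) + sum(s < t for s in bad)
        if correct > best_correct:
            best, best_correct = t, correct
    return best


def is_related(score: float, threshold: float) -> bool:
    """Answer only when the query is close enough to the knowledge base."""
    return score >= threshold
```

Under this picture, raising the threshold makes the assistant quieter and lowering it makes it more talkative, which matches the tuning advice in the FAQ.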

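Note

If restarting the LLM every time is too slow, first run python3 -m huixiangdou.service.llm_server_hybrid; then open a new window and run only python3 -m huixiangdou.main each time, without restarting the LLM.

You can also run a simple Web UI with gradio: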

python3 -m huixiangdou.gradio_ui


Or run a server that listens on port 23333; the default pipeline is chat_with_repo:

python3 -m huixiangdou.server

# test async API
curl -X POST http://127.0.0.1:23333/huixiangdou_stream -H "Content-Type: application/json" -d '{"text": "how to install mmpose","image": ""}'
# cURL sync API
curl -X POST http://127.0.0.1:23333/huixiangdou_inference -H "Content-Type: application/json" -d '{"text": "how to install mmpose","image": ""}'
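
The same synchronous call can be made from Python using only the standard library. This is a minimal sketch: the endpoint path, port, and JSON fields are taken from the curl commands above, while the helper names `build_request` and `ask` are hypothetical. The response schema is not described here, so the body is returned as raw text.

```python
import json
import urllib.request

# Host and port match the curl examples; adjust if the server runs elsewhere.
BASE_URL = "http://127.0.0.1:23333"


def build_request(text: str, image: str = "") -> urllib.request.Request:
    """Build a POST request for the synchronous inference endpoint."""
    payload = json.dumps({"text": text, "image": image}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/huixiangdou_inference",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(text: str, timeout: float = 60.0) -> str:
    """Send the question and return the raw response body as text."""
    with urllib.request.urlopen(build_request(text), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

With the server running, `print(ask("how to install mmpose"))` would print whatever the endpoint returns.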

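Please update the repodir documents, good_questions, and bad_questions, and try your own domain knowledge (medical, financial, power, etc.).

III. Integrate into Feishu and WeChat groups

One-way sending to a Feishu group
Two-way Feishu group sending, receiving, and message recall
Personal WeChat access via Android
Personal WeChat access via wkteam

IV. Deploy the Web front end and back end

We provide typescript front-end and python back-end source code:

Multi-tenant management supported
Zero-programming access to Feishu and WeChat
k8s friendly

This is the same as the OpenXLab app; please read the Web deployment document.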

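Other Configurations

CPU-only Edition

If no GPU is available, model inference can be done through the siliconcloud API.

Taking docker miniconda+Python3.11 as an example, install the CPU dependencies and run:

# Start container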
docker run -v /path/to/huixiangdou:/huixiangdou -p 7860:7860 -p 23333:23333 -it continuumio/miniconda3 /bin/bash
# Install dependencies
apt update
apt install python-dev libxml2-dev libxslt1-dev antiword unrtf poppler-utils pstotext tesseract-ocr flac ffmpeg lame libmad0 libsox-fmt-mp3 sox libjpeg-dev swig libpulse-dev
python3 -m pip install -r requirements-cpu.txt
# Establish knowledge base
python3 -m huixiangdou.service.feature_store --config_path config-cpu.ini
# Q&A test
python3 -m huixiangdou.main --standalone --config_path config-cpu.ini
# gradio UI
python3 -m huixiangdou.gradio_ui --config_path config-cpu.ini

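If installation is too slow, a pre-installed image is available on Docker Hub. Simply substitute it when starting docker.

2G Cost-effective Edition

If your GPU has more than 1.8G of memory, or you want the most cost-effective setup, use this edition. It drops the local LLM in favor of a remote LLM; everything else is the same as the standard edition.

Taking siliconcloud as an example, fill the API TOKEN obtained from its official website into config-2G.ini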

# config-2G.ini
[llm]
enable_local = 0   # Turn off local LLM
enable_remote = 1  # Only use remote
..
remote_type = "siliconcloud"   # Choose siliconcloud
remote_api_key = "YOUR-API-KEY-HERE" # Your API key
remote_llm_model = "alibaba/Qwen1.5-110B-Chat"

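Note

In the worst case, each Q&A round calls the LLM 7 times, so free-tier users may hit the RPM (requests per minute) limit. You can adjust the rpm parameter in config.ini.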
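
To get a feel for the numbers: if one question can trigger up to 7 LLM calls, an RPM quota of R serves at most floor(R / 7) worst-case questions per minute. The sketch below does that arithmetic and shows a minimal client-side throttle. `questions_per_minute` and `RpmThrottle` are hypothetical names for illustration, not part of HuixiangDou, and this is not how the project's own `rpm` setting is implemented.

```python
import time

WORST_CASE_CALLS_PER_QUESTION = 7  # figure stated in the note above


def questions_per_minute(rpm: int) -> int:
    """Worst-case number of questions a given RPM quota can serve per minute."""
    return rpm // WORST_CASE_CALLS_PER_QUESTION


class RpmThrottle:
    """Space calls at least 60 / rpm seconds apart."""

    def __init__(self, rpm: int, clock=time.monotonic, sleep=time.sleep):
        self.interval = 60.0 / rpm
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = 0.0

    def wait(self) -> None:
        """Block until the next call is allowed, then reserve the slot."""
        now = self.clock()
        if now < self.next_allowed:
            self.sleep(self.next_allowed - now)
            now = self.next_allowed
        self.next_allowed = now + self.interval
```

Calling `throttle.wait()` before each remote LLM request keeps the client under the quota; the injectable `clock` and `sleep` exist only to make the class easy to test.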

Run the following command to get the Q&A result

python3 -m huixiangdou.main --standalone --config_path config-2G.ini # Start all services at once

10G Multimodal Edition

If you have 10G of GPU memory, you can additionally enable image and text retrieval. Just change the models used in config.ini.

# config-multimodal.ini
# !!! Download `https://huggingface.co/BAAI/bge-visualized/blob/main/Visualized_m3.pth` to `bge-m3` folder !!!
embedding_model_path = "BAAI/bge-m3"
reranker_model_path = "BAAI/bge-reranker-v2-minicpm-layerwise"

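Note:

You need to manually download Visualized_m3.pth into the bge-m3 directory
Install FlagEmbedding from the main branch, which includes our bug fix. A download of bpe_simple_vocab_16e6.txt.gz is also provided
Install requirements/multimodal.txt

Run gradio to test and view the image and text retrieval results.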

python3 tests/test_query_gradio.py

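80G Complete Edition

The "HuixiangDou" in the WeChat experience group has all features enabled:

Serper search and SourceGraph search enhancement
Group chat image and WeChat public account parsing
Text coreference resolution
Hybrid LLM
The knowledge base covers 12 openmmlab repositories (1700 documents) and refuses small talk

Please read the following topics:

Hybrid knowledge graph and dense retrieval
Refer to the config-advanced.ini configuration to improve results
Anaphora resolution training for group chat scenarios
Use wkteam WeChat access to integrate image parsing, public account parsing, and anaphora resolution
Use rag.py to annotate SFT training data

Android Tools

Contributors have provided Android tools for interacting with WeChat. The solution is based on system-level APIs and can, in principle, control any UI (not limited to messaging software).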

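FAQ

What if the robot is too cold/too chatty?
- Put questions that should be answered in real scenarios into resource/good_questions.json, and questions that should be rejected into resource/bad_questions.json.
- Adjust the topic content in repodir so that the markdown documents in the main library contain no irrelevant content.
Re-run feature_store to update the thresholds and the feature library.
You can also directly modify reject_throttle in config.ini. Generally speaking, 0.5 is a high value; 0.2 is too low.
Startup is normal, but it runs out of memory at runtime?
Transformer-based LLMs need more memory for long text. In that case, apply kv cache quantization to the model (see, for example, the lmdeploy quantization description), then use docker to deploy the Hybrid LLM Service independently.
How do I access another local LLM, and what if the results are not ideal afterwards?
- Open the hybrid llm service and add a new LLM inference implementation.
- Refer to test_intention_prompt and the test data, adjust the prompt and threshold for the new model, and update them in prompt.py.
What if responses are too slow or requests always fail?
- Refer to the hybrid llm service to add exponential backoff and retransmission.
- Replace the local LLM with an inference framework such as lmdeploy instead of native huggingface/transformers.
What if GPU memory is too low?
In that case a local LLM cannot run, and the pipeline can only be executed with a remote LLM combined with text2vec. Make sure config.ini uses only the remote LLM and turns off the local LLM.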
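
For the slow or failing request case above, exponential backoff with retransmission can be sketched as follows. This is a generic illustration, not the hybrid llm service's code: `call_with_backoff` and its parameters are hypothetical, and the 10% jitter is an arbitrary choice.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(
    func: Callable[[], T],
    max_retries: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `func`, retrying on any exception with exponential backoff and jitter."""
    for attempt in range(max_retries + 1):
        try:
            return func()
        except Exception:
            if attempt == max_retries:
                raise  # out of retries: surface the last error
            # Delay doubles each attempt (1s, 2s, 4s, ...) up to max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
    raise RuntimeError("unreachable")
```

In practice `func` would wrap a single remote LLM request, and it is usually better to retry only on errors known to be transient (timeouts, rate limits) rather than on every exception.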

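Acknowledgements

KIMI: long-text LLM, supports direct file upload
FlagEmbedding: BAAI RAG group
BCEmbedding: Chinese-English bilingual feature model
Langchain-ChatChat: application of Langchain and ChatGLM
GrabRedEnvelope: WeChat red packet grabbing

Citation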

@misc{kong2024huixiangdou,
      title={HuiXiangDou: Overcoming Group Chat Scenarios with LLM-based Technical Assistance},
      author={Huanjun Kong and Songyang Zhang and Jiaying Li and Min Xiao and Jun Xu and Kai Chen},
      year={2024},
      eprint={2401.08772},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

@misc{kong2024labelingsupervisedfinetuningdata,
      title={Labeling supervised fine-tuning data with the scaling law}, 
      author={Huanjun Kong},
      year={2024},
      eprint={2405.02817},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2405.02817}, 
}
