This project imitates the OpenAI Chat Completion API (the GPT-3.5 API) and provides a streaming HTTP API for ChatGLM-6B.
This repository provides sample services under Flask and FastAPI, plus an out-of-the-box static web UI that requires no Node.js, npm, or webpack.
The actual version used should follow the official ChatGLM-6B repository. One reminder: to call the `stream_chat` method, transformers 4.25.1 can no longer be used, so transformers==4.26.1 is required. The following is my development environment configuration:
```
protobuf>=3.18,<3.20.1
transformers==4.26.1
torch==1.12.1+cu113
torchvision==0.13.1
icetk
cpm_kernels
```
Non-streaming interface
Interface URL: http://{host_name}/chat
Request method: POST (JSON body)
Request parameters:
| Field name | Type | Description |
|---|---|---|
| query | string | the user's question |
| history | array[string] | session history |
Return results:
| Field name | Type | Description |
|---|---|---|
| query | string | the user's question |
| response | string | the full reply |
| history | array[string] | session history |
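As a concrete illustration, the non-streaming interface can be called as in the following minimal Python sketch. The base URL and helper names are my own assumptions for illustration, not part of the repository:

```python
# Hypothetical client for the non-streaming POST /chat endpoint.
# BASE_URL is an assumption; adjust it to your deployment.
import requests

BASE_URL = "http://localhost:8800"

def build_chat_payload(query, history):
    """Assemble the JSON body expected by POST /chat."""
    return {"query": query, "history": history}

def chat(query, history):
    """Send one question and return (full reply, updated history)."""
    resp = requests.post(
        f"{BASE_URL}/chat",
        json=build_chat_payload(query, history),
        timeout=300,
    )
    resp.raise_for_status()
    data = resp.json()  # contains "query", "response", and "history"
    return data["response"], data["history"]

# Usage (requires a running service):
#   reply, history = chat("Hello", [])
```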
Streaming interface, implemented with server-sent events.
Interface URL: http://{host_name}/stream
Request method: POST (JSON body)
Return method: a stream of server-sent events, each carrying a `delta` increment.
Return results:
| Field name | Type | Description |
|---|---|---|
| delta | string | the characters generated by this event |
| query | string | the user's question; to save bandwidth, returned only when `finished` is `true` |
| response | string | the reply so far; when `finished` is `true`, it is the complete reply |
| history | array[string] | session history; to save bandwidth, returned only when `finished` is `true` |
| finished | boolean | `true` indicates the end, `false` indicates data is still streaming |
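The event stream above can be consumed from Python as sketched below. The base URL and function names are assumptions for illustration; the field names match the table above:

```python
# Hypothetical consumer of the streaming POST /stream endpoint.
import json
import requests

BASE_URL = "http://localhost:8800"  # assumption; adjust to your deployment

def parse_sse_line(raw):
    """Parse one 'data: {...}' line from the event stream into a dict;
    return None for blank keep-alive lines or non-data lines."""
    line = (raw or "").strip()
    if not line.startswith("data:"):
        return None
    return json.loads(line[len("data:"):].strip())

def stream_chat(query, history):
    """Yield parsed events until an event with finished=true arrives."""
    with requests.post(
        f"{BASE_URL}/stream",
        json={"query": query, "history": history},
        stream=True,
        timeout=300,
    ) as resp:
        resp.raise_for_status()
        for raw in resp.iter_lines(decode_unicode=True):
            event = parse_sse_line(raw)
            if event is None:
                continue
            yield event
            if event.get("finished"):
                break

# Usage (requires a running service):
#   for event in stream_chat("Hello", []):
#       print(event.get("delta", ""), end="", flush=True)
```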
Memory cleanup interface:
http://{host_name}/clear
To implement server-sent events under Flask, Flask-SSE could have been used. However, since that package depends on Redis, which is unnecessary in this scenario, I referred to this document and implemented server-sent events in the simplest way.
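The "simplest way" amounts to returning a generator as a `text/event-stream` response. Below is a minimal sketch, not the repository's actual code; `fake_token_stream` is a placeholder for ChatGLM's real `stream_chat` call:

```python
# Minimal sketch of server-sent events in plain Flask, without Flask-SSE or Redis.
import json
from flask import Flask, Response, request

app = Flask(__name__)

def fake_token_stream(query):
    """Placeholder for model.stream_chat(...); yields one character at a time."""
    for ch in "echo: " + query:
        yield ch

@app.route("/stream", methods=["POST"])
def stream():
    body = request.get_json()

    def generate():
        response = ""
        for delta in fake_token_stream(body["query"]):
            response += delta
            # each SSE event is a "data: ..." line followed by a blank line
            yield "data: " + json.dumps({"delta": delta, "finished": False}) + "\n\n"
        yield "data: " + json.dumps({"response": response, "finished": True}) + "\n\n"

    return Response(generate(), mimetype="text/event-stream")
```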
```
Flask
Flask-Cors
gevent
```
```shell
python3 -u chatglm_service_flask.py --host 127.0.0.1 --port 8800 --quantize 8 --device 0
```
Among the parameters, `--device -1` represents the CPU, while any other number `i` represents the `i`-th GPU.
FastAPI can use sse-starlette to create and send server-sent events.
Note that the event stream output by sse-starlette may contain redundant symbols, which the front end needs to handle when parsing.
```
fastapi
sse-starlette
uvicorn
```
```shell
python3 -u chatglm_service_fastapi.py --host 127.0.0.1 --port 8800 --quantize 8 --device 0
```
This repository provides a demo page for calling the streaming API, usable out of the box in an intranet environment without Node.js, npm, or webpack.
Constrained by my limited development environment and technical reserves, the HTML demo was developed with bootstrap.js 3.x + Vue.js 2.x and rendered with marked.js + highlight.js; for the specific versions, see the CDN links in the HTML.
Since the browser's native EventSource implementation does not support the POST method or setting request headers, a third-party implementation is needed instead.
In serious development with npm, @microsoft/fetch-event-source can be used; however, given my limited setup, I did not want to compile TypeScript, so I used @rangermauve/fetch-event-source instead. That project implements only the most basic EventSource features, so I made modifications on top of it.
Note that with this approach, the event stream can no longer be displayed correctly in Chrome's DevTools.
Modify the port on line 1 of static/js/chatglm.js:

```javascript
var baseUrl = "http://localhost:8800/"
```
Garbled characters may appear during streaming Q&A generation.
This is mainly because some tokens do not decode to complete characters, which is why the interface also returns the complete generation result at the end. If the fragment is a non-null Unicode character, it can be handled by checking the encoding; however, some other garbled characters remain, and I will investigate further.
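One common symptom of an incompletely decoded token is a trailing Unicode replacement character (U+FFFD). A simple client-side mitigation, sketched below as an illustration (not the repository's code), is to drop those characters from each `delta` and rely on the final complete `response` for the authoritative text:

```python
# Drop replacement characters produced when a token boundary
# splits a multi-byte character; the final full `response`
# returned by the interface restores the correct text.
def strip_incomplete(delta: str) -> str:
    return delta.replace("\ufffd", "")
```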
In addition to the repositories cited above, thanks are also due to the following project:
ikechan8370/SimpleChatGLM6BAPI, whose parameter handling logic I referenced.