This project imitates the OpenAI Chat Completion API (the GPT-3.5 API) and provides a streaming HTTP API for ChatGLM-6B.
This repository provides sample services under Flask and FastAPI, plus an out-of-the-box static web UI that requires no Node.js, npm, or webpack.
The actual version used should follow the official ChatGLM-6B repository. One reminder: to call the `stream_chat` method, transformers 4.25.1 can no longer be used, so transformers==4.26.1 is required. The following is my development environment configuration:
```
protobuf>=3.18,<3.20.1
transformers==4.26.1
torch==1.12.1+cu113
torchvision==0.13.1
icetk
cpm_kernels
```
Non-streaming interface
Interface URL: http://{host_name}/chat
Request method: POST (JSON body)
Request parameters:
| Field name | Type | Description |
|---|---|---|
| query | string | the user's question |
| history | array[string] | session history |
Return results:
| Field name | Type | Description |
|---|---|---|
| query | string | the user's question |
| response | string | the full reply |
| history | array[string] | session history |
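As a concrete illustration, the non-streaming interface can be called as in the following minimal Python sketch. The base URL and helper names are my own assumptions for illustration, not part of the repository:

```python
# Hypothetical client for the non-streaming POST /chat endpoint.
# BASE_URL is an assumption; adjust it to your deployment.
import requests

BASE_URL = "http://localhost:8800"

def build_chat_payload(query, history):
    """Assemble the JSON body expected by POST /chat."""
    return {"query": query, "history": history}

def chat(query, history):
    """Send one question and return (full reply, updated history)."""
    resp = requests.post(
        f"{BASE_URL}/chat",
        json=build_chat_payload(query, history),
        timeout=300,
    )
    resp.raise_for_status()
    data = resp.json()  # contains "query", "response", and "history"
    return data["response"], data["history"]

# Usage (requires a running service):
#   reply, history = chat("Hello", [])
```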
Streaming interface, implemented with server-sent events.
Interface URL: http://{host_name}/stream
Request method: POST (JSON body)
Return method: a stream of server-sent events, each carrying a `delta` increment.
Return results:
| Field name | Type | Description |
|---|---|---|
| delta | string | the characters generated by this event |
| query | string | the user's question; to save bandwidth, returned only when `finished` is `true` |
| response | string | the reply so far; when `finished` is `true`, it is the complete reply |
| history | array[string] | session history; to save bandwidth, returned only when `finished` is `true` |
| finished | boolean | `true` indicates the end, `false` indicates data is still streaming |
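The event stream above can be consumed from Python as sketched below. The base URL and function names are assumptions for illustration; the field names match the table above:

```python
# Hypothetical consumer of the streaming POST /stream endpoint.
import json
import requests

BASE_URL = "http://localhost:8800"  # assumption; adjust to your deployment

def parse_sse_line(raw):
    """Parse one 'data: {...}' line from the event stream into a dict;
    return None for blank keep-alive lines or non-data lines."""
    line = (raw or "").strip()
    if not line.startswith("data:"):
        return None
    return json.loads(line[len("data:"):].strip())

def stream_chat(query, history):
    """Yield parsed events until an event with finished=true arrives."""
    with requests.post(
        f"{BASE_URL}/stream",
        json={"query": query, "history": history},
        stream=True,
        timeout=300,
    ) as resp:
        resp.raise_for_status()
        for raw in resp.iter_lines(decode_unicode=True):
            event = parse_sse_line(raw)
            if event is None:
                continue
            yield event
            if event.get("finished"):
                break

# Usage (requires a running service):
#   for event in stream_chat("Hello", []):
#       print(event.get("delta", ""), end="", flush=True)
```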
Memory cleanup interface:
http://{host_name}/clear
To implement server-sent events under Flask, Flask-SSE could have been used. However, since that package depends on Redis, which is unnecessary in this scenario, I referred to this document and implemented server-sent events in the simplest way.
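The "simplest way" amounts to returning a generator as a `text/event-stream` response. Below is a minimal sketch, not the repository's actual code; `fake_token_stream` is a placeholder for ChatGLM's real `stream_chat` call:

```python
# Minimal sketch of server-sent events in plain Flask, without Flask-SSE or Redis.
import json
from flask import Flask, Response, request

app = Flask(__name__)

def fake_token_stream(query):
    """Placeholder for model.stream_chat(...); yields one character at a time."""
    for ch in "echo: " + query:
        yield ch

@app.route("/stream", methods=["POST"])
def stream():
    body = request.get_json()

    def generate():
        response = ""
        for delta in fake_token_stream(body["query"]):
            response += delta
            # each SSE event is a "data: ..." line followed by a blank line
            yield "data: " + json.dumps({"delta": delta, "finished": False}) + "\n\n"
        yield "data: " + json.dumps({"response": response, "finished": True}) + "\n\n"

    return Response(generate(), mimetype="text/event-stream")
```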
```
Flask
Flask-Cors
gevent
```
```shell
python3 -u chatglm_service_flask.py --host 127.0.0.1 --port 8800 --quantize 8 --device 0
```
Among the parameters, `--device -1` represents the CPU, while any other number `i` represents the `i`-th GPU.
FastAPI can use sse-starlette to create and send server-sent events.
Note that the event stream output by sse-starlette may contain redundant symbols, which the front end needs to handle when parsing.
```
fastapi
sse-starlette
uvicorn
```
```shell
python3 -u chatglm_service_fastapi.py --host 127.0.0.1 --port 8800 --quantize 8 --device 0
```
This repository provides a demo page for calling the streaming API, usable out of the box in an intranet environment without Node.js, npm, or webpack.
Constrained by my limited development environment and technical reserves, the HTML demo was developed with bootstrap.js 3.x + Vue.js 2.x and rendered with marked.js + highlight.js; for the specific versions, see the CDN links in the HTML.
Since the browser's native EventSource implementation does not support the POST method or setting request headers, a third-party implementation is needed instead.
In serious development with npm, @microsoft/fetch-event-source can be used; however, given my limited setup, I did not want to compile TypeScript, so I used @rangermauve/fetch-event-source instead. That project implements only the most basic EventSource features, so I made modifications on top of it.
Note that with this approach, the event stream can no longer be displayed correctly in Chrome's DevTools.
Modify the port on line 1 of static/js/chatglm.js:

```javascript
var baseUrl = "http://localhost:8800/"
```
Garbled characters may appear during streaming Q&A generation.
This is mainly because some tokens do not decode to complete characters, which is why the interface also returns the complete generation result at the end. If the fragment is a non-null Unicode character, it can be handled by checking the encoding; however, some other garbled characters remain, and I will investigate further.
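One common symptom of an incompletely decoded token is a trailing Unicode replacement character (U+FFFD). A simple client-side mitigation, sketched below as an illustration (not the repository's code), is to drop those characters from each `delta` and rely on the final complete `response` for the authoritative text:

```python
# Drop replacement characters produced when a token boundary
# splits a multi-byte character; the final full `response`
# returned by the interface restores the correct text.
def strip_incomplete(delta: str) -> str:
    return delta.replace("\ufffd", "")
```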
In addition to the repositories cited above, thanks are also due to the following project:
ikechan8370/SimpleChatGLM6BAPI, whose parameter handling logic I referenced.