A pure PHP, open-source project by @qiayue that implements streaming GPT calls and a web UI that prints the response in real time.

Updated April 13:

1. Responses have been slow recently because OpenAI throttles free accounts. Accounts with a credit card bound at platform.openai.com still get normal speed.
2. Throttling means that for a streaming request the first token arrives after roughly 20 seconds, versus roughly 2 seconds for an account with a credit card bound.
```
/
├─ /class
│  ├─ Class.ChatGPT.php
│  ├─ Class.DFA.php
│  └─ Class.StreamHandler.php
├─ /static
│  ├─ /css
│  │  ├─ chat.css
│  │  └─ monokai-sublime.css
│  └─ /js
│     ├─ chat.js
│     ├─ highlight.min.js
│     └─ marked.min.js
├─ chat.php
├─ index.html
├─ README.md
└─ sensitive_words.txt
```
| Directory/File | Description |
| --- | --- |
| / | Project root directory |
| /class | PHP class files |
| /class/Class.ChatGPT.php | ChatGPT class: handles front-end requests and submits them to the OpenAI API |
| /class/Class.DFA.php | DFA class: sensitive-word detection and replacement |
| /class/Class.StreamHandler.php | StreamHandler class: processes the data streamed back by OpenAI in real time |
| /static | All static files needed by the front-end page |
| /static/css | All front-end CSS files |
| /static/css/chat.css | Chat page styles |
| /static/css/monokai-sublime.css | Theme styles for the highlight.js code-highlighting library |
| /static/js | All front-end JS files |
| /static/js/chat.js | Front-end chat-interaction code |
| /static/js/highlight.min.js | Code-highlighting library |
| /static/js/marked.min.js | Markdown-parsing library |
| /chat.php | Back-end entry point for front-end chat requests; includes the PHP class files |
| /index.html | Front-end page HTML |
| /README.md | Repository description file |
| /sensitive_words.txt | Sensitive-word file, one word per line. You need to collect the words yourself; you can also add me on WeChat (same id as on GitHub) to ask for a list. |
The project uses no framework and no third-party back-end libraries. The front end uses the code-highlighting library highlight.js and the Markdown parser marked; both are bundled in the repository, so the code runs without any installation step after you get it.

The only thing you have to do is fill in your own API key. After obtaining the source code, edit chat.php and fill in your OpenAI API key:
```php
$chat = new ChatGPT([
    'api_key' => 'put your OpenAI API key here',
]);
```
If you enable the sensitive-word detection feature, put the sensitive words, one per line, into the sensitive_words_sdfdsfvdfs5v56v5dfvdf.txt file.

I have opened a WeChat group; you are welcome to join and discuss:
In the back end, Class.ChatGPT.php uses curl to send the request to OpenAI. It registers a callback via curl's CURLOPT_WRITEFUNCTION option, and passes `'stream' => true` in the request parameters to tell OpenAI to enable streaming.

We use `curl_setopt($ch, CURLOPT_WRITEFUNCTION, [$this->streamHandler, 'callback']);` so that each chunk returned by OpenAI is handed to the `callback` method of `$this->streamHandler` as it arrives.
OpenAI returns strings in the format `data: {"id":"","object":"","created":1679616251,"model":"","choices":[{"delta":{"content":""},"index":0,"finish_reason":null}]}`, and the answer we need is in `choices[0]['delta']['content']`. Of course, we also need to guard against exceptions rather than read that field directly.

In addition, because of how the data travels over the network, the data received by each invocation of the callback is not necessarily exactly one `data: {"key":"value"}` message: it may be half a message, several messages, or N and a half messages. So we added a `data_buffer` property to the `StreamHandler` class to store any trailing fragment that cannot be parsed yet.

Some special handling is done here based on OpenAI's response format; the specific code is as follows:
```php
public function callback($ch, $data) {
    $this->counter += 1;
    file_put_contents('./log/data.' . $this->qmd5 . '.log', $this->counter . '==' . $data . PHP_EOL . '--------------------' . PHP_EOL, FILE_APPEND);

    $result = json_decode($data, TRUE);
    if (is_array($result)) {
        $this->end('OpenAI request error: ' . json_encode($result));
        return strlen($data);
    }

    /*
    The steps below are specific to the OpenAI API.
    Each time the callback fires, $data may contain several "data:" messages, which must be split apart.
    For example, one invocation may receive $data like this (the \n\n separators are literal blank lines):
    data: {"id":"chatcmpl-6wimHHBt4hKFHEpFnNT2ryUeuRRJC","object":"chat.completion.chunk","created":1679453169,"model":"gpt-3.5-turbo-0301","choices":[{"delta":{"role":"assistant"},"index":0,"finish_reason":null}]}\n\ndata: {"id":"chatcmpl-6wimHHBt4hKFHEpFnNT2ryUeuRRJC","object":"chat.completion.chunk","created":1679453169,"model":"gpt-3.5-turbo-0301","choices":[{"delta":{"content":"以下"},"index":0,"finish_reason":null}]}\n\ndata: {"id":"chatcmpl-6wimHHBt4hKFHEpFnNT2ryUeuRRJC","object":"chat.completion.chunk","created":1679453169,"model":"gpt-3.5-turbo-0301","choices":[{"delta":{"content":"是"},"index":0,"finish_reason":null}]}\n\ndata: {"id":"chatcmpl-6wimHHBt4hKFHEpFnNT2ryUeuRRJC","object":"chat.completion.chunk","created":1679453169,"model":"gpt-3.5-turbo-0301","choices":[{"delta":{"content":"使用"},"index":0,"finish_reason":null}]}
    The last two messages are usually:
    data: {"id":"chatcmpl-6wimHHBt4hKFHEpFnNT2ryUeuRRJC","object":"chat.completion.chunk","created":1679453169,"model":"gpt-3.5-turbo-0301","choices":[{"delta":{},"index":0,"finish_reason":"stop"}]}\n\ndata: [DONE]
    Given that format, the splitting steps are as follows:
    */

    // 0. Prepend the leftover from the previous invocation to this chunk
    $buffer = $this->data_buffer . $data;
    // ...and clear the buffer once it has been consumed
    $this->data_buffer = '';

    // 1. Replace every 'data: {' with '{' and every 'data: [' with '['
    $buffer = str_replace('data: {', '{', $buffer);
    $buffer = str_replace('data: [', '[', $buffer);

    // 2. Replace every '}\n\n{' with '}[br]{' and every '}\n\n[' with '}[br]['
    $buffer = str_replace('}' . PHP_EOL . PHP_EOL . '{', '}[br]{', $buffer);
    $buffer = str_replace('}' . PHP_EOL . PHP_EOL . '[', '}[br][', $buffer);

    // 3. Split on '[br]' into an array of lines
    $lines = explode('[br]', $buffer);

    // 4. Process each line; the last line must be checked for complete JSON
    $line_c = count($lines);
    foreach ($lines as $li => $line) {
        if (trim($line) == '[DONE]') {
            // The stream has finished
            $this->data_buffer = '';
            $this->counter = 0;
            $this->sensitive_check();
            $this->end();
            break;
        }
        $line_data = json_decode(trim($line), TRUE);
        if (!is_array($line_data) || !isset($line_data['choices']) || !isset($line_data['choices'][0])) {
            if ($li == ($line_c - 1)) {
                // The last line may be an incomplete fragment; keep it for next time
                $this->data_buffer = $line;
                break;
            }
            // A middle line that fails to parse as JSON goes to the error log
            file_put_contents('./log/error.' . $this->qmd5 . '.log', json_encode(['i' => $this->counter, 'line' => $line, 'li' => $li], JSON_UNESCAPED_UNICODE | JSON_PRETTY_PRINT) . PHP_EOL . PHP_EOL, FILE_APPEND);
            continue;
        }
        if (isset($line_data['choices'][0]['delta']) && isset($line_data['choices'][0]['delta']['content'])) {
            $this->sensitive_check($line_data['choices'][0]['delta']['content']);
        }
    }
    return strlen($data);
}
```
We use the DFA algorithm for sensitive-word detection. As ChatGPT puts it, "DFA" stands for Deterministic Finite Automaton, and a DfaFilter usually refers to an algorithm for text processing and matching.

The Class.DFA.php class was written with GPT-4; see the source code for the implementation.
Here is how to use it. To create a DFA instance, pass in the path to the sensitive-word file:

```php
$dfa = new DFA([
    'words_file' => './sensitive_words_sdfdsfvdfs5v56v5dfvdf.txt',
]);
```

Special note: the garbled string in the file name is deliberate, to keep others from downloading your sensitive-word file. After deployment, change it to a different garbled name of your own; do not use the name published here.
After that, you can call `$dfa->containsSensitiveWords($inputText)` to check whether `$inputText` contains sensitive words; it returns a boolean, `TRUE` or `FALSE`. You can also call `$outputText = $dfa->replaceWords($inputText)` to replace sensitive words: every word listed in sensitive_words.txt is replaced with three `*` characters.
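For intuition, here is a minimal JavaScript sketch of the same trie-based (DFA) idea. This is not the project's PHP implementation; `buildTrie` and `replaceWords` are hypothetical names used only for illustration:

```javascript
// Build a trie (the DFA's state graph) from a list of sensitive words.
function buildTrie(words) {
  const root = {};
  for (const word of words) {
    let node = root;
    for (const ch of word) {
      node = node[ch] = node[ch] || {};
    }
    node.end = true; // marks a complete sensitive word
  }
  return root;
}

// Scan the text once; whenever a walk through the trie reaches an `end`
// node, replace the longest matched word with three asterisks.
function replaceWords(trie, text) {
  let out = '';
  let i = 0;
  while (i < text.length) {
    let node = trie;
    let j = i;
    let matchEnd = -1;
    while (j < text.length && node[text[j]]) {
      node = node[text[j]];
      j += 1;
      if (node.end) matchEnd = j; // remember the longest match so far
    }
    if (matchEnd > -1) {
      out += '***';
      i = matchEnd; // skip past the matched word
    } else {
      out += text[i];
      i += 1;
    }
  }
  return out;
}
```

Because each character is consulted at most once per starting position, scanning stays fast even with a large word list, which is the point of using a DFA rather than looping `str_replace` over every word.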
If you do not want sensitive-word detection, comment out the following lines in chat.php:

```php
$dfa = new DFA([
    'words_file' => './sensitive_words_sdfdsfvdfs5v56v5dfvdf.txt',
]);
$chat->set_dfa($dfa);
```
With sensitive-word detection disabled, every chunk OpenAI returns is forwarded to the front end immediately.

With it enabled, the newline and pause marks `[',', '。', ';', '?', '!', '……']` are used to split the stream into sentences. Each sentence is run through `$outputText = $dfa->replaceWords($inputText)` to replace sensitive words, and the whole sentence is then returned to the front end.

Enabling sensitive words costs time to load the word file, and detection happens sentence by sentence rather than token by token, so responses appear more slowly. If the deployment is just for yourself, you can leave detection off; if you deploy it for others, enable it to protect your domain name and yourself.
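The sentence-buffering idea can be sketched in JavaScript. `makeSentenceBuffer` and its callbacks are hypothetical names, not the project's code:

```javascript
// Pause/stop marks used as sentence boundaries (from the list above).
const PAUSE_MARKS = [',', '。', ';', '?', '!', '…', '\n'];

// Accumulate streamed tokens; each time a token ends on a pause mark,
// flush the whole buffered sentence through `replaceWords` to `emit`.
function makeSentenceBuffer(replaceWords, emit) {
  let pending = '';
  return function push(token) {
    pending += token;
    const last = token[token.length - 1];
    if (PAUSE_MARKS.includes(last)) {
      emit(replaceWords(pending)); // whole sentence goes out at once
      pending = '';
    }
  };
}
```

This is why output feels slower with detection on: tokens sit in the buffer until a pause mark closes the sentence, instead of being forwarded one by one.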
The comments in chat.php make this clearer:

```php
/*
The following comments were generated by GPT-4.
*/
// Turn off output buffering, so the script's output is sent to the browser
// immediately instead of waiting for the buffer to fill or the script to finish.
ini_set('output_buffering', 'off');
// Disable zlib compression. Compression normally shrinks the data sent to the
// browser, but for server-sent events latency matters more, so it is disabled.
ini_set('zlib.output_compression', false);
// Loop to flush and close every currently active output buffer. ob_end_flush()
// flushes and closes the innermost buffer; @ suppresses any errors or warnings.
while (@ob_end_flush()) {}
// Set the response Content-Type to text/event-stream, the MIME type for
// server-sent events (SSE).
header('Content-Type: text/event-stream');
// Set Cache-Control to no-cache so the browser does not cache this response.
header('Cache-Control: no-cache');
// Set Connection to keep-alive so the server can keep sending events to the client.
header('Connection: keep-alive');
// Set the custom X-Accel-Buffering header to "no", which disables buffering in
// some proxies and web servers (such as Nginx), ensuring the event stream is
// not held back by buffering in transit.
header('X-Accel-Buffering: no');
```
After that, every time we want to send data to the front end, we use code like this:

```php
echo 'data: ' . json_encode(['time' => date('Y-m-d H:i:s'), 'content' => '答: ']) . PHP_EOL . PHP_EOL;
flush();
```

This defines our own data format, containing only `time` and `content`; `time` is the timestamp and `content` is the text to send to the front end.

Note that once the whole answer has been transmitted, we need to close the connection with the following code:

```php
echo 'retry: 86400000' . PHP_EOL;                   // tell the front end how long to wait before re-polling after an error
echo 'event: close' . PHP_EOL;                      // tell the front end the stream is over
echo 'data: Connection closed' . PHP_EOL . PHP_EOL; // tell the front end the connection is closed
flush();
```
The front-end JS opens an EventSource request with `const eventSource = new EventSource(url);`. The server then sends data to the front end in the format `data: {"key1":"value1","key2":"value2"}`. In EventSource's message callback, the front end reads the JSON string from `event.data` and parses it into a JS object with `JSON.parse(event.data)`.

The specific code is in the getAnswer function, shown below:
```javascript
function getAnswer(inputValue) {
    inputValue = inputValue.replace('+', '{[$add$]}');
    const url = "./chat.php?q=" + inputValue;
    const eventSource = new EventSource(url);
    eventSource.addEventListener("open", (event) => {
        console.log("Connection established", JSON.stringify(event));
    });
    eventSource.addEventListener("message", (event) => {
        //console.log("Received data:", event);
        try {
            var result = JSON.parse(event.data);
            if (result.time && result.content) {
                answerWords.push(result.content);
                contentIdx += 1;
            }
        } catch (error) {
            console.log(error);
        }
    });
    eventSource.addEventListener("error", (event) => {
        console.error("An error occurred:", JSON.stringify(event));
    });
    eventSource.addEventListener("close", (event) => {
        console.log("Connection closed", JSON.stringify(event.data));
        eventSource.close();
        contentEnd = true;
        console.log((new Date().getTime()), 'answer end');
    });
}
```
Let me explain: a native EventSource request can only be a GET request, so this demo puts the question directly into the GET URL parameter. If you want to use POST requests, there are generally two approaches:

1. Change both front end and back end [send POST first, then GET]: the front end POSTs the question; the back end generates a unique key from the question and timestamp and returns it in the POST response; the front end then issues a GET request carrying the question key and receives the answer. This approach requires modifying the back-end code.
2. Change only the front end [send a single POST]: the back end barely changes; just replace `$question = urldecode($_GET['q'] ?? '')` in chat.php with `$question = urldecode($_POST['q'] ?? '')`. The front end, however, can no longer use a native EventSource request; it must use fetch with streaming reads. See the code example below, generated by GPT-4.
```javascript
async function fetchAiResponse(message) {
    try {
        const response = await fetch("./chat.php", {
            method: "POST",
            headers: { "Content-Type": "application/json" },
            body: JSON.stringify({ messages: [{ role: "user", content: message }] }),
        });
        if (!response.ok) {
            throw new Error(response.statusText);
        }
        const reader = response.body.getReader();
        const decoder = new TextDecoder("utf-8");
        while (true) {
            const { value, done } = await reader.read();
            if (value) {
                const partialResponse = decoder.decode(value, { stream: true });
                displayMessage("assistant", partialResponse);
            }
            if (done) {
                break;
            }
        }
    } catch (error) {
        console.error("Error fetching AI response:", error);
        displayMessage("assistant", "Error: Failed to fetch AI response.");
    }
}
```
In the code above, the key point is `{ stream: true }` in `const partialResponse = decoder.decode(value, { stream: true })`.
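To see why `{ stream: true }` matters: a multi-byte UTF-8 character can be split across two network chunks, and with `stream: true` the decoder holds the incomplete bytes until the rest arrive instead of emitting a replacement character. A small standalone illustration:

```javascript
const decoder = new TextDecoder('utf-8');

// "中" is three bytes in UTF-8: 0xE4 0xB8 0xAD. Simulate the network
// splitting it across two chunks.
const chunk1 = new Uint8Array([0xE4, 0xB8]);
const chunk2 = new Uint8Array([0xAD]);

// With { stream: true } the first call returns '' (bytes buffered),
// and the second call completes the character.
const part1 = decoder.decode(chunk1, { stream: true });
const part2 = decoder.decode(chunk2, { stream: true });
```

Without `{ stream: true }`, each `decode` call would treat its input as complete and turn the dangling bytes into `�` replacement characters.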
All reply content returned by the back end should be printed with a typewriter effect. The initial approach was to display each chunk the moment it arrived, but that turned out to be too fast: the text appeared in the blink of an eye, with no typewriter effect. So the approach was changed to a timer: incoming chunks are first pushed into an array as a cache, and a timer fires every 50 milliseconds to print one piece of content. The implementation is as follows:
```javascript
function typingWords() {
    if (contentEnd && contentIdx == typingIdx) {
        clearInterval(typingTimer);
        answerContent = '';
        answerWords = [];
        answers = [];
        qaIdx += 1;
        typingIdx = 0;
        contentIdx = 0;
        contentEnd = false;
        lastWord = '';
        lastLastWord = '';
        input.disabled = false;
        sendButton.disabled = false;
        console.log((new Date().getTime()), 'typing end');
        return;
    }
    if (contentIdx <= typingIdx) {
        return;
    }
    if (typing) {
        return;
    }
    typing = true;
    if (!answers[qaIdx]) {
        answers[qaIdx] = document.getElementById('answer-' + qaIdx);
    }
    const content = answerWords[typingIdx];
    if (content.indexOf('`') != -1) {
        if (content.indexOf('```') != -1) {
            codeStart = !codeStart;
        } else if (content.indexOf('``') != -1 && (lastWord + content).indexOf('```') != -1) {
            codeStart = !codeStart;
        } else if (content.indexOf('`') != -1 && (lastLastWord + lastWord + content).indexOf('```') != -1) {
            codeStart = !codeStart;
        }
    }
    lastLastWord = lastWord;
    lastWord = content;
    answerContent += content;
    answers[qaIdx].innerHTML = marked.parse(answerContent + (codeStart ? '\n\n```' : ''));
    typingIdx += 1;
    typing = false;
}
```
If you print exactly what is received, then while a code block is streaming in, you would have to wait until the entire block has arrived before Markdown could format it and the code could be highlighted. That experience is too poor. Is there a way around it? The answer is in the question: since the problem is that the code block has an opening fence but no closing fence yet, we simply append a closing fence ourselves, and stop appending it once the real closing fence arrives.

The specific implementation is these lines:
```javascript
if (content.indexOf('`') != -1) {
    if (content.indexOf('```') != -1) {
        codeStart = !codeStart;
    } else if (content.indexOf('``') != -1 && (lastWord + content).indexOf('```') != -1) {
        codeStart = !codeStart;
    } else if (content.indexOf('`') != -1 && (lastLastWord + lastWord + content).indexOf('```') != -1) {
        codeStart = !codeStart;
    }
}
lastLastWord = lastWord;
lastWord = content;
answerContent += content;
answers[qaIdx].innerHTML = marked.parse(answerContent + (codeStart ? '\n\n```' : ''));
```
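The three branches exist because a ``` fence may itself arrive split across up to three streamed tokens, so the check looks back at the previous one or two tokens. To see this behavior in isolation, the same logic can be wrapped in a small standalone helper (`makeFenceTracker` is a hypothetical name used only for this demonstration):

```javascript
// Track whether we are inside a ``` code fence, even when the fence
// arrives split across up to three consecutive streamed tokens.
function makeFenceTracker() {
  let codeStart = false;
  let lastWord = '';
  let lastLastWord = '';
  return function feed(content) {
    if (content.indexOf('`') != -1) {
      if (content.indexOf('```') != -1) {
        // the whole fence is inside this token
        codeStart = !codeStart;
      } else if (content.indexOf('``') != -1 && (lastWord + content).indexOf('```') != -1) {
        // fence split across this token and the previous one
        codeStart = !codeStart;
      } else if ((lastLastWord + lastWord + content).indexOf('```') != -1) {
        // fence split across three tokens
        codeStart = !codeStart;
      }
    }
    lastLastWord = lastWord;
    lastWord = content;
    return codeStart; // true while inside an unclosed code block
  };
}
```

While `codeStart` is true, the renderer appends a temporary closing fence so marked can already format the partial code block.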
For more details, see the code. If you have any questions about it, add me on WeChat (same id as on GitHub).

License: BSD 2-Clause