use-whisper

useWhisper

A React hook for the OpenAI Whisper API with a built-in speech recorder, real-time transcription, and silence removal.

Demo
Real-time transcription demo

use-whisper-real-time-transcription.mp4

Announcement
useWhisper for React Native is under development.

Repository: https://github.com/chengsokdara/use-whisper-native

Progress: chengsokdara/use-whisper-native#1

Install

 npm i @chengsokdara/use-whisper

 yarn add @chengsokdara/use-whisper

Usage

import { useWhisper } from '@chengsokdara/use-whisper'

const App = () => {
  const {
    recording,
    speaking,
    transcribing,
    transcript,
    pauseRecording,
    startRecording,
    stopRecording,
  } = useWhisper({
    apiKey: process.env.OPENAI_API_TOKEN, // YOUR_OPEN_AI_TOKEN
  })

  return (
    <div>
      {/* Booleans render as nothing in JSX, so convert them to strings */}
      <p>Recording: {String(recording)}</p>
      <p>Speaking: {String(speaking)}</p>
      <p>Transcribing: {String(transcribing)}</p>
      <p>Transcribed Text: {transcript.text}</p>
      <button onClick={() => startRecording()}>Start</button>
      <button onClick={() => pauseRecording()}>Pause</button>
      <button onClick={() => stopRecording()}>Stop</button>
    </div>
  )
}

Custom server (keeps your OpenAI API token secure)

import { useWhisper } from '@chengsokdara/use-whisper'

const App = () => {
  /**
   * This gives you more control:
   * do whatever you want with the recorded speech,
   * send it to your own custom server,
   * and return the response back to useWhisper.
   */
  // Must be async because the body awaits the file read and the request
  const onTranscribe = async (blob: Blob) => {
    // Encode the recorded audio as a base64 data URL
    const base64 = await new Promise<string | ArrayBuffer | null>(
      (resolve) => {
        const reader = new FileReader()
        reader.onloadend = () => resolve(reader.result)
        reader.readAsDataURL(blob)
      }
    )
    const body = JSON.stringify({ file: base64, model: 'whisper-1' })
    const headers = { 'Content-Type': 'application/json' }
    // axios is imported lazily, only when a transcription is requested
    const { default: axios } = await import('axios')
    const response = await axios.post('/api/whisper', body, {
      headers,
    })
    const { text } = response.data
    // you must return the result from your server in Transcript format
    return {
      blob,
      text,
    }
  }

  const { transcript } = useWhisper({
    // callback to handle transcription with a custom server
    onTranscribe,
  })

  return (
    <div>
      <p>{transcript.text}</p>
    </div>
  )
}
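
On the server side, the `file` field sent by the example above is a data URL produced by `FileReader.readAsDataURL`, not raw audio. The following is a minimal sketch of how a handler might split that value into a MIME type and bytes before forwarding the audio to Whisper. `parseDataUrl` and `DecodedAudio` are hypothetical names, not part of use-whisper, and the sketch assumes a runtime that provides the global `atob` (browsers and recent Node.js versions).

```typescript
// Hypothetical helper (not part of use-whisper): decode the data URL that
// the client example sends in the `file` field of the JSON body.
interface DecodedAudio {
  mimeType: string
  bytes: Uint8Array
}

function parseDataUrl(dataUrl: string): DecodedAudio {
  // readAsDataURL yields "data:<mime>[;param=value];base64,<payload>"
  const match = /^data:([^;,]+)(?:;[^;,]+)*;base64,([A-Za-z0-9+/=]*)$/.exec(
    dataUrl
  )
  if (!match) {
    throw new Error('Expected a base64-encoded data URL')
  }
  const mimeType = match[1]
  // atob returns a binary string with one character per byte
  const binary = atob(match[2])
  const bytes = Uint8Array.from(binary, (ch) => ch.charCodeAt(0))
  return { mimeType, bytes }
}
```

The server would then send `bytes` to the Whisper endpoint with its own API key and reply with `{ text }`, which is the shape the client example reads from `response.data`.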

Examples
Real-time streaming transcription

import { useWhisper } from '@chengsokdara/use-whisper'

const App = () => {
  const { transcript } = useWhisper({
    apiKey: process.env.OPENAI_API_TOKEN, // YOUR_OPEN_AI_TOKEN
    streaming: true,
    timeSlice: 1_000, // 1 second
    whisperConfig: {
      language: 'en',
    },
  })

  return (
    <div>
      <p>{transcript.text}</p>
    </div>
  )
}

Remove silence before sending audio to Whisper to save cost

import { useWhisper } from '@chengsokdara/use-whisper'

const App = () => {
  const { transcript } = useWhisper({
    apiKey: process.env.OPENAI_API_TOKEN, // YOUR_OPEN_AI_TOKEN
    // use ffmpeg-wasm to remove silence from recorded speech
    removeSilence: true,
  })

  return (
    <div>
      <p>{transcript.text}</p>
    </div>
  )
}

Auto start recording when the component mounts

import { useWhisper } from '@chengsokdara/use-whisper'

const App = () => {
  const { transcript } = useWhisper({
    apiKey: process.env.OPENAI_API_TOKEN, // YOUR_OPEN_AI_TOKEN
    // automatically start recording speech when the component mounts
    autoStart: true,
  })

  return (
    <div>
      <p>{transcript.text}</p>
    </div>
  )
}

Keep recording as long as the user is speaking

import { useWhisper } from '@chengsokdara/use-whisper'

const App = () => {
  const { transcript } = useWhisper({
    apiKey: process.env.OPENAI_API_TOKEN, // YOUR_OPEN_AI_TOKEN
    nonStop: true, // keep recording as long as the user is speaking
    stopTimeout: 5000, // auto stop after 5 seconds
  })

  return (
    <div>
      <p>{transcript.text}</p>
    </div>
  )
}
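
To make the interaction between `nonStop` and `stopTimeout` concrete, here is a small model of the timing, written as a pure function. It is only a sketch of the behaviour described in this README, not the library's code: `SpeechEvent` and `autoStopAt` are hypothetical names, and the model assumes the stop timer is cancelled when speech is detected and restarted when speech ends. Events are assumed to be sorted by time.

```typescript
// Hypothetical model of the nonStop behaviour; not the library's code.
type SpeechEvent = { at: number; type: 'startSpeaking' | 'stopSpeaking' }

// Returns the timestamp (ms) at which recording would auto stop, or null
// if the user is still speaking after the last event.
function autoStopAt(
  startedAt: number,
  events: SpeechEvent[],
  stopTimeout: number = 5000
): number | null {
  // The timer starts as soon as recording starts
  let deadline: number | null = startedAt + stopTimeout
  for (const event of events) {
    // The timer fired before this event arrived
    if (deadline !== null && event.at >= deadline) return deadline
    // Assumption: speaking cancels the timer, silence restarts it
    deadline = event.type === 'startSpeaking' ? null : event.at + stopTimeout
  }
  return deadline
}
```

Under this model, a user who starts speaking at 1 s and stops at 4 s with the 5 s timeout from the example keeps the recorder running until 9 s, while a user who never speaks is cut off at 5 s.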

Customize the Whisper API config when autoTranscribe is true

import { useWhisper } from '@chengsokdara/use-whisper'

const App = () => {
  const { transcript } = useWhisper({
    apiKey: process.env.OPENAI_API_TOKEN, // YOUR_OPEN_AI_TOKEN
    autoTranscribe: true,
    whisperConfig: {
      prompt: 'previous conversation', // you can pass previous conversation for context
      response_format: 'text', // output text instead of json
      temperature: 0.8, // higher values give more random output
      language: 'es', // Spanish
    },
  })

  return (
    <div>
      <p>{transcript.text}</p>
    </div>
  )
}

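Dependencies
- @chengsokdara/react-hooks-async: asynchronous React hooks
- recordrtc: cross-browser audio recorder
- lamejs: encodes WAV into MP3 for cross-browser support
- @ffmpeg/ffmpeg: for the silence removal feature
- hark: for speaking detection
- axios: used because fetch does not work with the Whisper endpoint

Most of these dependencies are lazy loaded, so they are imported only when needed.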
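
Lazy loading here means deferring an import until the feature that needs it is first used, as the custom server example does with `await import('axios')`. Below is a generic sketch of that pattern with caching. `lazyOnce` and `loadHeavyModule` are illustrative names only, and this is not necessarily how use-whisper organizes its own imports.

```typescript
// Generic lazy-loading helper; the name `lazyOnce` is illustrative only.
function lazyOnce<T>(loader: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | undefined
  return () => {
    // Start loading on the first call, then reuse the same promise
    if (cached === undefined) cached = loader()
    return cached
  }
}

// Nothing is loaded until loadHeavyModule() is first called.
// A real loader would be something like () => import('axios').
let loadCount = 0
const loadHeavyModule = lazyOnce(async () => {
  loadCount += 1
  return { name: 'heavy-module' }
})
```

Caching the promise rather than the resolved value means concurrent callers share a single in-flight load instead of each starting their own.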

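API
Config Object

Name	Type	Default Value	Description
apiKey	string	''	your OpenAI API token
autoStart	boolean	false	auto start speech recording on component mount
autoTranscribe	boolean	true	whether to transcribe automatically after recording stops
mode	string	transcriptions	controls the Whisper mode, either transcriptions or translations; translation is currently supported to English only
nonStop	boolean	false	if true, recording auto stops after stopTimeout; however, if the user keeps speaking, the recorder continues recording
removeSilence	boolean	false	remove silence before sending the file to the OpenAI API
stopTimeout	number	5000 ms	required when nonStop is true; controls when the recorder auto stops
streaming	boolean	false	transcribe speech in real time based on timeSlice
timeSlice	number	1000 ms	interval between each onDataAvailable event
whisperConfig	WhisperApiConfig	undefined	Whisper API transcription config
onDataAvailable	(blob: Blob) => void	undefined	callback function for getting the recorded blob at each timeSlice interval
onTranscribe	(blob: Blob) => Promise<Transcript>	undefined	callback function to handle transcription on your own custom server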
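
The defaults in the table above can be read as a plain object that caller options are merged over. The sketch below reconstructs that idea for illustration; `ConfigSketch`, `tableDefaults`, and `withDefaults` are hypothetical names, the callback and `whisperConfig` fields are omitted for brevity, and the package's own exported types may differ.

```typescript
// Reconstructed from the config table for illustration only;
// the package's own exported types may differ.
interface ConfigSketch {
  apiKey: string
  autoStart: boolean
  autoTranscribe: boolean
  mode: 'transcriptions' | 'translations'
  nonStop: boolean
  removeSilence: boolean
  stopTimeout: number // milliseconds
  streaming: boolean
  timeSlice: number // milliseconds
}

// Default values as listed in the table
const tableDefaults: ConfigSketch = {
  apiKey: '',
  autoStart: false,
  autoTranscribe: true,
  mode: 'transcriptions',
  nonStop: false,
  removeSilence: false,
  stopTimeout: 5000,
  streaming: false,
  timeSlice: 1000,
}

// Caller-supplied keys override the defaults; omitted keys keep them
function withDefaults(config: Partial<ConfigSketch> = {}): ConfigSketch {
  return { ...tableDefaults, ...config }
}
```

For example, `withDefaults({ nonStop: true })` yields a config whose `stopTimeout` is still 5000, which matches the table's note that the timeout matters once `nonStop` is enabled.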

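WhisperApiConfig

Name	Type	Default Value	Description
prompt	string	undefined	Optional text to guide the model's style or continue a previous audio segment. The prompt should match the audio language.
response_format	string	json	The format of the transcript output, one of: json, text, srt, verbose_json, or vtt.
temperature	number	0	The sampling temperature, between 0 and 1. Higher values like 0.8 make the output more random, while lower values like 0.2 make it more focused and deterministic. If set to 0, the model uses log probability to automatically increase the temperature until certain thresholds are hit.
language	string	en	The language of the input audio. Supplying the input language in ISO-639-1 format improves accuracy and latency.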
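
The constraints in the table above (allowed output formats, a temperature between 0 and 1, a two-letter language code) are easy to check before a request is sent. The following is a minimal sketch of such a check; `WhisperConfigSketch` and `checkWhisperConfig` are hypothetical names and are not provided by use-whisper.

```typescript
// Illustrative checks based on the WhisperApiConfig table;
// not part of use-whisper.
interface WhisperConfigSketch {
  prompt?: string
  response_format?: string
  temperature?: number
  language?: string
}

const RESPONSE_FORMATS = new Set(['json', 'text', 'srt', 'verbose_json', 'vtt'])

// Returns a list of human-readable problems; empty means no issue found
function checkWhisperConfig(config: WhisperConfigSketch): string[] {
  const problems: string[] = []
  const { response_format, temperature, language } = config
  if (response_format !== undefined && !RESPONSE_FORMATS.has(response_format)) {
    problems.push(`unknown response_format: ${response_format}`)
  }
  if (temperature !== undefined && !(temperature >= 0 && temperature <= 1)) {
    problems.push('temperature must be between 0 and 1')
  }
  // Only the shape is checked (two lowercase letters such as "en" or "es"),
  // not whether the code is an assigned ISO-639-1 language
  if (language !== undefined && !/^[a-z]{2}$/.test(language)) {
    problems.push(`language is not a two-letter code: ${language}`)
  }
  return problems
}
```

Returning a list instead of throwing lets a caller report every problem at once, for instance in a development-only warning.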

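Return Object

Name	Type	Description
recording	boolean	speech recording state
speaking	boolean	true while the user is detected to be speaking
transcribing	boolean	true while silence is being removed from the speech and the request is being sent to the OpenAI Whisper API
transcript	Transcript	object returned after Whisper transcription completes
pauseRecording	Promise	pause speech recording
startRecording	Promise	start speech recording
stopRecording	Promise	stop speech recording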
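
The three boolean flags in the return object can be combined into a single status for display. The sketch below shows one possible mapping; `WhisperFlags` and `describeStatus` are hypothetical names, and the priority order (transcribing first, then recording) is a design choice of this example rather than anything the hook prescribes.

```typescript
// Hypothetical helper: map the hook's boolean flags to one display label.
interface WhisperFlags {
  recording: boolean
  speaking: boolean
  transcribing: boolean
}

function describeStatus(flags: WhisperFlags): string {
  const { recording, speaking, transcribing } = flags
  // Transcription takes priority because it is the step users wait on
  if (transcribing) return 'Transcribing'
  if (recording) {
    return speaking ? 'Recording (speech detected)' : 'Recording (silence)'
  }
  return 'Idle'
}
```

In a component this could be rendered as `<p>{describeStatus({ recording, speaking, transcribing })}</p>`, which also avoids rendering raw booleans in JSX.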