
v.0.2.2


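EfficientWord-Net: Hotword Detection Based on Few-Shot Learning

Home assistants need special phrases called hotwords (for example, "OK Google") to be activated. EfficientWord-Net is a hotword detection engine based on few-shot learning that lets developers add custom hotwords to their programs without incurring additional charges. The library is written purely in Python and uses Google's TFLite implementation for faster real-time inference. Its design is inspired by FaceNet's Siamese network architecture, and it performs best when 3-4 samples of the hotword are collected directly from the user.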
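The few-shot idea can be illustrated without the neural network itself: a Siamese-style model maps audio to embedding vectors, and a live frame is accepted when its embedding is close enough to the embeddings of the user's few recorded samples. The sketch below assumes the embeddings are already computed, uses cosine similarity, and takes the best score over the references as the confidence. These choices, and the helper names `cosine_similarity` and `match_hotword`, are illustrative assumptions, not the library's internal code.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat degenerate (all-zero) embeddings as "no match"
    return dot / (norm_a * norm_b)


def match_hotword(frame_embedding: Sequence[float],
                  reference_embeddings: List[Sequence[float]],
                  threshold: float = 0.7) -> Tuple[bool, float]:
    """Compare one frame embedding with the embeddings of the user's samples.

    The best (maximum) similarity over the references is used as the
    confidence; this aggregation rule is an assumption for illustration.
    """
    best = max(cosine_similarity(frame_embedding, ref)
               for ref in reference_embeddings)
    return best >= threshold, best


# Three toy "reference" embeddings standing in for a few recorded samples.
refs = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.95, 0.0, 0.05]]
print(match_hotword([1.0, 0.0, 0.0], refs))  # near the references
print(match_hotword([0.0, 1.0, 0.0], refs))  # far from the references
```

Because only a handful of reference embeddings are stored per hotword, adding a new hotword needs no retraining, which is the practical payoff of the few-shot approach.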

EfficientWord-Net Demo on a Raspberry Pi

(Video: EfficientWord-Net.mp4)

Access Training Files

Training Files: access the training files.

Datasets

Here are the links:

Dataset 1
Dataset 2

Access the Research Paper

Research Paper: access the research paper.

Python Version Requirements

This library works with Python versions 3.6 to 3.9.
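If a script depends on the library, a small guard can flag an interpreter outside the documented 3.6 to 3.9 range before any import fails in a confusing way. This is a minimal sketch; the helper name `is_supported_python` is hypothetical, and it deliberately warns instead of exiting, since newer interpreters may or may not work.

```python
import sys


def is_supported_python(version_info=sys.version_info) -> bool:
    """Return True if (major, minor) falls within the documented 3.6-3.9 range."""
    major, minor = version_info[0], version_info[1]
    return major == 3 and 6 <= minor <= 9


if not is_supported_python():
    # Warn rather than exit: the range above is what the README documents.
    print(
        "Warning: EfficientWord-Net documents support for Python 3.6-3.9; "
        f"found {sys.version_info.major}.{sys.version_info.minor}",
        file=sys.stderr,
    )
```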

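Dependency Installation

Before running the pip install command for the library, a few dependencies must be installed manually:

PyAudio (depends on PortAudio)
TFLite (TensorFlow Lite binaries)
Librosa (binaries may not be available for some systems)

Mac OS M* and Raspberry Pi users may have to compile these dependencies.

The tflite package cannot be listed in requirements.txt, so it is installed automatically when the package is first initialized on the system.

The librosa package is not needed for inference-only use. However, it is installed automatically when generate_reference is called.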
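Before running pip, it can help to check which of the manually installed dependencies are already importable. The sketch below uses the standard library's `importlib.util.find_spec`. The module names `pyaudio`, `tflite_runtime`, and `librosa` are assumptions about how each package is imported on a typical system; adjust them if your platform ships TFLite differently. Remember that librosa is only needed for generating references, not for inference.

```python
import importlib.util
from typing import Dict, List

# Assumed import names for the dependencies listed above.
DEPENDENCIES: Dict[str, str] = {
    "PyAudio": "pyaudio",
    "TFLite": "tflite_runtime",
    "Librosa": "librosa",  # only needed when generating references
}


def missing_dependencies(deps: Dict[str, str] = DEPENDENCIES) -> List[str]:
    """Return the display names of dependencies whose modules cannot be found."""
    return [name for name, module in deps.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_dependencies()
    if missing:
        print("Not importable yet:", ", ".join(missing))
    else:
        print("All listed dependencies are importable.")
```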

Package Installation

Run the following pip command:

 pip install EfficientWord-Net

Import the package:

 import eff_word_net

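Demo

After installing the package, you can run the demo script built into the library (make sure you have a working microphone).

Access the documentation: https://ant-brain.github.io/EfficientWord-Net/

Command to run the demo: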

 python -m eff_word_net.engine

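Generating Custom Wakewords

For any new hotword, the library needs information about that hotword. This information is read from a file named {wakeword}_ref.json. For example, for the wakeword "alexa", the library needs a file named alexa_ref.json.

These files can be generated with the following procedure:

Collect 4 to 10 distinctly pronounced utterances of the wakeword. Put them in a separate folder that contains nothing else.
Alternatively, use the following command to generate audio files for a given word (it uses the IBM Neural TTS demo API). For our sake, please don't overuse it: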
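The `{wakeword}_ref.json` naming convention can be captured in a small helper so that a missing reference file is reported early with a clear message. Both helper names below are hypothetical; the library itself only expects the path to be passed in.

```python
import os


def reference_path(wakeword: str, folder: str) -> str:
    """Return folder/{wakeword}_ref.json, following the naming convention above."""
    return os.path.join(folder, f"{wakeword}_ref.json")


def require_reference(wakeword: str, folder: str) -> str:
    """Like reference_path, but fail early if the file has not been generated."""
    path = reference_path(wakeword, folder)
    if not os.path.isfile(path):
        raise FileNotFoundError(f"no reference file for {wakeword!r}: {path}")
    return path
```

For example, `reference_path("alexa", "refs")` yields a path ending in `alexa_ref.json`.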

python -m eff_word_net.ibm_generate

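Finally, run this command. It asks for the location of the input folder (containing the audio files) and the output folder (where the _ref.json file will be stored):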

 python -m eff_word_net.generate_reference
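Since the input folder should hold only 4 to 10 samples and nothing else, a quick check before running generate_reference can catch stray files. This is a sketch: the set of accepted audio extensions is an assumption, not a list taken from the library, and the helper names are hypothetical.

```python
import os
from typing import List

# File extensions counted as audio samples (an assumption for this sketch).
AUDIO_EXTENSIONS = {".wav", ".mp3", ".flac", ".ogg"}


def check_sample_entries(entries: List[str], min_samples: int = 4,
                         max_samples: int = 10) -> List[str]:
    """Return human-readable problems with a folder listing ([] if it looks OK)."""
    audio = [e for e in entries
             if os.path.splitext(e)[1].lower() in AUDIO_EXTENSIONS]
    others = sorted(set(entries) - set(audio))
    problems = []
    if others:
        problems.append(f"folder should contain only the samples; also found {others}")
    if not min_samples <= len(audio) <= max_samples:
        problems.append(
            f"expected {min_samples} to {max_samples} samples, found {len(audio)}")
    return problems


def check_sample_folder(folder: str) -> List[str]:
    """Check a folder on disk; subfolders count as unexpected entries."""
    return check_sample_entries(os.listdir(folder))
```

Splitting the pure listing check from the disk access keeps the rule easy to test without creating files.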

The full path of the generated reference file must be passed to the HotwordDetector instance:

 from eff_word_net.engine import HotwordDetector
from eff_word_net.audio_processing import Resnet50_Arc_loss

HotwordDetector(
    hotword="hello",
    model=Resnet50_Arc_loss(),
    reference_file="/full/path/name/of/hello_ref.json",
    threshold=0.9,  # min confidence required to consider a trigger
    relaxation_time=0.8  # default value, in seconds
)

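The model parameter accepts an instance of either Resnet50_Arc_loss or First_Iteration_Siamese.

The relaxation_time parameter sets the minimum time between any two triggers: any potential trigger that arrives within relaxation_time of the previous one is cancelled. Because the detector works with a sliding window, a single utterance of a hotword can produce several triggers, and relaxation_time keeps these in check. In most cases, the default of 0.8 seconds is sufficient.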
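The relaxation behaviour described above amounts to debouncing. The sketch below assumes the interval is measured from the last accepted trigger; the class name `TriggerDebouncer` is hypothetical and is not part of eff_word_net, whose internal bookkeeping may differ.

```python
from typing import Optional


class TriggerDebouncer:
    """Drop triggers that arrive within `relaxation_time` seconds of the last one."""

    def __init__(self, relaxation_time: float = 0.8):
        self.relaxation_time = relaxation_time
        self._last_trigger: Optional[float] = None

    def accept(self, timestamp: float) -> bool:
        """Return True if a detection at `timestamp` (seconds) is a new trigger."""
        if (self._last_trigger is not None
                and timestamp - self._last_trigger < self.relaxation_time):
            return False  # too soon after the previous trigger: cancel it
        self._last_trigger = timestamp
        return True


# Overlapping windows score the same utterance at 3.0 s and 3.75 s.
debouncer = TriggerDebouncer(relaxation_time=0.8)
print([debouncer.accept(t) for t in (3.0, 3.75, 5.0)])
```

The second detection is suppressed because it comes 0.75 s after the first, inside the 0.8 s window, while the detection at 5.0 s counts as a new trigger.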

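Out-of-the-Box Sample Hotwords

The library already ships predefined embeddings for a few wakewords, such as Mycroft, Google, Firefox, Alexa, Mobile, and Siri. The folder holding them inside the library's installation directory is exposed as samples_loc: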

 from eff_word_net import samples_loc
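To see which sample hotwords are actually present, the folder can be scanned for files that follow the `{wakeword}_ref.json` pattern. Both helpers are hypothetical; with the library installed, `available_wakewords(samples_loc)` would list the bundled samples.

```python
import os
from typing import Iterable, List

REF_SUFFIX = "_ref.json"


def wakewords_from_filenames(names: Iterable[str]) -> List[str]:
    """Extract wakeword names from files named {wakeword}_ref.json."""
    return sorted(n[:-len(REF_SUFFIX)] for n in names if n.endswith(REF_SUFFIX))


def available_wakewords(folder: str) -> List[str]:
    """List the wakewords that have a reference file in `folder`."""
    return wakewords_from_filenames(os.listdir(folder))
```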

Try Your First Single-Hotword Detection Script

 import os
from eff_word_net.streams import SimpleMicStream
from eff_word_net.engine import HotwordDetector

from eff_word_net.audio_processing import Resnet50_Arc_loss

from eff_word_net import samples_loc

base_model = Resnet50_Arc_loss()

mycroft_hw = HotwordDetector(
    hotword="mycroft",
    model=base_model,
    reference_file=os.path.join(samples_loc, "mycroft_ref.json"),
    threshold=0.7,
    relaxation_time=2
)

mic_stream = SimpleMicStream(
    window_length_secs=1.5,
    sliding_window_secs=0.75,
)

mic_stream.start_stream()

print("Say Mycroft ")
while True:
    frame = mic_stream.getFrame()
    result = mycroft_hw.scoreFrame(frame)
    if result is None:
        # no voice activity
        continue
    if result["match"]:
        print("Wakeword uttered", result["confidence"])
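The two SimpleMicStream parameters describe a sliding window: each frame covers window_length_secs of audio, and a new frame starts every sliding_window_secs. The sketch below reproduces that framing on a plain list of samples to show why, with 1.5 s windows advancing 0.75 s, consecutive frames overlap by half and a single utterance can be scored more than once. The function is hypothetical and does not reflect SimpleMicStream's actual buffering code.

```python
from typing import Iterator, List, Sequence


def sliding_windows(samples: Sequence[float], sample_rate: int,
                    window_length_secs: float = 1.5,
                    sliding_window_secs: float = 0.75) -> Iterator[List[float]]:
    """Yield fixed-length windows that advance by sliding_window_secs."""
    window = int(round(window_length_secs * sample_rate))
    hop = int(round(sliding_window_secs * sample_rate))
    if window <= 0 or hop <= 0:
        raise ValueError("window and hop must each span at least one sample")
    for start in range(0, len(samples) - window + 1, hop):
        yield list(samples[start:start + window])


# Toy signal: 3 s at 8 samples/s gives 12-sample windows every 6 samples.
windows = list(sliding_windows(list(range(24)), sample_rate=8))
print(len(windows), [w[0] for w in windows])
```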

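Detecting Multiple Hotwords in an Audio Stream

Instead of running scoreFrame() separately for each wakeword, the library provides a more computationally efficient way to detect multiple hotwords in a single stream: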

 import os
from eff_word_net.streams import SimpleMicStream
from eff_word_net.engine import HotwordDetector, MultiHotwordDetector
from eff_word_net.audio_processing import Resnet50_Arc_loss
from eff_word_net import samples_loc

print(samples_loc)

base_model = Resnet50_Arc_loss()

mycroft_hw = HotwordDetector(
    hotword="mycroft",
    model=base_model,
    reference_file=os.path.join(samples_loc, "mycroft_ref.json"),
    threshold=0.7,
    relaxation_time=2
)

alexa_hw = HotwordDetector(
    hotword="alexa",
    model=base_model,
    reference_file=os.path.join(samples_loc, "alexa_ref.json"),
    threshold=0.7,
    relaxation_time=2,
    # verbose=True
)


computer_hw = HotwordDetector(
    hotword="computer",
    model=base_model,
    reference_file=os.path.join(samples_loc, "computer_ref.json"),
    threshold=0.7,
    relaxation_time=2,
    # verbose=True
)

multi_hotword_detector = MultiHotwordDetector(
    [mycroft_hw, alexa_hw, computer_hw],
    model=base_model,
    continuous=True,
)

mic_stream = SimpleMicStream(window_length_secs=1.5, sliding_window_secs=0.75)
mic_stream.start_stream()

print("Say ", " / ".join([x.hotword for x in multi_hotword_detector.detector_collection]))

while True:
    frame = mic_stream.getFrame()
    result = multi_hotword_detector.findBestMatch(frame)
    # skip frames where no hotword matched
    if None not in result:
        print(result[0], f",Confidence {result[1]:0.4f}")
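The saving from a multi-hotword detector comes from embedding each frame once and comparing that single embedding against every hotword's references, rather than running the model once per hotword. The sketch below illustrates that idea on precomputed toy embeddings with cosine similarity; the function name, the dictionary layout, and the (None, None) return for "no match" are assumptions for illustration, not MultiHotwordDetector's internals.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def find_best_match(frame_embedding: Sequence[float],
                    references: Dict[str, List[Sequence[float]]],
                    thresholds: Dict[str, float],
                    default_threshold: float = 0.7
                    ) -> Tuple[Optional[str], Optional[float]]:
    """Score one already-embedded frame against every hotword's references.

    Returns (hotword, confidence) for the best match above its threshold,
    or (None, None) when nothing matches.
    """
    best_word: Optional[str] = None
    best_score: Optional[float] = None
    for word, refs in references.items():
        score = max(cosine_similarity(frame_embedding, r) for r in refs)
        threshold = thresholds.get(word, default_threshold)
        if score >= threshold and (best_score is None or score > best_score):
            best_word, best_score = word, score
    return best_word, best_score


references = {"alexa": [[1.0, 0.0]], "mycroft": [[0.0, 1.0]]}
thresholds = {"alexa": 0.7, "mycroft": 0.7}
print(find_best_match([0.2, 1.0], references, thresholds))
```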

Access the library's documentation here: https://ant-brain.github.io/EfficientWord-Net/


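Change Notes from v0.2.2 to v1.0.1

New model added: Resnet_50_Arc_loss, with major improvements!

The new model was trained from scratch on a modified distilled dataset from MLCommons.
The arc loss function replaces the triplet loss function.
The resulting model is stored as resnet_50_arcloss.
The newer model is more resilient to background noise and needs fewer samples to reach good accuracy.
The API flow received minor changes so that new models can be added easily.
The newer model works with a fixed window length of 1.5 seconds.
The old model is still accessible through first_iteration_siamese.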
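For reference, the two losses named above are shown below in their commonly published forms. Here $f(\cdot)$ is the embedding network, $a$, $p$, $n$ are anchor, positive, and negative clips, and $\alpha$ is the triplet margin; for the arc (ArcFace-style) loss, $\theta_j$ is the angle between the normalized embedding and the normalized weight vector of class $j$, $y$ is the true class, $s$ is a scale factor, and $m$ is the additive angular margin. The values of $s$, $m$, and $\alpha$ used for resnet_50_arcloss are not stated here, and the project's training code may differ in detail.

```latex
% Triplet loss (replaced)
\mathcal{L}_{\text{triplet}} = \max\!\left(0,\;
  \lVert f(a) - f(p) \rVert_2^2 - \lVert f(a) - f(n) \rVert_2^2 + \alpha \right)

% Additive angular margin (arc) loss
\mathcal{L}_{\text{arc}} = -\log
  \frac{e^{s\cos(\theta_{y} + m)}}
       {e^{s\cos(\theta_{y} + m)} + \sum_{j \neq y} e^{s\cos\theta_{j}}}
```

The arc loss enforces separation through an angular margin on a classification objective, so it does not depend on mining informative triplets.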

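Change Notes from v0.1.1 to v0.2.2

The main change replaces the complex logic for handling multiple triggers per utterance with simpler logic and a simpler API.
Introduces breaking changes.
A C++ implementation of the current model is available here.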

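Limitations of the Current Model

The model was trained on single words, so phrases such as "Hey xxx" may cause odd behavior.
The audio processing window is limited to 1 second, so longer hotwords will not work.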
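The window limit follows from the model seeing a fixed-length window of audio: anything beyond it is cut off. The sketch below shows one common way to fit a clip to a fixed window (truncate, or zero-pad if shorter). The 16 kHz default sample rate, zero-padding, and the helper name `fit_to_window` are assumptions; set `window_secs=1.5` to mirror the newer model's window described in the change notes.

```python
from typing import List, Sequence


def fit_to_window(samples: Sequence[float], sample_rate: int = 16000,
                  window_secs: float = 1.0) -> List[float]:
    """Truncate or zero-pad a clip to exactly window_secs of audio."""
    target = int(round(window_secs * sample_rate))
    clip = list(samples[:target])            # audio past the window is discarded
    clip.extend([0.0] * (target - len(clip)))  # short clips are padded with silence
    return clip


# A 2 s hotword at 16 kHz loses its second half in a 1 s window.
clip = fit_to_window([0.1] * 32000)
print(len(clip))
```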

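FAQ

Hotword performance is poor: if you run into this, feel free to ask in Discussions.
Can it run on FPGAs or microcontroller boards such as Arduino?: No. The new Resnet_50_Arcloss model is too heavy to run on an Arduino (it is about 88 MB). We will soon add support for a lighter version of the model so that it can run on small devices. For now, it should run on Raspberry Pi-class devices.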