Home assistants require special phrases called hotwords to get activated (e.g., "OK Google"). EfficientWord-Net is a hotword detection engine based on few-shot learning that allows developers to add custom hotwords to their programs without extra charges. The library is purely written in Python and uses Google's TFLite implementation for faster real-time inference. It is inspired by FaceNet's Siamese Network Architecture and performs best when 3-4 hotword samples are collected directly from the user.
Here are the links:
Training File to access the training file.
Research Paper to access the research paper.
This library works with Python versions 3.6 to 3.9.
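To fail early on an unsupported interpreter, a script can check the version before importing the library. This is a minimal sketch; the helper name `is_supported` and the constants are illustrative and not part of EfficientWord-Net.

```python
import sys

# Supported range as stated above: Python 3.6 to 3.9 (inclusive).
SUPPORTED_MIN = (3, 6)
SUPPORTED_MAX = (3, 9)


def is_supported(version=None):
    """Return True if the (major, minor) version is within the supported range.

    `version` defaults to the running interpreter's version.
    """
    major, minor = tuple(version or sys.version_info)[:2]
    return SUPPORTED_MIN <= (major, minor) <= SUPPORTED_MAX


if not is_supported():
    print("Warning: this Python version is outside the 3.6 to 3.9 range.")
```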
Before running the pip installation command for the library, a few dependencies need to be installed manually:
Mac OS M* and Raspberry Pi users might have to compile these dependencies.
The tflite package cannot be listed in requirements.txt, so it is installed automatically when the package is first initialized on the system.
The librosa package is not required for inference-only use. It is installed automatically the first time generate_reference is called.
Run the following pip command:
pip install EfficientWord-Net
To import the package:
import eff_word_net
After installing the packages, you can run the demo script built into the library (ensure you have a working microphone).
Access documentation from: https://ant-brain.github.io/EfficientWord-Net/
Command to run the demo:
python -m eff_word_net.engine
For any new hotword, the library needs information about the hotword. This information is obtained from a file called {wakeword}_ref.json.
For example, for the wakeword 'alexa', the library would need the file called alexa_ref.json.
These files can be generated with the following procedure:
Collect 4 to 10 uniquely sounding pronunciations of a given wakeword. Put them into a separate folder that doesn't contain anything else.
Alternatively, use the following command to generate audio files for a given word (uses IBM neural TTS demo API). Please don't overuse it for our sake:
python -m eff_word_net.ibm_generate
Then run the following command to generate the reference file from the collected samples:
python -m eff_word_net.generate_reference
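Before generating a reference file, it can help to confirm that the samples folder follows the collection step above: 4 to 10 recordings and nothing else in the folder. The sketch below is a hypothetical pre-check, not part of the library. The accepted extensions are an assumption, since this README does not state which audio formats are supported.

```python
import os

# Assumption: the README does not list accepted audio formats.
AUDIO_EXTENSIONS = {".wav", ".mp3"}


def check_sample_folder(folder, min_samples=4, max_samples=10):
    """Return a list of problems found in the folder (empty list means OK)."""
    problems = []
    entries = sorted(os.listdir(folder))
    audio = [e for e in entries
             if os.path.splitext(e)[1].lower() in AUDIO_EXTENSIONS]
    extras = [e for e in entries if e not in audio]
    if extras:
        problems.append(f"unexpected entries: {extras}")
    if not min_samples <= len(audio) <= max_samples:
        problems.append(
            f"expected {min_samples}-{max_samples} samples, found {len(audio)}"
        )
    return problems
```

An empty return value means the folder looks ready to be passed to the reference generator.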
The path of the generated reference file needs to be passed to the HotwordDetector instance:
HotwordDetector(
    hotword="hello",
    model=Resnet50_Arc_loss(),
    reference_file="/full/path/name/of/hello_ref.json",
    threshold=0.9,  # min confidence required to consider a trigger
    relaxation_time=0.8  # default value, in seconds
)
The model variable can receive an instance of Resnet50_Arc_loss or First_Iteration_Siamese.
The relaxation_time parameter sets the minimum time between any two triggers; any potential trigger that occurs within relaxation_time of the previous one is canceled. Because the detector uses a sliding window, a single utterance of a hotword can produce multiple triggers, and relaxation_time suppresses these duplicates. In most cases the default of 0.8 seconds will suffice.
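The effect of relaxation_time can be pictured as a simple time gate. The sketch below is only an illustration of the idea, not the library's internal implementation; the class name `TriggerGate` and its method are hypothetical.

```python
class TriggerGate:
    """Accept a trigger only if enough time has passed since the last one."""

    def __init__(self, relaxation_time=0.8):
        self.relaxation_time = relaxation_time  # seconds
        self._last_trigger = None               # time of last accepted trigger

    def accept(self, timestamp):
        """Return True if a trigger at `timestamp` (seconds) should fire."""
        if (self._last_trigger is not None
                and timestamp - self._last_trigger < self.relaxation_time):
            return False  # too soon after the previous trigger: cancel it
        self._last_trigger = timestamp
        return True


# Overlapping windows may report the same utterance several times.
gate = TriggerGate(relaxation_time=0.8)
fired = [gate.accept(t) for t in [0.0, 0.25, 0.5, 1.0, 1.25]]
```

Here only the candidates at 0.0 s and 1.0 s fire; the others fall within 0.8 s of an accepted trigger and are dropped.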
The library ships with predefined embeddings for a few wakewords such as Mycroft, Google, Firefox, Alexa, Mobile, and Siri. Their paths are available in the library installation directory:
from eff_word_net import samples_loc
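To see which reference files a folder contains, the {wakeword}_ref.json naming convention can be used to list them. This is a small sketch with a hypothetical helper, `available_wakewords`; it relies only on the file naming described in this README and makes no assumption about the file contents.

```python
import os

REF_SUFFIX = "_ref.json"


def available_wakewords(folder):
    """Map each wakeword to the full path of its {wakeword}_ref.json file."""
    found = {}
    for name in sorted(os.listdir(folder)):
        if name.endswith(REF_SUFFIX):
            wakeword = name[: -len(REF_SUFFIX)]
            found[wakeword] = os.path.join(folder, name)
    return found


# Usage (requires the library to be installed):
# from eff_word_net import samples_loc
# print(available_wakewords(samples_loc))
```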
import os
from eff_word_net.streams import SimpleMicStream
from eff_word_net.engine import HotwordDetector
from eff_word_net.audio_processing import Resnet50_Arc_loss
from eff_word_net import samples_loc

base_model = Resnet50_Arc_loss()

mycroft_hw = HotwordDetector(
    hotword="mycroft",
    model=base_model,
    reference_file=os.path.join(samples_loc, "mycroft_ref.json"),
    threshold=0.7,
    relaxation_time=2
)

mic_stream = SimpleMicStream(
    window_length_secs=1.5,
    sliding_window_secs=0.75,
)
mic_stream.start_stream()

print("Say Mycroft ")
while True:
    frame = mic_stream.getFrame()
    result = mycroft_hw.scoreFrame(frame)
    if result is None:
        # no voice activity
        continue
    if result["match"]:
        print("Wakeword uttered", result["confidence"])
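The window_length_secs and sliding_window_secs values used above explain why one utterance can be scored more than once. The sketch below assumes that sliding_window_secs is the step between consecutive windows; that reading is an assumption, and the function `sliding_windows` is illustrative rather than part of the library.

```python
def sliding_windows(total_secs, window_length_secs=1.5, sliding_window_secs=0.75):
    """Yield (start, end) times in seconds of each full window in the audio."""
    start = 0.0
    while start + window_length_secs <= total_secs:
        yield (start, start + window_length_secs)
        start += sliding_window_secs  # assumed step between windows


windows = list(sliding_windows(3.0))

# A hotword spoken from 1.6 s to 2.0 s lies fully inside more than one window.
utterance = (1.6, 2.0)
covering = [w for w in windows if w[0] <= utterance[0] and utterance[1] <= w[1]]
```

With a 1.5 s window and a 0.75 s step, consecutive windows overlap by half, so the example utterance falls inside two of them. This is the situation relaxation_time is meant to handle.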
The library provides a computation-friendly way to detect multiple hotwords from a given stream, instead of calling scoreFrame() on each wakeword's detector individually:
import os
from eff_word_net.streams import SimpleMicStream
from eff_word_net.engine import HotwordDetector, MultiHotwordDetector
from eff_word_net.audio_processing import Resnet50_Arc_loss
from eff_word_net import samples_loc

print(samples_loc)

base_model = Resnet50_Arc_loss()

mycroft_hw = HotwordDetector(
    hotword="mycroft",
    model=base_model,
    reference_file=os.path.join(samples_loc, "mycroft_ref.json"),
    threshold=0.7,
    relaxation_time=2
)

alexa_hw = HotwordDetector(
    hotword="alexa",
    model=base_model,
    reference_file=os.path.join(samples_loc, "alexa_ref.json"),
    threshold=0.7,
    relaxation_time=2,
    # verbose=True
)

computer_hw = HotwordDetector(
    hotword="computer",
    model=base_model,
    reference_file=os.path.join(samples_loc, "computer_ref.json"),
    threshold=0.7,
    relaxation_time=2,
    # verbose=True
)

multi_hotword_detector = MultiHotwordDetector(
    [mycroft_hw, alexa_hw, computer_hw],
    model=base_model,
    continuous=True,
)

mic_stream = SimpleMicStream(window_length_secs=1.5, sliding_window_secs=0.75)
mic_stream.start_stream()

print("Say ", " / ".join([x.hotword for x in multi_hotword_detector.detector_collection]))
while True:
    frame = mic_stream.getFrame()
    result = multi_hotword_detector.findBestMatch(frame)
    if None not in result:
        # result[0] is the best match, result[1] its confidence
        print(result[0], f",Confidence {result[1]:0.4f}")
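One plausible reason a combined detector is cheaper is that the audio frame only needs to be embedded once, after which the embedding is compared against every hotword's references. The sketch below illustrates that idea with plain cosine similarity. It is an assumption about the general approach, not a description of how MultiHotwordDetector is implemented, and all names in it are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def find_best_match(frame_embedding, references, threshold=0.7):
    """Return (hotword, score) of the best match, or (None, None).

    `references` maps each hotword to a list of reference embeddings.
    The frame embedding is computed once by the caller and reused here.
    """
    best_name, best_score = None, None
    for name, vectors in references.items():
        score = max(cosine_similarity(frame_embedding, v) for v in vectors)
        if score >= threshold and (best_score is None or score > best_score):
            best_name, best_score = name, score
    return best_name, best_score


# Toy 2-D embeddings, for illustration only.
refs = {"alexa": [[1.0, 0.0]], "mycroft": [[0.0, 1.0]]}
match = find_best_match([0.9, 0.1], refs)
no_match = find_best_match([-1.0, 0.0], refs)
```

Returning a pair of None values when nothing clears the threshold mirrors the `None not in result` check used in the loop above.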
Access the library documentation here: https://ant-brain.github.io/EfficientWord-Net/
Our hotword detector's performance is notably lower compared to Porcupine. We have thought about better NN architectures for the engine and hope to outperform Porcupine. This has been our undergrad project, so your support and encouragement will motivate us to develop the engine further. If you love this project, recommend it to your peers, give us a ⭐ on GitHub, and a clap 👏 on Medium.
Update: Your stars encouraged us to create a new model which is far better. Let's make this community grow!
Apache License 2.0