Alibaba Damo Academy's Tongyi Laboratory has open-sourced ClearerVoice-Studio, a speech processing toolkit that aims to improve speech quality and intelligibility when recordings are degraded by environmental noise, reverberation, and device pickup. It integrates speech enhancement, speech separation, and audio-visual speaker extraction, and uses complex-domain deep learning algorithms to improve noise reduction and separation performance, preserving speech clarity while keeping distortion to a minimum. Its core models include FRCRN, which won second place overall in the 2022 IEEE/INTERSPEECH DNS Challenge, and the MossFormer series, which performs well on speech separation tasks, giving developers and researchers powerful speech processing tools.
Alibaba Damo Academy's Tongyi Laboratory recently announced the open-sourcing of ClearerVoice-Studio, a speech processing toolkit that aims to improve speech quality and intelligibility. As voice technology becomes more widely used, speech quality is drawing more attention. Demand for speech processing is especially urgent where recordings are affected by environmental noise, reverberation, and device pickup.
ClearerVoice-Studio integrates speech enhancement, speech separation, and audio-visual speaker extraction. It relies on complex-domain deep learning algorithms to improve noise reduction and separation performance. The goal is to remove as much background noise as possible while preserving speech intelligibility and keeping speech distortion to a minimum.
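To illustrate what "complex-domain" processing generally means, here is a minimal sketch in plain Python. It is not taken from ClearerVoice-Studio; the function names and the numeric values are hypothetical, and the masks are computed from a known clean signal (an "oracle"), whereas a real model would have to predict them with a neural network. The sketch compares a real-valued magnitude mask, which leaves the noisy phase untouched, with a complex-valued mask, which corrects magnitude and phase together, on a single time-frequency bin.

```python
import cmath

def magnitude_mask(clean, noisy):
    """Real-valued oracle mask: rescales magnitude, keeps the noisy phase."""
    return abs(clean) / abs(noisy)

def complex_mask(clean, noisy):
    """Complex-valued oracle mask: corrects magnitude and phase together."""
    return clean / noisy

# One hypothetical time-frequency bin (values chosen for illustration only).
clean = cmath.rect(1.0, 0.3)   # clean speech: magnitude 1.0, phase 0.3 rad
noise = cmath.rect(0.5, 2.0)   # additive noise: magnitude 0.5, phase 2.0 rad
noisy = clean + noise          # what the microphone actually records

# Apply each mask to the noisy bin to estimate the clean bin.
est_mag = magnitude_mask(clean, noisy) * noisy
est_cplx = complex_mask(clean, noisy) * noisy

# Distance between each estimate and the true clean bin.
err_mag = abs(est_mag - clean)
err_cplx = abs(est_cplx - clean)

print(f"magnitude-mask error: {err_mag:.4f}")
print(f"complex-mask error:   {err_cplx:.4f}")
```

Even with a perfect magnitude estimate, the magnitude mask leaves a residual error because the phase still comes from the noisy signal. The complex mask removes that error in this idealized setting, which is the usual motivation for working in the complex domain.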
ClearerVoice-Studio's core models include FRCRN, which won second place overall in the 2022 IEEE/INTERSPEECH DNS Challenge, and the MossFormer series, which performs well on speech separation tasks. The 48 kHz speech enhancement model based on MossFormer2 reduces speech distortion while still suppressing noise effectively.
Through ClearerVoice-Studio, Alibaba Tongyi Lab hopes to give developers, researchers, and enterprises powerful speech processing tools that help them build innovative applications. Users can try the demo online: prepare a noisy speech file, upload it to the demo page, process it with one click, and then listen to the result online or download it.
GitHub repository: https://github.com/modelscope/ClearerVoice-Studio
Online demo: https://huggingface.co/spaces/alibabasglab/ClearVoice
ClearerVoice-Studio offers both an online demo and a GitHub repository so that users can get started quickly. Open-sourcing this technology should promote progress in speech processing and bring innovation to more speech-related fields. We look forward to seeing it applied in a wider range of scenarios.