Python SILK codec binding with support for the WeChat voice codec
pilk: python + silk
Related projects: weixin-wxposed-silk-voice
pip install pilk
SILK is a speech coding format developed by Skype. The latest version available online was released in 2012.
The original SILK source code, including the specification documents, has been uploaded to Release.
Tencent voice support comes from silk-v3-decoder.
Release also contains a recompiled x64 Windows build of silk-v3-decoder that supports Chinese, together with its source code.
"Tencent" here refers to Tencent-style voice, taking WeChat voice as the only example.
A standard SILK file starts with b'#!SILK_V3' and ends with b'\xFF\xFF', with the voice data in the middle.
A Tencent voice file adds b'\x02' at the beginning of the standard SILK file, removes the b'\xFF\xFF' at the end, and leaves the middle unchanged.
Below, both kinds are collectively referred to as voice files.
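The header and footer differences described above can be sketched as a pair of byte-level helpers. This is only an illustration: the names `is_tencent`, `to_tencent`, and `to_standard` are hypothetical and are not part of pilk, and the sketch assumes a standard file really does end with the b'\xFF\xFF' marker.

```python
SILK_MAGIC = b'#!SILK_V3'
TENCENT_PREFIX = b'\x02'
STANDARD_FOOTER = b'\xFF\xFF'


def is_tencent(data: bytes) -> bool:
    """Return True if the bytes start with the Tencent-style header."""
    return data.startswith(TENCENT_PREFIX + SILK_MAGIC)


def to_tencent(data: bytes) -> bytes:
    """Convert standard SILK bytes to the Tencent variant (hypothetical helper)."""
    if is_tencent(data):
        return data
    if not data.startswith(SILK_MAGIC):
        raise ValueError('not a SILK voice file')
    # Assumption: a standard file ends with the b'\xFF\xFF' marker.
    if data.endswith(STANDARD_FOOTER):
        data = data[:-len(STANDARD_FOOTER)]
    return TENCENT_PREFIX + data


def to_standard(data: bytes) -> bytes:
    """Convert Tencent-style bytes back to a standard SILK file (hypothetical helper)."""
    if data.startswith(SILK_MAGIC):
        return data
    if not is_tencent(data):
        raise ValueError('not a SILK voice file')
    return data[len(TENCENT_PREFIX):] + STANDARD_FOOTER
```

Both helpers only touch the first byte and the last two bytes; the voice data in between is passed through untouched.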
The voice data is divided into many independent frames. The first two bytes of each frame store the size of the remaining frame data. Each frame stores 20 ms of audio data by default.
Based on this, you can write a function to obtain the duration of a voice file (this function is included in pilk):
def get_duration(silk_path: str, frame_ms: int = 20) -> int:
    """Return the duration of a silk file, in ms."""
    with open(silk_path, 'rb') as silk:
        # A Tencent voice file has b'\x02' before the b'#!SILK_V3' magic,
        # so its header is 10 bytes long instead of 9.
        tencent = silk.read(1) == b'\x02'
        silk.seek(10 if tencent else 9)
        i = 0
        while True:
            size = silk.read(2)
            # Stop at end of file, or at the b'\xFF\xFF' end marker of a standard file.
            if len(size) != 2 or size == b'\xFF\xFF':
                break
            i += 1
            # The frame size is a 16-bit little-endian integer.
            size = int.from_bytes(size, 'little')
            silk.seek(silk.tell() + size)
        return i * frame_ms
According to the SILK format specification, frame_ms can be 20, 40, 60, 80, or 100.
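To make the frame layout concrete, here is a small in-memory sketch that builds a fake voice file and counts its frames. It assumes the two size bytes form a little-endian 16-bit integer; the frame payloads are placeholder bytes rather than real SILK data, and `build_voice_file` and `count_frames` are hypothetical names, not pilk functions.

```python
import struct


def build_voice_file(frames: list, tencent: bool = True) -> bytes:
    """Assemble header + size-prefixed frames (+ footer for a standard file)."""
    header = (b'\x02' if tencent else b'') + b'#!SILK_V3'
    body = b''.join(struct.pack('<H', len(f)) + f for f in frames)
    footer = b'' if tencent else b'\xFF\xFF'
    return header + body + footer


def count_frames(data: bytes) -> int:
    """Walk the size prefixes and count the frames."""
    pos = 10 if data[:1] == b'\x02' else 9  # skip the header
    count = 0
    while pos + 2 <= len(data):
        size_bytes = data[pos:pos + 2]
        if size_bytes == b'\xFF\xFF':  # end marker of a standard file
            break
        (size,) = struct.unpack('<H', size_bytes)
        pos += 2 + size  # skip the size prefix and the frame payload
        count += 1
    return count


# Placeholder payloads; the 300-byte frame needs both size bytes.
frames = [b'a' * 5, b'b' * 300, b'c']
duration_ms = count_frames(build_voice_file(frames)) * 20
```

With three frames and the default 20 ms per frame, `duration_ms` comes out as 60.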
Please check the API documentation comments in the IDE for details.
Before using pilk, you need to know that conversion between audio files (mp3, aac, m4a, flac, wav, ...)
and voice files goes through raw PCM data.
Specific conversion relationship: audio file ⇔ PCM ⇔ voice file
Audio (video) files ➜ PCM
Use ffmpeg for this step; it must be installed first.
ffmpeg -y -i <audio (video) input file> -vn -ar <sample rate> -ac 1 -f s16le <PCM output file>
-y: Optional. If <PCM output file> already exists, overwrite it without asking.
-i: Fixed; followed by <audio (video) input file>.
-vn: Do not process video data. Adding it is recommended: video data is not processed even without it (there is no such thing as converting video data to PCM), but a warning may be printed.
-ar: Sets the sample rate. The allowed values are [8000, 12000, 16000, 24000, 32000, 44100, 48000]; you can think of it simply as the sound quality.
-ac: Sets the number of channels. It must be 1 here, as required by SILK.
-f: Forces conversion to the specified format. It generally must be s16le, which means 16-bit short integer little-endian data.
ffmpeg -y -i mv.mp4 -vn -ar 44100 -ac 1 -f s16le mv.pcm
ffmpeg -y -i music.mp3 -ar 44100 -ac 1 -f s16le music.pcm
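If the ffmpeg call needs to be driven from Python, the command above can be assembled as an argument list and run with `subprocess`. This is a minimal sketch: `ffmpeg_to_pcm_args` and `media_to_pcm` are hypothetical helper names, and the sample-rate check simply mirrors the list of values given above.

```python
import subprocess

# Sample rates listed above as valid for -ar
SILK_RATES = (8000, 12000, 16000, 24000, 32000, 44100, 48000)


def ffmpeg_to_pcm_args(src: str, dst: str, rate: int) -> list:
    """Build the ffmpeg argument list for audio (video) -> mono s16le PCM."""
    if rate not in SILK_RATES:
        raise ValueError(f'unsupported sample rate: {rate}')
    return ['ffmpeg', '-y', '-i', src, '-vn',
            '-ar', str(rate), '-ac', '1', '-f', 's16le', dst]


def media_to_pcm(src: str, dst: str, rate: int) -> None:
    """Run ffmpeg (must be on PATH); raises CalledProcessError on failure."""
    subprocess.run(ffmpeg_to_pcm_args(src, dst, rate), check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.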
PCM ➜ Audio files
ffmpeg -y -f s16le -ar <sample rate> -ac <number of channels> -i <PCM input file> <audio output file>
-f: Must be s16le, which is also determined by SILK.
-ar: Same as above.
-ac: Same meaning as above; the value is arbitrary.
<audio output file>: The extension must be accurate. When no format is specified, ffmpeg determines the output format from the extension of the given output file.
ffmpeg -y -f s16le -ar 16000 -i test.pcm test.mp3
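When the target is a plain WAV file, ffmpeg is not strictly needed: Python's standard `wave` module can wrap s16le PCM data in a WAV header. The sketch below is an alternative to the ffmpeg command, not something pilk provides, and `pcm_to_wav` is a hypothetical helper name.

```python
import wave


def pcm_to_wav(pcm_path: str, wav_path: str, rate: int, channels: int = 1) -> None:
    """Wrap raw s16le PCM data in a WAV container."""
    with open(pcm_path, 'rb') as f:
        pcm = f.read()
    with wave.open(wav_path, 'wb') as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)      # s16le: 2 bytes per sample
        wav.setframerate(rate)   # must match the rate the PCM was produced with
        wav.writeframes(pcm)
```

For any other output format (mp3, aac, ...), the ffmpeg command is still the way to go.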
ffmpeg can also be replaced by a Python ffmpeg binding. PyAV is recommended; studying it is left to you, so I won't go into details here.
With audio files ⇔ PCM covered, the next step is to use pilk to convert PCM ⇔ voice files.
# Encode: PCM -> voice file
import pilk

# The pcm_rate parameter must match the `-ar` value used when converting the audio to a PCM file with ffmpeg
duration = pilk.encode("test.pcm", "test.silk", pcm_rate=44100, tencent=True)
print("Voice duration:", duration)

# Decode: voice file -> PCM
import pilk

# The pcm_rate parameter must match the `-ar` value used when converting the audio to a PCM file with ffmpeg
duration = pilk.decode("test.silk", "test.pcm")
print("Voice duration:", duration)
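To try the encode call without an existing recording, a short test tone can be written directly as s16le mono PCM using only the standard library. This is a sketch under stated assumptions: `write_sine_pcm` is a hypothetical helper, and the 440 Hz tone and half-scale amplitude are arbitrary choices.

```python
import math
import struct


def write_sine_pcm(path: str, rate: int = 44100, seconds: float = 1.0,
                   freq: float = 440.0, amplitude: float = 0.5) -> int:
    """Write a mono s16le sine tone and return its duration in ms."""
    n = int(rate * seconds)
    peak = int(amplitude * 32767)  # stay inside the signed 16-bit range
    samples = [int(peak * math.sin(2 * math.pi * freq * i / rate))
               for i in range(n)]
    with open(path, 'wb') as f:
        # '<h' = little-endian signed 16-bit, i.e. one s16le sample
        f.write(struct.pack(f'<{n}h', *samples))
    return n * 1000 // rate
```

A file written this way with `rate=44100` could then be passed to `pilk.encode` with `pcm_rate=44100`, since the two rates have to agree.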
Using pydub (depends on ffmpeg)
import os

import pilk
from pydub import AudioSegment


def convert_to_silk(media_path: str) -> str:
    """Convert the input media file to silk and return the silk path."""
    media = AudioSegment.from_file(media_path)
    # Output files are written to the current directory, named after the input file
    pcm_path = os.path.basename(media_path)
    pcm_path = os.path.splitext(pcm_path)[0]
    silk_path = pcm_path + '.silk'
    pcm_path += '.pcm'
    # Export mono s16le PCM at the source sample rate
    media.export(pcm_path, 's16le', parameters=['-ar', str(media.frame_rate), '-ac', '1']).close()
    pilk.encode(pcm_path, silk_path, pcm_rate=media.frame_rate, tencent=True)
    return silk_path
Using PyAV (recommended)
import os

import av
import pilk


def to_pcm(in_path: str) -> tuple[str, int]:
    """Convert any media file to pcm; return the pcm path and sample rate."""
    out_path = os.path.splitext(in_path)[0] + '.pcm'
    with av.open(in_path) as in_container:
        in_stream = in_container.streams.audio[0]
        sample_rate = in_stream.codec_context.sample_rate
        with av.open(out_path, 'w', 's16le') as out_container:
            out_stream = out_container.add_stream(
                'pcm_s16le',
                rate=sample_rate,
                layout='mono'
            )
            try:
                for frame in in_container.decode(in_stream):
                    frame.pts = None
                    for packet in out_stream.encode(frame):
                        out_container.mux(packet)
            except Exception:
                # Best effort: keep whatever was written before a decode error
                pass
    return out_path, sample_rate


def convert_to_silk(media_path: str) -> str:
    """Convert any media file to silk and return the silk path."""
    pcm_path, sample_rate = to_pcm(media_path)
    silk_path = os.path.splitext(pcm_path)[0] + '.silk'
    pilk.encode(pcm_path, silk_path, pcm_rate=sample_rate, tencent=True)
    # Remove the intermediate pcm file
    os.remove(pcm_path)
    return silk_path