pilk

v0.2.4

A Python binding for the SILK codec, with support for encoding and decoding WeChat voice.

pilk: python + silk

Related projects: weixin-wxposed-silk-voice

Install

pip install pilk

Introduction and explanation

SILK is a speech coding format developed by Skype. The latest version available online was released in 2012.

The original SILK source code, including the specification documents, has been uploaded to Release.

Tencent voice support comes from silk-v3-decoder.

Release also contains a recompiled x64 Windows build of silk-v3-decoder that supports Chinese, along with its source code.

The relationship between the SILK encoding format and Tencent-family voice:

"Tencent-family voice" here uses WeChat voice as its only example.

A standard SILK file starts with b'#!SILK_V3' and ends with b'\xFF\xFF', with the voice data in between.
A WeChat voice file inserts b'\x02' at the beginning of the standard SILK file and removes the trailing b'\xFF\xFF'; the data in between is unchanged.

Both kinds are referred to below as voice files.
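
The header relationship described above can be sketched in a few lines of Python operating on in-memory bytes. This is only an illustration: the helper names `is_wechat_voice`, `to_wechat` and `to_standard` are hypothetical and are not part of pilk, and the sketch assumes a standard file always carries the trailing end marker.

```python
SILK_MAGIC = b'#!SILK_V3'
WECHAT_PREFIX = b'\x02'
END_MARKER = b'\xFF\xFF'


def is_wechat_voice(data: bytes) -> bool:
    """Return True if the data starts with the WeChat prefix byte plus the SILK magic."""
    return data.startswith(WECHAT_PREFIX + SILK_MAGIC)


def to_wechat(data: bytes) -> bytes:
    """Standard SILK -> WeChat voice: prepend the prefix byte, drop the end marker."""
    if not data.startswith(SILK_MAGIC):
        raise ValueError('not a standard SILK file')
    if data.endswith(END_MARKER):
        data = data[:-len(END_MARKER)]
    return WECHAT_PREFIX + data


def to_standard(data: bytes) -> bytes:
    """WeChat voice -> standard SILK: drop the prefix byte, append the end marker."""
    if not is_wechat_voice(data):
        raise ValueError('not a WeChat voice file')
    return data[len(WECHAT_PREFIX):] + END_MARKER
```

In practice the `tencent=True` argument of `pilk.encode` shown later in this document makes such a conversion unnecessary when encoding; the sketch is only meant to make the byte layout concrete.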

Voice data

The voice data is divided into many independent frames. The first two bytes of each frame store the size of the remaining frame data. Each frame stores 20 ms of audio data by default.

Based on this, you can write a function to obtain the duration of a voice file (this function is included in pilk):

def get_duration(silk_path: str, frame_ms: int = 20) -> int:
    """Return the duration of a silk file in milliseconds."""
    with open(silk_path, 'rb') as silk:
        # A WeChat voice file has one extra b'\x02' byte before the 9-byte magic.
        tencent = silk.read(1) == b'\x02'
        silk.seek(10 if tencent else 9)
        i = 0
        while True:
            size = silk.read(2)
            if len(size) != 2:
                break
            # The frame size is a 16-bit little-endian integer.
            size = int.from_bytes(size, 'little')
            if size == 0xFFFF:
                break  # end marker of a standard SILK file, not a frame
            i += 1
            silk.seek(silk.tell() + size)  # skip the frame payload
        return i * frame_ms

According to the SILK format specification, frame_ms can be 20, 40, 60, 80, or 100.
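
To make the frame layout concrete, here is a small sketch that walks the frames of a voice file held in memory and returns the payload size of each one. It assumes the two-byte size field is little-endian and that a size field of 0xFFFF is the end marker of a standard file. `frame_sizes` is an illustrative name, not a pilk function.

```python
def frame_sizes(data: bytes) -> list[int]:
    """Return the payload size of every frame in a standard or WeChat voice file."""
    # Skip the optional WeChat prefix byte and the 9-byte magic.
    pos = 10 if data[:1] == b'\x02' else 9
    sizes = []
    while pos + 2 <= len(data):
        size = int.from_bytes(data[pos:pos + 2], 'little')
        if size == 0xFFFF:  # end marker of a standard SILK file
            break
        sizes.append(size)
        pos += 2 + size  # jump over the size field and the payload
    return sizes


# Synthetic example (not real SILK audio): two frames of 3 and 1 payload bytes.
sample = b'#!SILK_V3' + b'\x03\x00abc' + b'\x01\x00z' + b'\xFF\xFF'
print(frame_sizes(sample))            # [3, 1]
print(len(frame_sizes(sample)) * 20)  # 40 (ms, at the default 20 ms per frame)
```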

Quick Start

For details, see the API documentation comments shown in your IDE.

Before using pilk, note that conversion between audio files (mp3, aac, m4a, flac, wav, ...) and voice files goes through raw PCM data.

Specific conversion relationship: audio file ⇔ PCM ⇔ voice file

Audio (video) files ➜ PCM
This step uses ffmpeg, so ffmpeg must be installed first.
```
ffmpeg -y -i <audio (video) input file> -vn -ar <sample rate> -ac 1 -f s16le <PCM output file>
```
1. -y : Optional. If <PCM output file> already exists, overwrite it without asking.
2. -i : Fixed; followed by <audio (video) input file>.
3. -vn : Do not process video data. Recommended: video data is never converted to PCM anyway, but omitting this flag may print a warning.
4. -ar : Sample rate. The allowed values are [8000, 12000, 16000, 24000, 32000, 44100, 48000]; in practice you can think of this as the sound quality.
5. -ac : Number of channels. It must be 1 here, as required by SILK.
6. -f : Force the specified output format. It generally must be s16le, meaning 16-bit little-endian short integer samples.
7. example1: ffmpeg -y -i mv.mp4 -vn -ar 44100 -ac 1 -f s16le mv.pcm
8. example2: ffmpeg -y -i music.mp3 -ar 44100 -ac 1 -f s16le music.pcm
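
If you would rather drive this command from Python, a thin wrapper around `subprocess` is enough. The following is a sketch that assumes `ffmpeg` is available on `PATH`; the helper names `pcm_command` and `media_to_pcm` are hypothetical, and the default rate of 24000 is an arbitrary choice from the list above.

```python
import subprocess


def pcm_command(src: str, dst: str, rate: int = 24000) -> list[str]:
    """Build the ffmpeg argument list for media -> mono s16le PCM."""
    return ['ffmpeg', '-y', '-i', src, '-vn',
            '-ar', str(rate), '-ac', '1', '-f', 's16le', dst]


def media_to_pcm(src: str, dst: str, rate: int = 24000) -> None:
    """Run ffmpeg; raises subprocess.CalledProcessError if the conversion fails."""
    subprocess.run(pcm_command(src, dst, rate), check=True)


print(' '.join(pcm_command('music.mp3', 'music.pcm', 44100)))
# ffmpeg -y -i music.mp3 -vn -ar 44100 -ac 1 -f s16le music.pcm
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.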
PCM ➜ Audio files
```
ffmpeg -y -f s16le -ar <sample rate> -ac <number of channels> -i <PCM input file> <audio output file>
```
1. -f : Must be s16le here, as required by SILK.
2. -ar : Same as above.
3. -ac : Same meaning as above. Because it precedes -i, it describes the PCM input, so it should match how the PCM was produced (1 for PCM created as shown above).
4. <audio output file> : The extension must be accurate. When no format is specified, ffmpeg picks the output format from the extension of the output file.
5. example3: ffmpeg -y -f s16le -ar 16000 -i test.pcm test.mp3
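
For the specific case of WAV output, ffmpeg is not strictly needed: a WAV file is essentially a header in front of raw PCM, so Python's standard `wave` module can do the wrapping. The sketch below is an alternative to the command above, not something pilk provides. `pcm_to_wav` is a hypothetical helper, and it assumes the input is 16-bit little-endian PCM.

```python
import wave


def pcm_to_wav(pcm_path: str, wav_path: str, rate: int, channels: int = 1) -> None:
    """Wrap raw s16le PCM in a WAV container without re-encoding."""
    with open(pcm_path, 'rb') as pcm:
        frames = pcm.read()
    with wave.open(wav_path, 'wb') as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)     # 16-bit samples are 2 bytes wide
        wav.setframerate(rate)  # must match the rate the PCM was produced with
        wav.writeframes(frames)
```

For any compressed output format (mp3, aac, ...) ffmpeg or a binding to it is still required.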

ffmpeg can also be replaced by a Python ffmpeg binding. PyAV is recommended; learning it is left to the reader and is not covered in detail here.

With audio files ⇔ PCM covered, the next step is to use pilk to convert PCM ⇔ voice files.

SILK encoding

import pilk

# IMPORTANT: pcm_rate must match the `-ar` value that was used when
# converting the audio to a PCM file with ffmpeg.
duration = pilk.encode("test.pcm", "test.silk", pcm_rate=44100, tencent=True)

print("Voice duration:", duration)

SILK decoding

import pilk

# pcm_rate must match the `-ar` value used when converting between audio and PCM with ffmpeg.
duration = pilk.decode("test.silk", "test.pcm")

print("Voice duration:", duration)
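
The reason `pcm_rate` has to agree with `-ar` is plain arithmetic: raw PCM has no header, so the sample rate is the only thing that maps bytes to time. As a sketch (with `pcm_duration_ms` being an illustrative helper, not a pilk function), s16le data uses 2 bytes per sample per channel:

```python
def pcm_duration_ms(num_bytes: int, rate: int, channels: int = 1) -> float:
    """Duration of raw s16le PCM in ms: bytes / (2 bytes * channels * samples per second)."""
    return num_bytes * 1000 / (2 * channels * rate)


# One second of audio produced with `-ar 44100 -ac 1` occupies 88200 bytes.
print(pcm_duration_ms(88200, 44100))  # 1000.0
# The same bytes interpreted at 24000 Hz would last 1837.5 ms instead.
print(pcm_duration_ms(88200, 24000))  # 1837.5
```

A mismatch therefore changes the apparent length and pitch of the audio rather than raising an error, which makes it easy to miss.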

Convert any media files to SILK using Python

Using pydub (which depends on ffmpeg)

import os

import pilk
from pydub import AudioSegment


def convert_to_silk(media_path: str) -> str:
    """Convert the input media file to silk and return the silk path."""
    media = AudioSegment.from_file(media_path)
    # Write the intermediate PCM and the silk output to the current directory.
    pcm_path = os.path.basename(media_path)
    pcm_path = os.path.splitext(pcm_path)[0]
    silk_path = pcm_path + '.silk'
    pcm_path += '.pcm'
    media.export(pcm_path, 's16le', parameters=['-ar', str(media.frame_rate), '-ac', '1']).close()
    pilk.encode(pcm_path, silk_path, pcm_rate=media.frame_rate, tencent=True)
    return silk_path

Using PyAV (recommended)

import os

import av

import pilk


def to_pcm(in_path: str) -> tuple[str, int]:
    """Convert any media file to PCM; return the PCM path and its sample rate."""
    out_path = os.path.splitext(in_path)[0] + '.pcm'
    with av.open(in_path) as in_container:
        in_stream = in_container.streams.audio[0]
        sample_rate = in_stream.codec_context.sample_rate
        with av.open(out_path, 'w', 's16le') as out_container:
            out_stream = out_container.add_stream(
                'pcm_s16le',
                rate=sample_rate,
                layout='mono'
            )
            try:
                for frame in in_container.decode(in_stream):
                    frame.pts = None
                    for packet in out_stream.encode(frame):
                        out_container.mux(packet)
            except Exception:
                # Best effort: stop at the first frame that fails and keep what was written.
                pass
    return out_path, sample_rate


def convert_to_silk(media_path: str) -> str:
    """Convert any media file to silk and return the silk path."""
    pcm_path, sample_rate = to_pcm(media_path)
    silk_path = os.path.splitext(pcm_path)[0] + '.silk'
    pilk.encode(pcm_path, silk_path, pcm_rate=sample_rate, tencent=True)
    os.remove(pcm_path)  # the intermediate PCM file is no longer needed
    return silk_path

Additional Information

Version v0.2.4
Type AI Source Code
Update Time 2025-01-17
size 436.12KB
From Github
