python pinyin

v0.53.0

Chinese Pinyin Conversion Tool (Python Version)

Converts Chinese characters to Pinyin. It can be used for phonetic annotation, sorting, and retrieval of Chinese text. A Russian translation of this README is also available.

The initial version of the code was based on the implementation of hotoo/pinyin.

Documentation: https://pypinyin.readthedocs.io/
GitHub: https://github.com/mozillazg/python-pinyin
License: MIT license
PyPI: https://pypi.org/project/pypinyin
Python version: 2.7, pypy, pypy3, 3.4, 3.5, 3.6, 3.7, 3.8, 3.9, 3.10, 3.11, 3.12

Contents

Features
Install
Usage example
Documentation
FAQ
- Wrong pinyin?
- Why are there no initial consonants y, w, and yu?
- Is there a pinyin that has neither initials nor finals?
- How to convert one style of pinyin to another style of pinyin?
- How to reduce memory usage?
Pinyin data
Related Projects

Features

Intelligently selects the most accurate pinyin based on the phrase a character appears in.
Supports heteronyms (characters with more than one reading).
Basic support for Traditional Chinese, Zhuyin (Bopomofo), and Wade-Giles romanization.
Supports many different Pinyin/Zhuyin styles.

Install

pip install pypinyin

Usage example

>>> from pypinyin import pinyin, lazy_pinyin, Style
>>> pinyin('中心')  # or pinyin(['中心']); a list argument means the input is already segmented into words
[['zhōng'], ['xīn']]
>>> pinyin('中心', heteronym=True)  # enable heteronym mode
[['zhōng', 'zhòng'], ['xīn']]
>>> pinyin('中心', style=Style.FIRST_LETTER)  # set the pinyin style
[['z'], ['x']]
>>> pinyin('中心', style=Style.TONE2, heteronym=True)
[['zho1ng', 'zho4ng'], ['xi1n']]
>>> pinyin('中心', style=Style.TONE3, heteronym=True)
[['zhong1', 'zhong4'], ['xin1']]
>>> pinyin('中心', style=Style.BOPOMOFO)  # Zhuyin (Bopomofo) style
[['ㄓㄨㄥ'], ['ㄒㄧㄣ']]
>>> lazy_pinyin('威妥玛拼音', style=Style.WADEGILES)
['wei', "t'o", 'ma', "p'in", 'yin']
>>> lazy_pinyin('中心')  # ignore heteronyms
['zhong', 'xin']
>>> lazy_pinyin('战略', v_to_u=True)  # output ü instead of v
['zhan', 'lüe']
# Use 5 to mark the neutral tone
>>> lazy_pinyin('衣裳', style=Style.TONE3, neutral_tone_with_five=True)
['yi1', 'shang5']
# Tone sandhi: nǐ hǎo -> ní hǎo
>>> lazy_pinyin('你好', style=Style.TONE2, tone_sandhi=True)
['ni2', 'ha3o']

Notes:

By default, results do not indicate which syllables are neutral-tone: a neutral-tone syllable carries no tone mark or number (pass neutral_tone_with_five=True to mark the neutral tone with 5).
By default, results in toneless pinyin styles use v to represent ü (pass v_to_u=True to output ü instead of v).
By default, characters without pinyin are output unchanged (see the documentation for ways to customize how such characters are handled).
The pinyin of 嗯 is not en, as most people assume, and some pinyin have neither an initial nor a final. See the FAQ below for details.

Command line tools:

$ pypinyin 音乐
yīn yuè

$ python -m pypinyin.tools.toneconvert to-tone 'zhong4 xin1'
zhòng xīn

Documentation

For detailed documentation, please visit: https://pypinyin.readthedocs.io/.

For questions about project code development, you can check out the development documentation.

FAQ

Wrong pinyin?

Pinyin accuracy can be improved by the following methods:

The pinyin results can be corrected by customizing the phrase pinyin library or the single-character pinyin library. See the documentation for details.

>>> from pypinyin import load_phrases_dict, load_single_dict

>>> load_phrases_dict({'桔子': [['jú'], ['zǐ']]})  # add the phrase "桔子"

>>> load_single_dict({ord('还'): 'hái,huán'})  # reorder the readings of "还" or override its default pinyin

You can also use the custom pinyin library provided by the pypinyin-dict project to correct the results.

# Improve results with the pinyin data in cc_cedict.txt from the phrase-pinyin-data project
>>> from pypinyin_dict.phrase_pinyin_data import cc_cedict
>>> cc_cedict.load()

# Improve results with the pinyin data in kXHC1983.txt from the pinyin-data project
>>> from pypinyin_dict.pinyin_data import kxhc1983
>>> kxhc1983.load()

If the pinyin is incorrect due to word segmentation, you can first use other word segmentation modules to segment the data, and then use the word segmentation result list as the parameter of the function:

>>> # Segment with another module such as jieba,
>>> # or apply a different segmentation algorithm to the word data in phrases_dict.py
>>> import jieba
>>> from pypinyin import pinyin
>>> words = list(jieba.cut('每股24.67美元的确定性协议'))
>>> pinyin(words)

If you want to improve pinyin accuracy with a trained model, take a look at the pypinyin-g2pW project.

Why are there no initial consonants y, w, and yu?

>>> from pypinyin import Style, pinyin
>>> pinyin('下雨天', style=Style.INITIALS)
[['x'], [''], ['t']]

Because according to the Scheme for the Chinese Phonetic Alphabet (汉语拼音方案), y, w, and ü (yu) are not initials.

In the initials style (INITIALS), characters such as 雨 (rain), 我 (I), and 圆 (yuan) return an empty string, because under the Scheme for the Chinese Phonetic Alphabet y, w, and ü (yu) are not initials. y or w is added only to certain finals when there is no initial, and ü has its own spelling rules. (@hotoo)
If this causes you trouble, also watch out for characters that have no initial at all (such as 啊, 饿, 按, and 昂). In that case, what you may need is the first-letter style (FIRST_LETTER). (@hotoo)
Reference: hotoo/pinyin#57, #22, #27, #44

If this behavior is not what you want and you simply want y treated as an initial, you can pass strict=False, which may meet your expectations:

>>> from pypinyin import Style, pinyin
>>> pinyin('下雨天', style=Style.INITIALS)
[['x'], [''], ['t']]
>>> pinyin('下雨天', style=Style.INITIALS, strict=False)
[['x'], ['y'], ['t']]

See the effects of the strict parameter for details.

Is there a pinyin that has neither initials nor finals?

Yes. In strict=True mode, a very small number of pinyin have neither an initial nor a final, for example the following (from the characters 嗯, 呒, 呣, and 唔):

ń ńg ňg ǹg ň ǹ m̄ ḿ m̀

Note in particular that every pinyin of 嗯 has neither an initial nor a final, and the default pinyin of 呣 has neither an initial nor a final. See #109, #259, and #284 for details.

How to convert one style of pinyin to another style of pinyin?

You can use the helper functions in the pypinyin.contrib.tone_convert module to convert standard pinyin into other styles, for example to convert zhōng to zhong, or to extract the initial or final of a pinyin syllable:

>>> from pypinyin.contrib.tone_convert import to_normal, to_tone, to_initials, to_finals
>>> to_normal('zhōng')
'zhong'
>>> to_tone('zhong1')
'zhōng'
>>> to_initials('zhōng')
'zh'
>>> to_finals('zhōng')
'ong'

For more auxiliary functions for pinyin conversion, please see the documentation of the pypinyin.contrib.tone_convert module.
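
For readers curious what such a conversion involves, here is a standard-library sketch of turning a TONE3-style syllable into tone-marked pinyin. It is not the library's implementation, and the function name tone3_to_tone is hypothetical. It assumes the usual placement rule: the mark goes on a or e if present, on the o in "ou", and otherwise on the last vowel.

```python
import unicodedata

# Combining diacritics for tones 1-4 (macron, acute, caron, grave).
TONE_MARKS = {'1': '\u0304', '2': '\u0301', '3': '\u030c', '4': '\u0300'}
VOWELS = 'aeiouü'


def tone3_to_tone(syllable):
    """Convert a TONE3-style syllable such as 'zhong1' to 'zhōng' (hypothetical helper)."""
    syllable = syllable.replace('v', 'ü')
    if not syllable or syllable[-1] not in '12345':
        return syllable  # no tone digit: nothing to convert
    body, digit = syllable[:-1], syllable[-1]
    if digit == '5':
        return body  # the neutral tone carries no mark
    # Placement rule: a or e if present, the o in "ou", otherwise the last vowel.
    if 'a' in body:
        index = body.index('a')
    elif 'e' in body:
        index = body.index('e')
    elif 'ou' in body:
        index = body.index('ou')
    else:
        positions = [i for i, ch in enumerate(body) if ch in VOWELS]
        if not positions:
            return syllable  # vowel-less syllables are left unchanged in this sketch
        index = positions[-1]
    marked = body[:index + 1] + TONE_MARKS[digit] + body[index + 1:]
    # Compose base letter and diacritic into a single code point where one exists.
    return unicodedata.normalize('NFC', marked)


print(tone3_to_tone('zhong1'), tone3_to_tone('lve4'), tone3_to_tone('shang5'))
```

For real use, the functions in pypinyin.contrib.tone_convert remain the supported route; the sketch only illustrates the idea behind the conversion.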

How to reduce memory usage?

If pinyin accuracy is not a priority, you can save memory by setting the environment variables PYPINYIN_NO_PHRASES and PYPINYIN_NO_DICT_COPY. See the documentation for details.

For more FAQ details, see the FAQ section of the documentation.