better_profanity

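Blazingly fast cleaning of swear words (and their leetspeak variants) in strings.

The latest version (0.7.0) currently has a performance issue. Using the last stable version, 0.6.1, is recommended.

Inspired by Ben Friedland's package profanity, this library is significantly faster than the original because it uses string comparison instead of regular expressions.

It supports modified spellings (such as p0rn, h4NDjob, handj0b and b*tCh).

Requirements

This package works with Python 3.5+ and PyPy3.

Installation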

pip3 install better_profanity

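Unicode characters

Only Unicode characters from the categories Ll, Lu, Mc and Mn are added. More information on Unicode categories can be found in the Unicode Character Database documentation.

Not all languages are supported yet; Chinese, for example, is not.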
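
To check which category a given character falls into, Python's standard `unicodedata` module can be used. The snippet below is a minimal sketch, not the library's own code; the helper name `is_allowed` is hypothetical. It shows that a CJK ideograph such as 中 belongs to category Lo, which is outside the four categories listed above and may be one reason such scripts are not covered.

```python
import unicodedata

# The four categories named in the text above.
ALLOWED_CATEGORIES = {"Ll", "Lu", "Mc", "Mn"}

def is_allowed(ch):
    """Hypothetical helper: True if ch is in one of the listed categories."""
    return unicodedata.category(ch) in ALLOWED_CATEGORIES

for ch in ("a", "Z", "\u0301", "中", "1"):
    # Print the character, its Unicode general category, and the verdict.
    print(repr(ch), unicodedata.category(ch), is_allowed(ch))
```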

Usage

from better_profanity import profanity

if __name__ == "__main__":
    # Load the default word list before censoring.
    profanity.load_censor_words()

    text = "You p1ec3 of sHit."
    censored_text = profanity.censor(text)
    print(censored_text)
    # You **** of ****.

All modified spellings of the words in profanity_wordlist.txt are generated. For example, the word handjob is loaded as:

'handjob', 'handj*b', 'handj0b', 'handj@b', 'h@ndjob', 'h@ndj*b', 'h@ndj0b', 'h@ndj@b',
'h*ndjob', 'h*ndj*b', 'h*ndj0b', 'h*ndj@b', 'h4ndjob', 'h4ndj*b', 'h4ndj0b', 'h4ndj@b'

The full mapping used by the library can be found in profanity.py.
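
One way to picture this expansion is as a Cartesian product over per-character substitutions. The sketch below is an illustration only, not the code in profanity.py: the table `CHAR_VARIANTS` is a reduced, hypothetical mapping inferred from the handjob example above, and `generate_variants` is a name chosen for this example.

```python
from itertools import product

# Hypothetical, reduced substitution table inferred from the example above.
# The library's real mapping lives in profanity.py and covers more letters.
CHAR_VARIANTS = {
    "a": ("a", "@", "*", "4"),
    "o": ("o", "*", "0", "@"),
}

def generate_variants(word):
    """Return every spelling obtained by substituting the mapped characters."""
    # Unmapped characters contribute a single choice: themselves.
    choices = [CHAR_VARIANTS.get(ch, (ch,)) for ch in word]
    return ["".join(combo) for combo in product(*choices)]

variants = generate_variants("handjob")
print(len(variants))   # 16
print(variants[:4])    # ['handjob', 'handj*b', 'handj0b', 'handj@b']
```

With four choices for each of the two mapped letters, the word expands to 4 × 4 = 16 spellings, matching the list above.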

1. Censor swear words in a text

By default, profanity replaces each swear word with 4 asterisks (****).

from better_profanity import profanity

if __name__ == "__main__":
    text = "You p1ec3 of sHit."

    censored_text = profanity.censor(text)
    print(censored_text)
    # You **** of ****.

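2. The censor does not care about word dividers

The .censor() function also censors words that are separated by dividers other than spaces, such as _, , and .. The characters @, $, *, " and ' are exceptions and are not treated as dividers.

from better_profanity import profanity

if __name__ == "__main__":
    text = "...sh1t...hello_cat_fuck,,,,123"

    censored_text = profanity.censor(text)
    print(censored_text)
    # "...****...hello_cat_****,,,,123"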
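
The divider behaviour described in this section can be approximated in a few lines. This is a rough sketch under stated assumptions, not the library's implementation: `censor_sketch`, `is_word_char` and `WORD_EXTRAS` are hypothetical names, the word list uses mild placeholder words, and no leetspeak expansion is performed. Runs of word characters are compared against the word list, while everything else is passed through unchanged.

```python
from itertools import groupby

# Characters the text above lists as exceptions; assumed here to count
# as part of a word rather than as dividers.
WORD_EXTRAS = set("@$*\"'")

def is_word_char(ch):
    return ch.isalnum() or ch in WORD_EXTRAS

def censor_sketch(text, banned, censor_char="*"):
    """Replace each run of word characters found in `banned` with 4 censor chars."""
    banned = {word.lower() for word in banned}
    out = []
    # groupby splits the text into alternating word runs and divider runs.
    for is_word, group in groupby(text, key=is_word_char):
        chunk = "".join(group)
        if is_word and chunk.lower() in banned:
            chunk = censor_char * 4
        out.append(chunk)
    return "".join(out)

placeholder_words = {"darn", "d@rn", "heck"}
print(censor_sketch("...d@rn...hello_cat_heck,,,,123", placeholder_words))
# ...****...hello_cat_****,,,,123
```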

3. Censor swear words with a custom character

Four instances of the character passed as the second argument to .censor() are used to replace each swear word.

from better_profanity import profanity

if __name__ == "__main__":
    text = "You p1ec3 of sHit."

    censored_text = profanity.censor(text, '-')
    print(censored_text)
    # You ---- of ----.

4. Check whether a string contains any swear words

The .contains_profanity() function returns True if any word in the given string exists in the word list.

from better_profanity import profanity

if __name__ == "__main__":
    dirty_text = "That l3sbi4n did a very good H4ndjob."

    profanity.contains_profanity(dirty_text)
    # True

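5. Censor swear words with a custom word list

5.1. Word list as a `List`

The load_censor_words function takes a List of strings as the censored words. The provided list replaces the default word list.

from better_profanity import profanity

if __name__ == "__main__":
    custom_badwords = ['happy', 'jolly', 'merry']
    profanity.load_censor_words(custom_badwords)

    # censor() returns the censored text; contains_profanity() would only return True.
    print(profanity.censor("Have a merry day! :)"))
    # Have a **** day! :)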

5.2. Word list as a file

The load_censor_words_from_file function takes the name of a text file in which each word is on its own line.

from better_profanity import profanity

if __name__ == "__main__":
    profanity.load_censor_words_from_file('/path/to/my/project/my_wordlist.txt')
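
The expected file layout is simply one word per line. As a hedged illustration of that format (not the library's loader), the sketch below writes a throwaway word list and reads it back; `read_wordlist` is a hypothetical helper, and skipping blank lines is an assumption of this example.

```python
import os
import tempfile

def read_wordlist(path):
    """Read one word per line, skipping blank lines and surrounding whitespace."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]

# Write a temporary word list so the example is self-contained.
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("happy\njolly\n\nmerry\n")
    wordlist_path = tmp.name

words = read_wordlist(wordlist_path)
os.remove(wordlist_path)  # clean up the temporary file
print(words)  # ['happy', 'jolly', 'merry']
```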

6. Whitelist

The load_censor_words and load_censor_words_from_file functions take a keyword argument, whitelist_words, that makes the censor ignore the given words from the word list.

This is most useful when you want to ignore only a few words in the word list.

# Use the default wordlist
profanity.load_censor_words(whitelist_words=['happy', 'merry'])

# or with your custom words as a List
custom_badwords = ['happy', 'jolly', 'merry']
profanity.load_censor_words(custom_badwords, whitelist_words=['merry'])

# or with your custom words as a text file
profanity.load_censor_words_from_file('/path/to/my/project/my_wordlist.txt', whitelist_words=['merry'])
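
Conceptually, a whitelist acts as a set difference: the effective word list is the loaded words minus the whitelisted ones. The following is a minimal sketch of that idea, assuming case-insensitive matching; `effective_wordlist` is a hypothetical name and does not reflect how the library stores its words internally.

```python
def effective_wordlist(words, whitelist_words=()):
    """Return the words that remain censored after removing the whitelist."""
    # Assumption: both lists are compared case-insensitively.
    whitelist = {word.lower() for word in whitelist_words}
    return sorted({word.lower() for word in words} - whitelist)

print(effective_wordlist(["happy", "jolly", "merry"], whitelist_words=["Merry"]))
# ['happy', 'jolly']
```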

7. Add more censor words

from better_profanity import profanity

if __name__ == "__main__":
    custom_badwords = ['happy', 'jolly', 'merry']
    profanity.add_censor_words(custom_badwords)

    # censor() returns the censored text; contains_profanity() would only return True.
    print(profanity.censor("Happy you, fuck!"))
    # **** you, ****!

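Limitations

Because the library compares each word character by character, the censor can easily be bypassed by adding any character to a word:

profanity.censor('I just have sexx')
# returns 'I just have sexx'

profanity.censor('jerkk off')
# returns 'jerkk off'

Any word in the word list that contains non-space separators, such as s & m, cannot be recognised and therefore is not filtered out. This problem was raised in #5.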
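
Both limitations follow from exact matching on individual words. The sketch below reproduces them with a deliberately simplified matcher that splits on whitespace only; `contains_banned` and the placeholder word list are hypothetical and stand in for the library's own logic.

```python
# Mild placeholder entries; "s & m" mirrors the multi-token case described above.
banned = {"darn", "s & m"}

def contains_banned(text):
    """Exact, case-insensitive match on whitespace-separated tokens."""
    return any(token.lower() in banned for token in text.split())

print(contains_banned("well darn"))   # True
print(contains_banned("well darnn"))  # False: one extra character defeats the match
print(contains_banned("s & m"))       # False: the tokens are 's', '&' and 'm'
```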

Testing

python3 tests.py

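Contributing

Please read CONTRIBUTING.md for details on our code of conduct and the process for submitting pull requests to us.

License

This project is licensed under the MIT License. See the LICENSE.md file for details.

Special thanks to

Andrew Grinevich - Added support for Unicode characters.
Jaclyn Brockschmidt - Optimized string comparison.

Acknowledgments

Ben Friedland - For the inspiring package profanity.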

Additional information

Version: v0.7.0
Type: Other source code
Updated: 2024-12-22
Size: 295.24 KB
Source: GitHub
