better_profanity

v0.7.0

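Blazingly fast cleaning of swear words (and their leetspeak) in strings

The latest version (0.7.0) currently has a performance issue. It is recommended to use the last stable version, 0.6.1.

Inspired by the package profanity by Ben Friedland, this library is significantly faster than the original one because it uses string comparison instead of regular expressions.

It supports modified spellings (such as p0rn, h4NDjob, handj0b and b*tCh).

Requirements

This package works with Python 3.5+ and PyPy3.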

Installation

pip3 install better_profanity

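Unicode characters

Only Unicode characters from the categories Ll, Lu, Mc and Mn are added to the set of characters the library treats as part of a word. More information on these categories can be found in the Unicode Standard's General Category property.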
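To see what those four categories cover, Python's standard unicodedata module reports the general category of any character. The sketch below is only an illustration: ALLOWED_CATEGORIES and is_supported_char are hypothetical names, not part of the better_profanity API.

```python
import unicodedata

# Hypothetical set mirroring the four categories named above:
# Ll = lowercase letter, Lu = uppercase letter,
# Mc = spacing combining mark, Mn = non-spacing mark.
ALLOWED_CATEGORIES = {"Ll", "Lu", "Mc", "Mn"}


def is_supported_char(ch):
    """Return True if `ch` falls in one of the allowed Unicode categories."""
    return unicodedata.category(ch) in ALLOWED_CATEGORIES


# a (Ll), Ä (Lu), combining acute accent (Mn),
# Devanagari vowel sign AA (Mc), and a Chinese character (Lo).
for ch in ["a", "Ä", "\u0301", "\u093e", "中"]:
    print(repr(ch), unicodedata.category(ch), is_supported_char(ch))
```

The Chinese character belongs to category Lo (other letter), which lies outside the four categories; this is consistent with languages such as Chinese not being supported yet.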

Not all languages are supported yet, for example Chinese.

Usage

from better_profanity import profanity

if __name__ == "__main__":
    profanity.load_censor_words()

    text = "You p1ec3 of sHit."
    censored_text = profanity.censor(text)
    print(censored_text)
    # You **** of ****.

All modified spellings of the words in profanity_wordlist.txt will be generated. For example, the word handjob would be loaded as:

'handjob', 'handj*b', 'handj0b', 'handj@b', 'h@ndjob', 'h@ndj*b', 'h@ndj0b', 'h@ndj@b',
'h*ndjob', 'h*ndj*b', 'h*ndj0b', 'h*ndj@b', 'h4ndjob', 'h4ndj*b', 'h4ndj0b', 'h4ndj@b'

The full mapping used by the library can be found in profanity.py.
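The expansion above is a Cartesian product over per-character substitutions. Here is a minimal sketch with itertools.product, assuming a hypothetical two-letter table (CHAR_VARIANTS) chosen only to reproduce the handjob list; it is not the library's own code, and the real mapping in profanity.py covers more letters.

```python
from itertools import product

# Hypothetical subset of a leetspeak substitution table.
CHAR_VARIANTS = {
    "a": ("a", "@", "*", "4"),
    "o": ("o", "*", "0", "@"),
}


def generate_variants(word):
    """Return every spelling of `word` produced by the substitutions above."""
    # Each character contributes its substitution options, or just itself.
    options = [CHAR_VARIANTS.get(ch, (ch,)) for ch in word.lower()]
    # product() varies the last position fastest, which yields the same
    # order as the handjob list above.
    return ["".join(combo) for combo in product(*options)]


variants = generate_variants("handjob")
print(len(variants))  # 16
print(variants[:4])   # ['handjob', 'handj*b', 'handj0b', 'handj@b']
```

The number of variants grows multiplicatively with each mapped character, which is why precomputing them once at load time makes the later lookups plain string comparisons.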

1. Censor swear words from a text

By default, profanity replaces each swear word with 4 asterisks (****).

from better_profanity import profanity

if __name__ == "__main__":
    text = "You p1ec3 of sHit."

    censored_text = profanity.censor(text)
    print(censored_text)
    # You **** of ****.

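2. Censor doesn't care about word dividers

The function .censor() also hides words that are separated not only by a space but also by other dividers, such as the underscore (_), the comma (,) and the period (.). The characters @, $, *, " and ' are the exception: they are not treated as dividers, since they can appear inside modified spellings such as b*tCh.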

from better_profanity import profanity

if __name__ == "__main__":
    text = "...sh1t...hello_cat_fuck,,,,123"

    censored_text = profanity.censor(text)
    print(censored_text)
    # ...****...hello_cat_****,,,,123
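Divider-aware, whole-token censoring can be sketched with a single character scan, assuming the wordlist is a plain set of lowercase words. It compares strings rather than using a regular expression, in the spirit of the library's approach, but it is not the library's implementation; censor_sketch and IN_WORD_SYMBOLS are hypothetical names.

```python
# Non-alphanumeric characters that stay inside a token (the exceptions
# listed above); every other non-alphanumeric character is a divider.
IN_WORD_SYMBOLS = set("@$*\"'")


def censor_sketch(text, wordset, censor_char="*"):
    """Replace every token found in `wordset` with four `censor_char`s."""
    out = []
    token = []

    def flush():
        word = "".join(token)
        # Case-insensitive lookup; a match becomes a fixed 4-character mask.
        out.append(censor_char * 4 if word.lower() in wordset else word)
        token.clear()

    for ch in text:
        if ch.isalnum() or ch in IN_WORD_SYMBOLS:
            token.append(ch)
        else:
            flush()          # end of a token: check it, then keep the divider
            out.append(ch)
    flush()                  # the text may end in the middle of a token
    return "".join(out)


print(censor_sketch("...sh1t...hello_cat_fuck,,,,123", {"sh1t", "fuck"}))
# ...****...hello_cat_****,,,,123
```

Because the mask always has four characters, the censored output does not reveal the length of the hidden word.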

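3. Censor swear words with a custom character

Four instances of the character passed as the second argument of .censor() will be used to replace each swear word.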

from better_profanity import profanity

if __name__ == "__main__":
    text = "You p1ec3 of sHit."

    censored_text = profanity.censor(text, '-')
    print(censored_text)
    # You ---- of ----.

4. Check if the string contains any swear words

The function .contains_profanity() returns True if any word in the given string exists in the wordlist.

from better_profanity import profanity

if __name__ == "__main__":
    dirty_text = "That l3sbi4n did a very good H4ndjob."

    profanity.contains_profanity(dirty_text)
    # True
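A boolean check only needs to find one match, so it can stop at the first hit. The sketch below assumes a set of lowercase words, and for brevity it swaps in a regular expression tokenizer (the library itself uses string comparison); TOKEN_RE and contains_profanity_sketch are hypothetical names.

```python
import re

# Runs of letters/digits (excluding "_") or the in-word symbols @ $ * " '.
# Spaces, underscores and other punctuation split tokens.
TOKEN_RE = re.compile(r"(?:[^\W_]|[@$*\"'])+")


def contains_profanity_sketch(text, wordset):
    """Return True as soon as one token of `text` is in `wordset`."""
    # any() short-circuits, so scanning stops at the first match.
    return any(tok.lower() in wordset for tok in TOKEN_RE.findall(text))


print(contains_profanity_sketch("That l3sbi4n did a very good H4ndjob.", {"h4ndjob"}))
# True
```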

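5. Censor swear words with a custom wordlist

5.1. Wordlist as a List

The function load_censor_words takes a List of strings as the censored words. The provided list replaces the default wordlist.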

from better_profanity import profanity

if __name__ == "__main__":
    custom_badwords = ['happy', 'jolly', 'merry']
    profanity.load_censor_words(custom_badwords)

    print(profanity.censor("Have a merry day! :)"))
    # Have a **** day! :)

5.2. Wordlist as a file

The function load_censor_words_from_file takes a filename, which should be a text file with one word per line.

from better_profanity import profanity

if __name__ == "__main__":
    profanity.load_censor_words_from_file('/path/to/my/project/my_wordlist.txt')
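A one-word-per-line file maps naturally onto a set. Here is a minimal sketch of reading such a file, assuming UTF-8 text and that blank lines and surrounding whitespace should be ignored; read_wordlist is a hypothetical helper, and lowercasing the entries is an assumption of this sketch.

```python
import os
import tempfile


def read_wordlist(path):
    """Read one word per line, skipping blank lines and surrounding whitespace."""
    with open(path, encoding="utf-8") as f:
        return {line.strip().lower() for line in f if line.strip()}


# Usage: write a small wordlist to a temporary file, then read it back.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as tmp:
    tmp.write("happy\n\nJolly\n  merry  \n")
    path = tmp.name

words = read_wordlist(path)
os.remove(path)
print(sorted(words))  # ['happy', 'jolly', 'merry']
```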

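6. Whitelist

The functions load_censor_words and load_censor_words_from_file take a keyword argument, whitelist_words, to ignore words in the wordlist.

It is best used when there are only a few words in the wordlist that you would like to ignore.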

from better_profanity import profanity

# Use the default wordlist
profanity.load_censor_words(whitelist_words=['happy', 'merry'])

# or with your custom words as a List
custom_badwords = ['happy', 'jolly', 'merry']
profanity.load_censor_words(custom_badwords, whitelist_words=['merry'])

# or with your custom words as a text file
profanity.load_censor_words_from_file('/path/to/my/project/my_wordlist.txt', whitelist_words=['merry'])
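Conceptually, a whitelist is a set difference: whatever is whitelisted is removed from the words to censor. A minimal sketch, assuming case-insensitive words; build_wordset is a hypothetical helper, and it does not model how the library treats the modified spellings of whitelisted words.

```python
def build_wordset(words, whitelist_words=None):
    """Return `words` minus anything in `whitelist_words` (case-insensitive)."""
    whitelist = {w.lower() for w in (whitelist_words or [])}
    return {w.lower() for w in words} - whitelist


wordset = build_wordset(['happy', 'jolly', 'merry'], whitelist_words=['merry'])
print(sorted(wordset))  # ['happy', 'jolly']
```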

7. Add more censor words

from better_profanity import profanity

if __name__ == "__main__":
    profanity.load_censor_words()  # start from the default wordlist

    custom_badwords = ['happy', 'jolly', 'merry']
    profanity.add_censor_words(custom_badwords)

    print(profanity.censor("Happy you, fuck!"))
    # **** you, ****!
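The difference between loading a custom list (sections 5.1 and 5.2) and adding words (this section) is replace versus extend. A minimal sketch of that distinction; WordFilter and its methods are hypothetical stand-ins, not the library's classes.

```python
class WordFilter:
    """Hypothetical stand-in for the wordlist handling of the profanity object."""

    def __init__(self, default_words):
        self.words = {w.lower() for w in default_words}

    def load_words(self, custom_words):
        # Replace the current list, as load_censor_words does with a custom list.
        self.words = {w.lower() for w in custom_words}

    def add_words(self, custom_words):
        # Extend the current list, as add_censor_words does.
        self.words |= {w.lower() for w in custom_words}


f = WordFilter(["fuck", "shit"])
f.add_words(["happy", "jolly", "merry"])
print(sorted(f.words))  # ['fuck', 'happy', 'jolly', 'merry', 'shit']
f.load_words(["happy"])
print(sorted(f.words))  # ['happy']
```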

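Limitations

Because the library compares each word character by character, the censor can easily be bypassed by adding any character to the word: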

from better_profanity import profanity

profanity.censor('I just have sexx')
# returns 'I just have sexx'

profanity.censor('jerkk off')
# returns 'jerkk off'

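Any word in the wordlist that contains non-space separators, such as s & m, cannot be recognised and therefore will not be filtered out. This problem was raised in #5.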

Testing

python3 tests.py

Contributing

Please read CONTRIBUTING.md for details on our code of conduct and the process for submitting pull requests to us.

License

This project is licensed under the MIT License. See the LICENSE.md file for details.

Special thanks to

Andrew Grinevich - Add support for Unicode characters.
Jaclyn Brockschmidt - Optimize string comparison.

Acknowledgments

Ben Friedland - For the inspiring package profanity.

Additional information

Version: v0.7.0
Updated: 2024-12-22
Size: 295.24KB
Source: GitHub
