
surprisal


Version 0.1.6


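Compute surprisal from language models!

surprisal supports most causal language models (GPT2-like and GPTneo-like models) from Huggingface or from local checkpoints, as well as OpenAI's GPT3 models through their API. N-gram language models built with KenLM are also supported through the KenLM Python interface.

Masked language models (BERT-like models) are in the pipeline and will be supported in the future (see #9).

Documentation

Usage

The snippet below computes per-token surprisals for a list of sentences: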

from surprisal import AutoHuggingFaceModel, KenLMModel

sentences = [
    "The cat is on the mat",
    "The cat is on the hat",
    "The cat is on the pizza",
    "The pizza is on the mat",
    "I told you that the cat is on the mat",
    "I told you the cat is on the mat",
]

# Load a causal language model from Huggingface (or a local checkpoint).
m = AutoHuggingFaceModel.from_pretrained('gpt2')
m.to('cuda')  # optionally move your model to GPU!

# Load an N-gram model from a KenLM ARPA file.
k = KenLMModel(model_path='./literature.arpa')

# Each result holds the per-token surprisals for one sentence.
for result in m.surprise(sentences):
    print(result)
for result in k.surprise(sentences):
    print(result)

and produces output of this type (gpt2):

       The       Ġcat        Ġis        Ġon       Ġthe       Ġmat  
     3.276      9.222      2.463      4.145      0.961      7.237  
       The       Ġcat        Ġis        Ġon       Ġthe       Ġhat  
     3.276      9.222      2.463      4.145      0.961      9.955  
       The       Ġcat        Ġis        Ġon       Ġthe     Ġpizza  
     3.276      9.222      2.463      4.145      0.961      8.212  
       The     Ġpizza        Ġis        Ġon       Ġthe       Ġmat  
     3.276     10.860      3.212      4.910      0.985      8.379  
         I      Ġtold       Ġyou      Ġthat       Ġthe       Ġcat        Ġis        Ġon       Ġthe       Ġmat 
     3.998      6.856      0.619      2.443      2.711      7.955      2.596      4.804      1.139      6.946 
         I      Ġtold       Ġyou       Ġthe       Ġcat        Ġis        Ġon       Ġthe       Ġmat  
     3.998      6.856      0.619      4.115      7.612      3.031      4.817      1.233      7.033
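
For readers new to the quantity: surprisal is conventionally defined as the negative log probability of a token given its context, so less probable tokens score higher. The following is a minimal sketch of that definition only. surprisal_from_prob is a hypothetical helper, not part of the library, and the log base is left as a parameter because the text above does not state which unit the reported values use.

```python
import math


def surprisal_from_prob(p: float, base: float = 2.0) -> float:
    """Return -log_base(p) for a probability p in (0, 1].

    Hypothetical helper for illustration; the base is an assumption.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    return -math.log(p) / math.log(base)


# A less probable continuation has a higher surprisal.
likely = surprisal_from_prob(0.5)
unlikely = surprisal_from_prob(0.01)
print(likely < unlikely)
```

This is why, in the output above, the final token of "The cat is on the hat" scores higher than the final token of "The cat is on the mat": the model assigns it a lower probability in that context.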

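Extracting surprisal over a substring

A surprisal object can be aggregated over the subset of tokens that best matches a span of words or characters. Word boundaries are inherited from the model's standard tokenizer and may not be consistent across models, so slicing by character span is the default and recommended option. Surprisals are in log space, so they are added across tokens during aggregation. For example: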

>>> [s] = m.surprise("The cat is on the mat")
>>> s[3:6, "word"]
12.343366384506226
Ġon Ġthe Ġmat
>>> s[3:6, "char"]
9.222099304199219
Ġcat
>>> s[3:6]
9.222099304199219
Ġcat

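GPT-3 using the OpenAI API

Note: OpenAI recently stopped returning log probabilities for most of its models. See #15. To use a GPT-3 model through the OpenAI API, you need to obtain your organization ID and your user-specific API key from your account. Then use OpenAIModel in the same way as a Huggingface model.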

import surprisal

m = surprisal.OpenAIModel(model_id='text-davinci-002',
                          openai_api_key="sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
                          openai_org="org-xxxxxxxxxxxxxxxxxxxxxxxx")

These values can also be passed through the environment variables OPENAI_API_KEY and OPENAI_ORG, set before calling a script.
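
If you want a script to fail early with a clear message when those variables are not set, a small check along these lines may help. This is only a sketch: load_openai_credentials is a hypothetical helper that is not part of surprisal, and treating both variables as required is an assumption.

```python
import os


def load_openai_credentials(env=os.environ):
    """Read the API key and organization ID from environment variables.

    Hypothetical helper; assumes both variables are required.
    """
    names = ("OPENAI_API_KEY", "OPENAI_ORG")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return env["OPENAI_API_KEY"], env["OPENAI_ORG"]


# Demonstration with a placeholder mapping instead of the real environment.
demo_env = {"OPENAI_API_KEY": "sk-placeholder", "OPENAI_ORG": "org-placeholder"}
key, org = load_openai_credentials(demo_env)
print(org)
```

Passing the mapping as an argument keeps the check easy to exercise without touching real credentials.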

You can also call Surprisal.lineplot() to visualize the surprisals:

from matplotlib import pyplot as plt

f, a = None, None
for result in m.surprise(sentences):
    f, a = result.lineplot(f, a)

plt.show()

surprisal also has a minimal CLI:

python -m surprisal -m distilgpt2 "I went to the train station today."
      I      Ġwent        Ġto       Ġthe     Ġtrain   Ġstation     Ġtoday          . 
  4.984      5.729      0.812      1.723      7.317      0.497      4.600      2.528 

python -m surprisal -m distilgpt2 "I went to the space station today."
      I      Ġwent        Ġto       Ġthe     Ġspace   Ġstation     Ġtoday          . 
  4.984      5.729      0.812      1.723      8.425      0.707      5.182      2.574

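Installing

Because surprisal is used by people from different communities for different purposes, the core dependencies related to language modeling are marked optional by default. Depending on your use case, install surprisal with the appropriate extras.

Installing from PyPI (latest stable release)

Use a command like pip install surprisal[optional], replacing [optional] with whatever optional support you need. For multiple optional extras, use a comma-separated list: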

pip install surprisal[kenlm,transformers]
# the above is equivalent to
pip install surprisal[all]

Possible options include: transformers, kenlm, openai, petals
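
To see which of these optional backends are importable in the current environment, a quick check like the following can be used. It is a sketch rather than anything surprisal provides: installed_extras is a hypothetical helper, and it assumes each extra's name matches the top-level module name of the package it pulls in.

```python
from importlib.util import find_spec

# Assumed mapping: the extra name equals the importable module name.
EXTRAS = ("transformers", "kenlm", "openai", "petals")


def installed_extras(names=EXTRAS):
    """Return {module name: True if importable, else False}."""
    return {name: find_spec(name) is not None for name in names}


for name, ok in installed_extras().items():
    print(f"{name}: {'installed' if ok else 'missing'}")
```

Any backend reported as missing can then be added with the matching extra.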

If you use poetry in an existing project, add surprisal together with the desired optional dependencies using the -E option:

poetry add surprisal -E transformers -E kenlm
# the above is equivalent to
poetry add surprisal -E all

To also install openai and petals, you can do

poetry add surprisal -E transformers -E kenlm -E openai -E petals
# the above is equivalent to 
poetry add surprisal -E allplus

Installing from GitHub (bleeding edge)

The -e flag makes the install editable, so you can make changes to surprisal.

git clone https://github.com/aalok-sathe/surprisal.git
cd surprisal
pip install -e ".[transformers]"

Acknowledgments

Inspired by the now-inactive lm-scorer; thanks to folks from CPLlab and EvLab for their comments and help.

License


