surprisal

Version 0.1.6

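Compute surprisal from language models!

surprisal supports most causal language models (GPT2- and GPTneo-like models) from Huggingface or local checkpoints, as well as OpenAI's GPT3 models through their API. It also supports KenLM N-gram language models through the KenLM Python interface.

Masked language models (BERT-like models) are in the pipeline and will be supported in the future (see #9).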
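
For readers new to the term, the quantity being computed is the negative log probability of each token given its context. The block below is a minimal sketch of that definition on a toy bigram table. The probabilities, the `toy_bigram` name, and the `token_surprisals` helper are all hypothetical and are not part of surprisal; the natural logarithm is assumed only for illustration.

```python
import math

# Hypothetical conditional probabilities p(token | previous token).
# These numbers are made up for illustration and come from no real model.
toy_bigram = {
    ("<s>", "the"): 0.5,
    ("the", "cat"): 0.1,
    ("cat", "sat"): 0.25,
}


def token_surprisals(tokens, probs):
    """Return -log p(token | previous token) for each token (natural log)."""
    out = []
    prev = "<s>"  # start-of-sentence marker used as the first context
    for tok in tokens:
        out.append(-math.log(probs[(prev, tok)]))
        prev = tok
    return out


words = ["the", "cat", "sat"]
values = token_surprisals(words, toy_bigram)
for tok, s in zip(words, values):
    print(f"{tok:>5} {s:.3f}")
```

Less probable tokens get larger values, which is the pattern visible in the model outputs further down.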

Documentation

Usage

The snippet below computes per-token surprisals for a list of sentences:

from surprisal import AutoHuggingFaceModel, KenLMModel

sentences = [
    "The cat is on the mat",
    "The cat is on the hat",
    "The cat is on the pizza",
    "The pizza is on the mat",
    "I told you that the cat is on the mat",
    "I told you the cat is on the mat",
]

m = AutoHuggingFaceModel.from_pretrained('gpt2')
m.to('cuda')  # optionally move your model to GPU!

k = KenLMModel(model_path='./literature.arpa')

for result in m.surprise(sentences):
    print(result)
for result in k.surprise(sentences):
    print(result)

and produces output of this type (gpt2):

       The       Ġcat        Ġis        Ġon       Ġthe       Ġmat  
     3.276      9.222      2.463      4.145      0.961      7.237  
       The       Ġcat        Ġis        Ġon       Ġthe       Ġhat  
     3.276      9.222      2.463      4.145      0.961      9.955  
       The       Ġcat        Ġis        Ġon       Ġthe     Ġpizza  
     3.276      9.222      2.463      4.145      0.961      8.212  
       The     Ġpizza        Ġis        Ġon       Ġthe       Ġmat  
     3.276     10.860      3.212      4.910      0.985      8.379  
         I      Ġtold       Ġyou      Ġthat       Ġthe       Ġcat        Ġis        Ġon       Ġthe       Ġmat 
     3.998      6.856      0.619      2.443      2.711      7.955      2.596      4.804      1.139      6.946 
         I      Ġtold       Ġyou       Ġthe       Ġcat        Ġis        Ġon       Ġthe       Ġmat  
     3.998      6.856      0.619      4.115      7.612      3.031      4.817      1.233      7.033

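Extracting surprisal over a substring

A surprisal object can be aggregated over the subset of tokens that best matches a span of words or characters. Word boundaries are inherited from the model's standard tokenizer and may not be consistent across models, so character spans are the default and recommended option when slicing. Surprisals are in log space, so they are added over tokens during aggregation. For example: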

>>> [s] = m.surprise("The cat is on the mat")
>>> s[3:6, "word"]
12.343366384506226
Ġon Ġthe Ġmat
>>> s[3:6, "char"]
9.222099304199219
Ġcat
>>> s[3:6]
9.222099304199219
Ġcat
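
The two slicing modes can be reproduced by hand from the gpt2 output shown earlier. The sketch below is one plausible way to do the selection, not necessarily how surprisal implements it: it assumes that a character span picks every token it overlaps, that "Ġ" stands for exactly one space, and that each word is a single token (true for this sentence). The helper names are hypothetical.

```python
# Tokens and surprisals copied from the gpt2 output above.
tokens = ["The", "Ġcat", "Ġis", "Ġon", "Ġthe", "Ġmat"]
surprisals = [3.276, 9.222, 2.463, 4.145, 0.961, 7.237]


def char_spans(toks):
    """Return (start, end) character offsets of each token in the sentence."""
    spans, pos = [], 0
    for tok in toks:
        # "Ġ" is a single character standing in for one space, so len() is right.
        spans.append((pos, pos + len(tok)))
        pos += len(tok)
    return spans


def aggregate_chars(start, end):
    """Sum surprisals of tokens overlapping the character range [start, end)."""
    picked = [i for i, (a, b) in enumerate(char_spans(tokens)) if a < end and start < b]
    return sum(surprisals[i] for i in picked), [tokens[i] for i in picked]


def aggregate_words(start, end):
    """Sum surprisals of words start..end-1 (one token per word in this sentence)."""
    return sum(surprisals[start:end]), tokens[start:end]


print(aggregate_words(3, 6))  # tokens Ġon, Ġthe, Ġmat
print(aggregate_chars(3, 6))  # characters 3..5 fall inside Ġcat
```

Because the values are log-space quantities, the sum over a span corresponds to the surprisal of the whole span under the model.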

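GPT-3 using the OpenAI API

Note: OpenAI recently stopped returning log probabilities for most of its models. See #15. To use a GPT-3 model through the OpenAI API, you need your organization ID and your user-specific API key, both available from your account. Then use OpenAIModel in the same way as a Huggingface model.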

import surprisal

m = surprisal.OpenAIModel(model_id='text-davinci-002',
                          openai_api_key="sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
                          openai_org="org-xxxxxxxxxxxxxxxxxxxxxxxx")

These values can also be passed through the environment variables OPENAI_API_KEY and OPENAI_ORG, set before calling your script.
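
If you rely on the environment variables, it can help to check that both are set before constructing the model. The following is a small sketch using only the standard library; the `missing_openai_env` helper is hypothetical and not part of surprisal.

```python
import os

# The two variable names given in the text above.
REQUIRED_VARS = ("OPENAI_API_KEY", "OPENAI_ORG")


def missing_openai_env(env=None):
    """Return the names of the required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_openai_env()
if missing:
    print("Set these before running your script:", ", ".join(missing))
else:
    print("OpenAI credentials found in the environment.")
```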

You can also call Surprisal.lineplot() to visualize the surprisals:

from matplotlib import pyplot as plt

f, a = None, None
for result in m.surprise(sentences):
    f, a = result.lineplot(f, a)

plt.show()

surprisal also has a minimal CLI:

python -m surprisal -m distilgpt2 "I went to the train station today."
      I      Ġwent        Ġto       Ġthe     Ġtrain   Ġstation     Ġtoday          . 
  4.984      5.729      0.812      1.723      7.317      0.497      4.600      2.528 

python -m surprisal -m distilgpt2 "I went to the space station today."
      I      Ġwent        Ġto       Ġthe     Ġspace   Ġstation     Ġtoday          . 
  4.984      5.729      0.812      1.723      8.425      0.707      5.182      2.574
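
If you capture the CLI output in a script, the two-line layout above (a row of tokens followed by a row of values) is easy to turn back into pairs. The parser below is a sketch under the assumption that the output always alternates token and value lines and that no displayed token contains whitespace; `parse_surprisal_table` is a hypothetical helper, not something surprisal provides.

```python
def parse_surprisal_table(text):
    """Parse alternating token / value lines into lists of (token, surprisal) pairs."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    rows = []
    for tok_line, val_line in zip(lines[0::2], lines[1::2]):
        toks = tok_line.split()
        vals = [float(v) for v in val_line.split()]
        rows.append(list(zip(toks, vals)))
    return rows


# Output copied from the first CLI call above.
cli_output = """
      I      Ġwent        Ġto       Ġthe     Ġtrain   Ġstation     Ġtoday          .
  4.984      5.729      0.812      1.723      7.317      0.497      4.600      2.528
"""

[row] = parse_surprisal_table(cli_output)
total = sum(value for _, value in row)
print(row[4])  # the (token, surprisal) pair for "Ġtrain"
print(f"sentence total: {total:.3f}")
```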

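Installing

Because surprisal is used by people from different communities for different purposes, the core dependencies related to language modeling are marked optional by default. Depending on your use case, install surprisal with the appropriate extras.

Installing from PyPI (latest stable release)

Use a command such as pip install surprisal[optional], replacing [optional] with whichever optional support you need. For multiple extras, use a comma-separated list: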

pip install surprisal[kenlm,transformers]
# the above is equivalent to
pip install surprisal[all]

Possible options include: transformers, kenlm, openai, petals

If you use poetry in an existing project, use the -E option to add surprisal together with the optional dependencies you need:
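
Since the backends are optional, a script may want to check which ones are importable before choosing a model class. This sketch uses only the standard library and assumes the import name of each backend matches its extra name (transformers, kenlm, openai, petals); `installed_backends` is a hypothetical helper.

```python
import importlib.util

# Assumed import names, taken from the extras listed above.
OPTIONAL_BACKENDS = ["transformers", "kenlm", "openai", "petals"]


def installed_backends(names=OPTIONAL_BACKENDS):
    """Map each backend name to True if a module of that name can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


for name, available in installed_backends().items():
    print(f"{name:>12}: {'installed' if available else 'missing'}")
```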

poetry add surprisal -E transformers -E kenlm
# the above is equivalent to
poetry add surprisal -E all

To also install openai and petals, you can run:

poetry add surprisal -E transformers -E kenlm -E openai -E petals
# the above is equivalent to 
poetry add surprisal -E allplus

Installing from GitHub (bleeding edge)

The -e flag enables an editable install, so you can make changes to surprisal.

git clone https://github.com/aalok-sathe/surprisal.git
cd surprisal
pip install -e ".[transformers]"

Acknowledgments

Inspired by the now-inactive lm-scorer; thanks to folks from CPLlab and EvLab for comments and help.

License

Additional information

Version: 0.1.6
Type: Other source code
Updated: 2024-11-28
Size: 133.82KB
Source: GitHub