surprisal

Compute surprisal from language models!

surprisal supports most causal language models (GPT2- and GPTneo-like models) from Huggingface or a local checkpoint, as well as GPT3 models from OpenAI using their API! We also support KenLM N-gram based language models using the KenLM Python interface.

Masked language models (BERT-like models) are in the pipeline and will be supported in the future (see #9).

Documentation

Usage

The snippet below computes per-token surprisals for a list of sentences:

from surprisal import AutoHuggingFaceModel, KenLMModel

sentences = [
    "The cat is on the mat",
    "The cat is on the hat",
    "The cat is on the pizza",
    "The pizza is on the mat",
    "I told you that the cat is on the mat",
    "I told you the cat is on the mat",
]

m = AutoHuggingFaceModel.from_pretrained('gpt2')
m.to('cuda')  # optionally move your model to GPU!

k = KenLMModel(model_path='./literature.arpa')

for result in m.surprise(sentences):
    print(result)
for result in k.surprise(sentences):
    print(result)

and produces output of this sort (gpt2):

       The       Ġcat        Ġis        Ġon       Ġthe       Ġmat  
     3.276      9.222      2.463      4.145      0.961      7.237  
       The       Ġcat        Ġis        Ġon       Ġthe       Ġhat  
     3.276      9.222      2.463      4.145      0.961      9.955  
       The       Ġcat        Ġis        Ġon       Ġthe     Ġpizza  
     3.276      9.222      2.463      4.145      0.961      8.212  
       The     Ġpizza        Ġis        Ġon       Ġthe       Ġmat  
     3.276     10.860      3.212      4.910      0.985      8.379  
         I      Ġtold       Ġyou      Ġthat       Ġthe       Ġcat        Ġis        Ġon       Ġthe       Ġmat 
     3.998      6.856      0.619      2.443      2.711      7.955      2.596      4.804      1.139      6.946 
         I      Ġtold       Ġyou       Ġthe       Ġcat        Ġis        Ġon       Ġthe       Ġmat  
     3.998      6.856      0.619      4.115      7.612      3.031      4.817      1.233      7.033
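Each number above is the surprisal of a token given the tokens before it, that is, the negative log of its conditional probability. The sketch below converts between the two. It assumes base-2 logarithms (bits), which this README does not state; pass `base=math.e` if your values are in nats. Both helper names are hypothetical and are not part of the surprisal API.

```python
import math


def surprisal_from_prob(p: float, base: float = 2.0) -> float:
    """Surprisal -log_base(p) of an outcome with probability p, 0 < p <= 1."""
    if not 0.0 < p <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log(p, base)


def prob_from_surprisal(s: float, base: float = 2.0) -> float:
    """Invert surprisal_from_prob: recover p = base ** (-s)."""
    return base ** (-s)


# Surprisal of " the" after "The cat is on" in the gpt2 output above.
# Base 2 is an assumption; the printed probability changes with the base.
s_the = 0.961
print(f"p(' the' | context) = {prob_from_surprisal(s_the):.3f}")
```

Lower surprisal means a more predictable token: under the base-2 assumption, a value below 1 corresponds to a probability above one half.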

Extracting surprisal over a substring

A surprisal object can be aggregated over the subset of tokens that best matches a span of words or characters. Word boundaries are inherited from the model's standard tokenizer and may not be consistent across models, so slicing by character span is the default and recommended option. Surprisals are in log space, so they are added over tokens during aggregation. For example:

>>> [s] = m.surprise("The cat is on the mat")
>>> s[3:6, "word"]
12.343366384506226
Ġon Ġthe Ġmat
>>> s[3:6, "char"]
9.222099304199219
Ġcat
>>> s[3:6]
9.222099304199219
Ġcat
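Adding surprisals works because the log of a product of probabilities is the sum of their logs, so the sum over consecutive tokens is the surprisal of the whole span. The sketch below reproduces both results above from the rounded per-token gpt2 values by selecting every token whose character range overlaps the requested span. It is an illustration only: `char_offsets`, `span_surprisal`, and the assumption that `Ġ` stands for exactly one leading space are hypothetical and do not describe how surprisal implements slicing internally.

```python
def char_offsets(tokens):
    """Character (start, end) offsets for GPT-2 style token strings.

    Assumption: 'Ġ' marks a single leading space, so a token's length in
    the original text is its length with 'Ġ' replaced by ' '.
    """
    offsets, pos = [], 0
    for tok in tokens:
        length = len(tok.replace("Ġ", " "))
        offsets.append((pos, pos + length))
        pos += length
    return offsets


def span_surprisal(tokens, surprisals, start, end):
    """Sum the surprisals of tokens whose character range overlaps [start, end)."""
    total, picked = 0.0, []
    for tok, (t_start, t_end), s in zip(tokens, char_offsets(tokens), surprisals):
        if t_start < end and t_end > start:  # half-open intervals overlap
            total += s
            picked.append(tok)
    return total, picked


tokens = ["The", "Ġcat", "Ġis", "Ġon", "Ġthe", "Ġmat"]
values = [3.276, 9.222, 2.463, 4.145, 0.961, 7.237]  # gpt2 output above

text = "The cat is on the mat"
print(span_surprisal(tokens, values, 3, 6))  # characters 3:6 fall inside " cat"
print(span_surprisal(tokens, values, text.index("on"), len(text)))  # "on the mat"
```

The totals differ slightly from the library's output (for example 12.343 versus 12.343366...) only because the per-token inputs here are rounded to three decimals.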

GPT-3 using the OpenAI API

Note: OpenAI no longer returns log probabilities for most of its models; see #15. To use a GPT-3 model from OpenAI's API, you will need to obtain your organization ID and a user-specific API key from your account. Then use OpenAIModel in the same way as a Huggingface model:

import surprisal

m = surprisal.OpenAIModel(model_id='text-davinci-002',
                          openai_api_key="sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
                          openai_org="org-xxxxxxxxxxxxxxxxxxxxxxxx")

These values can also be supplied through the environment variables OPENAI_API_KEY and OPENAI_ORG, set before running your script.
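If you prefer not to hard-code credentials, a common pattern is to export both variables in the shell (for example `OPENAI_API_KEY=sk-... OPENAI_ORG=org-... python my_script.py`) and let the code fall back to them. The sketch below only illustrates that precedence, an explicit argument first and the environment variable second. `resolve_openai_credentials` is a hypothetical helper, not part of surprisal, and whether OpenAIModel resolves the variables in exactly this order is an assumption.

```python
import os
from typing import Optional, Tuple


def resolve_openai_credentials(
    api_key: Optional[str] = None, org: Optional[str] = None
) -> Tuple[str, str]:
    """Hypothetical helper: prefer explicit arguments, else read the environment."""
    key = api_key or os.environ.get("OPENAI_API_KEY")
    organization = org or os.environ.get("OPENAI_ORG")
    if not key or not organization:
        raise RuntimeError(
            "pass the API key and organization explicitly "
            "or set OPENAI_API_KEY and OPENAI_ORG"
        )
    return key, organization
```

Keeping keys in the environment also keeps them out of version control and notebooks.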

You can also call Surprisal.lineplot() to visualize the surprisals:

from matplotlib import pyplot as plt

f, a = None, None
for result in m.surprise(sentences):
    # pass the figure and axes back in so every sentence lands on the same plot
    f, a = result.lineplot(f, a)

plt.show()

surprisal also provides a minimal CLI:

python -m surprisal -m distilgpt2 "I went to the train station today."
      I      Ġwent        Ġto       Ġthe     Ġtrain   Ġstation     Ġtoday          . 
  4.984      5.729      0.812      1.723      7.317      0.497      4.600      2.528 

python -m surprisal -m distilgpt2 "I went to the space station today."
      I      Ġwent        Ġto       Ġthe     Ġspace   Ġstation     Ġtoday          . 
  4.984      5.729      0.812      1.723      8.425      0.707      5.182      2.574
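The CLI prints tokens on one row and their surprisals on the next, so summing the second row gives a sentence-level total that makes the two runs easy to compare: with these values, "space station" comes out less expected overall than "train station". `parse_cli_output` below is a hypothetical helper that assumes exactly this two-row, whitespace-separated layout; it is not part of surprisal.

```python
def parse_cli_output(output: str):
    """Pair tokens with surprisals from the CLI's two-row text output.

    Assumes the first non-empty line holds the tokens and the second holds
    the matching surprisal values, both separated by whitespace.
    """
    rows = [line.split() for line in output.splitlines() if line.strip()]
    tokens, values = rows[0], [float(v) for v in rows[1]]
    if len(tokens) != len(values):
        raise ValueError("token and value rows differ in length")
    return list(zip(tokens, values))


train = """
      I      Ġwent        Ġto       Ġthe     Ġtrain   Ġstation     Ġtoday          .
  4.984      5.729      0.812      1.723      7.317      0.497      4.600      2.528
"""
space = """
      I      Ġwent        Ġto       Ġthe     Ġspace   Ġstation     Ġtoday          .
  4.984      5.729      0.812      1.723      8.425      0.707      5.182      2.574
"""

for name, out in [("train", train), ("space", space)]:
    pairs = parse_cli_output(out)
    print(name, round(sum(value for _, value in pairs), 3))
```

The difference is not confined to the swapped word: "Ġstation" and "Ġtoday" also receive higher surprisal after "space" in these runs.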

Installing

Because people from different communities use surprisal for different purposes, the core dependencies related to language modeling are marked as optional by default. Depending on your use case, install surprisal with the appropriate extras.

Installing from PyPI (latest stable release)

Use a command like pip install surprisal[optional], replacing [optional] with the optional support you need. For multiple extras, use a comma-separated list:

pip install surprisal[kenlm,transformers]
# the above is equivalent to
pip install surprisal[all]

Possible options include: transformers, kenlm, openai, petals

If you use poetry for an existing project, use the -E option to add surprisal together with the optional dependencies you want:

poetry add surprisal -E transformers -E kenlm
# the above is equivalent to
poetry add surprisal -E all

To also install openai and petals:

poetry add surprisal -E transformers -E kenlm -E openai -E petals
# the above is equivalent to 
poetry add surprisal -E allplus

Installing from GitHub (bleeding edge)

The -e flag makes the install editable, so you can make changes to surprisal.

git clone https://github.com/aalok-sathe/surprisal.git
cd surprisal
pip install -e ".[transformers]"

Acknowledgments

Inspired by the now-inactive lm-scorer. Thanks to folks from CPLlab and EvLab for comments and help.

License

More information

Version: 0.1.6
Type: Other source code
Updated: 2024-11-28
Size: 133.82KB
Source: GitHub
