
rust-bert


Rust-native state-of-the-art Natural Language Processing models and pipelines. Port of Hugging Face's Transformers library, using tch-rs or onnxruntime bindings and pre-processing from rust-tokenizers. Supports multi-threaded tokenization and GPU inference. This repository exposes the model base architecture, task-specific heads (see below) and ready-to-use pipelines. Benchmarks are available at the end of this document.

Get started with tasks including question answering, named entity recognition, translation, summarization, text generation, conversational agents and more in just a few lines of code:

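use rust_bert::pipelines::question_answering::{QaInput, QuestionAnsweringModel};

fn main() -> anyhow::Result<()> {
    let qa_model = QuestionAnsweringModel::new(Default::default())?;

    let question = String::from("Where does Amy live ?");
    let context = String::from("Amy lives in Amsterdam");
    let answers = qa_model.predict(&[QaInput { question, context }], 1, 32); // top_k = 1, batch_size = 32
    println!("{:?}", answers);
    Ok(())
}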

Output:

 [Answer { score: 0.9976, start: 13, end: 21, answer: "Amsterdam" }]

Tasks currently supported include:

Translation
Summarization
Multi-turn dialogue
Zero-shot classification
Sentiment analysis
Named entity recognition
Part of speech tagging
Question answering
Language generation
Masked language model
Sentence embeddings
Keyword extraction

Supported models/tasks matrix:

	Sequence classification	Token classification	Question answering	Text generation	Summarization	Translation	Masked LM	Sentence embeddings
DistilBERT
MobileBERT
DeBERTa
DeBERTa (v2)
FNet
BERT
RoBERTa
GPT
GPT2
GPT-Neo
GPT-J
BART
Marian
MBart
M2M100
NLLB
Electra
ALBERT
T5
LongT5
XLNet
Reformer
ProphetNet
Longformer
Pegasus

Getting started

This library relies on the tch crate for bindings to the C++ Libtorch API. The libtorch library is required and can be downloaded either automatically or manually. The following provides a reference on how to set up your environment to use these bindings; please refer to tch for detailed information or support.

Furthermore, this library relies on a cache folder for downloading pre-trained models. This cache location defaults to ~/.cache/.rustbert, but can be changed by setting the RUSTBERT_CACHE environment variable. Note that the language models used by this library are in the order of hundreds of MBs to GBs.
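The cache-location rule can be sketched as a small function. `resolve_cache_dir` is a hypothetical helper written for illustration, not a rust-bert function; it assumes the rule is simply "use RUSTBERT_CACHE if it is set, otherwise ~/.cache/.rustbert", and it takes the environment values as arguments so the logic can be checked without touching the real environment.

```rust
use std::path::PathBuf;

/// Hypothetical helper, not part of rust-bert: resolves the model cache directory.
/// `rustbert_cache` stands for the value of the RUSTBERT_CACHE environment
/// variable (if set) and `home` for the user's home directory.
fn resolve_cache_dir(rustbert_cache: Option<&str>, home: &str) -> PathBuf {
    match rustbert_cache {
        // An explicit RUSTBERT_CACHE value takes precedence.
        Some(dir) => PathBuf::from(dir),
        // Otherwise fall back to the default ~/.cache/.rustbert.
        None => PathBuf::from(home).join(".cache").join(".rustbert"),
    }
}

fn main() {
    let from_env = std::env::var("RUSTBERT_CACHE").ok();
    // Assumption: HOME holds the home directory (Unix-like systems).
    let home = std::env::var("HOME").unwrap_or_else(|_| String::from("."));
    let dir = resolve_cache_dir(from_env.as_deref(), &home);
    println!("Pretrained models will be cached in {}", dir.display());
}
```

Passing the environment values in as parameters keeps the function pure, which makes the precedence rule easy to test.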

Manual installation (recommended)

1. Download libtorch from https://pytorch.org/get-started/locally/. This package requires v2.4: if this version is no longer available on the "get started" page, the file should be accessible by modifying the target link, for example https://download.pytorch.org/libtorch/cu124/libtorch-cxx11-abi-shared-with-deps-2.4.0%2Bcu124.zip for a Linux version with CUDA12. NOTE: When using rust-bert as a dependency from crates.io, please check the required LIBTORCH version in the published package readme, as it may differ from the version documented here (which applies to the current repository version).
2. Extract the library to a location of your choice.
3. Set the following environment variables:

Linux:

export LIBTORCH=/path/to/libtorch
export LD_LIBRARY_PATH=${LIBTORCH}/lib:$LD_LIBRARY_PATH

Windows:

$Env:LIBTORCH = "X:\path\to\libtorch"
$Env:Path += ";X:\path\to\libtorch\lib"

macOS + Homebrew:

brew install pytorch jq
export LIBTORCH=$(brew --cellar pytorch)/$(brew info --json pytorch | jq -r '.[0].installed[0].version')
export LD_LIBRARY_PATH=${LIBTORCH}/lib:$LD_LIBRARY_PATH

Automatic installation

Alternatively, you can let the build script automatically download the libtorch library for you. The download-libtorch feature flag needs to be enabled. The CPU version of libtorch is downloaded by default. To download a CUDA version, set the environment variable TORCH_CUDA_VERSION to cu124. Note that the libtorch library is large (in the order of several GBs for the CUDA-enabled version), so the first build may take several minutes to complete.

Verifying installation

Verify your installation (and linking with libtorch) by adding the rust-bert dependency to your Cargo.toml or by cloning the rust-bert source and running an example:

git clone git@github.com:guillaume-be/rust-bert.git
cd rust-bert
cargo run --example sentence_embeddings

ONNX support (optional)

ONNX support can be enabled via the optional onnx feature. This crate then leverages the ort crate, with bindings to the onnxruntime C++ library. We refer the user to the ort project page for further installation instructions and support.

1. Enable the optional onnx feature. The rust-bert crate does not include any optional dependencies for ort; the end user should select the set of features that is adequate for pulling the required onnxruntime C++ library.
2. The current recommended installation is to use dynamic linking by pointing to an existing library location: use the load-dynamic cargo feature for ort.
3. Set ORT_DYLIB_PATH to point to the location of the downloaded onnxruntime library (onnxruntime.dll, libonnxruntime.so or libonnxruntime.dylib, depending on the operating system). These can be downloaded from the release page of the onnxruntime project.

Most architectures (including encoders, decoders and encoder-decoders) are supported. The library aims at keeping compatibility with models exported using the Optimum library. A detailed guide on how to export a Transformer model to ONNX using Optimum is available at https://huggingface.co/docs/optimum/main/en/exporters/onnx/usage_guides/export_a_model. The resources used to create ONNX models are similar to those based on Pytorch, replacing the Pytorch model with the ONNX model. Since ONNX models are less flexible than their Pytorch counterparts in the handling of optional arguments, exporting a decoder or encoder-decoder model to ONNX will usually result in multiple files. These files are expected (but not all are necessary) for use in this library as per the table below:

Architecture	Encoder file	Decoder without past file	Decoder with past file
Encoder (e.g. BERT)	required	not used	not used
Decoder (e.g. GPT2)	not used	required	optional
Encoder-decoder (e.g. BART)	required	required	optional

Note that computational efficiency will drop when the decoder with past file is optional but not provided, since the model will not use cached past keys and values for the attention mechanism, leading to a high number of redundant computations. The Optimum library offers export options to ensure that such a decoder with past model file is created. The base encoder and decoder model architectures are available (and exposed for convenience) in the encoder and decoder modules, respectively.

Generation models (pure decoder or encoder/decoder architectures) are available in the models module. Most pipelines are available for ONNX model checkpoints, including sequence classification, zero-shot classification, token classification (including named entity recognition and part-of-speech tagging), question answering, text generation, summarization and translation. These models use the same configuration and tokenizer files as their Pytorch counterparts when used in a pipeline. Examples leveraging ONNX models are given in the ./examples directory.
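To make the ONNX file table concrete, the sketch below encodes it as data and checks a set of exported files against it. `Architecture`, `Requirement` and `check_files` are hypothetical names that are not part of rust-bert's API; the crate's own ONNX loading code may perform such checks differently.

```rust
/// Hypothetical encoding of the ONNX file table (not part of rust-bert).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Architecture {
    Encoder,        // e.g. BERT
    Decoder,        // e.g. GPT2
    EncoderDecoder, // e.g. BART
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Requirement {
    Required,
    Optional,
    NotUsed,
}

/// File kinds, in the column order of the table.
const FILES: [&str; 3] = ["encoder", "decoder without past", "decoder with past"];

/// Requirement for each entry of `FILES`, per architecture.
fn requirements(arch: Architecture) -> [Requirement; 3] {
    match arch {
        Architecture::Encoder => [Requirement::Required, Requirement::NotUsed, Requirement::NotUsed],
        Architecture::Decoder => [Requirement::NotUsed, Requirement::Required, Requirement::Optional],
        Architecture::EncoderDecoder => {
            [Requirement::Required, Requirement::Required, Requirement::Optional]
        }
    }
}

/// `provided[i]` tells whether the file `FILES[i]` was exported.
/// Returns an error if a required file is missing; otherwise returns `true`
/// when an optional file is absent, i.e. when cached past keys/values cannot
/// be used and generation will be slower.
fn check_files(arch: Architecture, provided: [bool; 3]) -> Result<bool, String> {
    let mut slower_without_cache = false;
    for (i, &requirement) in requirements(arch).iter().enumerate() {
        match (requirement, provided[i]) {
            (Requirement::Required, false) => {
                return Err(format!("missing required {} file", FILES[i]));
            }
            (Requirement::Optional, false) => slower_without_cache = true,
            _ => {}
        }
    }
    Ok(slower_without_cache)
}

fn main() {
    // A GPT2-style export that lacks the "decoder with past" file.
    match check_files(Architecture::Decoder, [false, true, false]) {
        Ok(true) => println!("usable, but without cached past keys/values"),
        Ok(false) => println!("all useful files present"),
        Err(message) => println!("unusable: {}", message),
    }
}
```

Returning `Ok(true)` rather than an error for a missing optional file mirrors the table: such an export still works, it just recomputes attention keys and values at every step.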

Ready-to-use pipelines

Based on Hugging Face's pipelines, ready-to-use end-to-end NLP pipelines are available as part of this crate. The following capabilities are currently available:

Disclaimer: The contributors of this repository are not responsible for any generation from the third-party use of the pretrained systems proposed herein.

1. Question answering

Extractive question answering from a given question and context. DistilBERT model fine-tuned on SQuAD (Stanford Question Answering Dataset).

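use rust_bert::pipelines::question_answering::{QaInput, QuestionAnsweringModel};

fn main() -> anyhow::Result<()> {
    let qa_model = QuestionAnsweringModel::new(Default::default())?;

    let question = String::from("Where does Amy live ?");
    let context = String::from("Amy lives in Amsterdam");
    let answers = qa_model.predict(&[QaInput { question, context }], 1, 32); // top_k = 1, batch_size = 32
    println!("{:?}", answers);
    Ok(())
}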

Output:

 [Answer { score: 0.9976, start: 13, end: 21, answer: "Amsterdam" }]

2. Translation

Translation pipeline supporting a broad range of source and target languages. It leverages two main architectures for translation tasks:

Marian-based models, for specific source/target combinations
M2M100 models, allowing for direct translation between 100 languages (at a higher computational cost and with lower performance for some selected languages)

Marian-based pretrained models for the following language pairs are readily available in the library, but the user can import any Pytorch-based model for predictions:

English <-> French
English <-> Spanish
English <-> Portuguese
English <-> Italian
English <-> Catalan
English <-> German
English <-> Russian
English <-> Chinese
English <-> Dutch
English <-> Swedish
English <-> Arabic
English <-> Hebrew
English <-> Hindi
French <-> German

For languages not supported by the proposed pretrained Marian models, the user can leverage an M2M100 model supporting direct translation between 100 languages (without intermediate English translation). The full list of supported languages is available in the crate documentation.

use rust_bert::pipelines::translation::{Language, TranslationModelBuilder};

fn main() -> anyhow::Result<()> {
    // Build a model translating from English into Spanish, French or Italian
    let model = TranslationModelBuilder::new()
        .with_source_languages(vec![Language::English])
        .with_target_languages(vec![Language::Spanish, Language::French, Language::Italian])
        .create_model()?;
    let input_text = "This is a sentence to be translated";
    let output = model.translate(&[input_text], None, Language::French)?;
    for sentence in output {
        println!("{}", sentence);
    }
    Ok(())
}

Output:

 Il s'agit d'une phrase à traduire
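The choice between the two translation architectures described in this section can be summarized as a lookup: use a pretrained Marian model when the language pair is one of those listed (in either direction), otherwise fall back to M2M100. The sketch below is a hypothetical helper, not rust-bert's model-selection logic, and it represents languages as plain strings rather than the crate's `Language` enum.

```rust
/// Which translation architecture to use for a language pair.
#[derive(Debug, PartialEq)]
enum Backend {
    Marian,
    M2M100,
}

/// Pretrained Marian pairs listed in this section; each works in both directions.
const MARIAN_PAIRS: &[(&str, &str)] = &[
    ("English", "French"),
    ("English", "Spanish"),
    ("English", "Portuguese"),
    ("English", "Italian"),
    ("English", "Catalan"),
    ("English", "German"),
    ("English", "Russian"),
    ("English", "Chinese"),
    ("English", "Dutch"),
    ("English", "Swedish"),
    ("English", "Arabic"),
    ("English", "Hebrew"),
    ("English", "Hindi"),
    ("French", "German"),
];

/// Hypothetical helper: Marian if a listed pair covers the request, else M2M100.
fn choose_backend(source: &str, target: &str) -> Backend {
    let covered = MARIAN_PAIRS
        .iter()
        .any(|&(a, b)| (a == source && b == target) || (a == target && b == source));
    if covered {
        Backend::Marian
    } else {
        Backend::M2M100
    }
}

fn main() {
    for (source, target) in [("French", "English"), ("German", "Spanish")] {
        println!("{} -> {}: {:?}", source, target, choose_backend(source, target));
    }
}
```

German to Spanish is not in the Marian list, so the sketch routes it to M2M100, which translates directly without pivoting through English.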

3. Summarization

Abstractive summarization using a pretrained BART model.

use rust_bert::pipelines::summarization::SummarizationModel;

fn main() -> anyhow::Result<()> {
    let summarization_model = SummarizationModel::new(Default::default())?;

    let input = ["In findings published Tuesday in Cornell University's arXiv by a team of scientists 
from the University of Montreal and a separate report published Wednesday in Nature Astronomy by a team 
from University College London (UCL), the presence of water vapour was confirmed in the atmosphere of K2-18b, 
a planet circling a star in the constellation Leo. This is the first such discovery in a planet in its star's 
habitable zone — not too hot and not too cold for liquid water to exist. The Montreal team, led by Björn Benneke, 
used data from the NASA's Hubble telescope to assess changes in the light coming from K2-18b's star as the planet 
passed between it and Earth. They found that certain wavelengths of light, which are usually absorbed by water, 
weakened when the planet was in the way, indicating not only does K2-18b have an atmosphere, but the atmosphere 
contains water in vapour form. The team from UCL then analyzed the Montreal team's data using their own software 
and confirmed their conclusion. This was not the first time scientists have found signs of water on an exoplanet, 
but previous discoveries were made on planets with high temperatures or other pronounced differences from Earth. 
\"This is the first potentially habitable planet where the temperature is right and where we now know there is water,\" 
said UCL astronomer Angelos Tsiaras. \"It's the best candidate for habitability right now.\" \"It's a good sign\", 
said Ryan Cloutier of the Harvard–Smithsonian Center for Astrophysics, who was not one of either study's authors. 
\"Overall,\" he continued, \"the presence of water in its atmosphere certainly improves the prospect of K2-18b being 
a potentially habitable planet, but further observations will be required to say for sure.\"
K2-18b was first identified in 2015 by the Kepler space telescope. It is about 110 light-years from Earth and larger 
but less dense. Its star, a red dwarf, is cooler than the Sun, but the planet's orbit is much closer, such that a year 
on K2-18b lasts 33 Earth days. According to The Guardian, astronomers were optimistic that NASA's James Webb space 
telescope — scheduled for launch in 2021 — and the European Space Agency's 2028 ARIEL program, could reveal more 
about exoplanets like K2-18b."];

    let output = summarization_model.summarize(&input);
    println!("{:?}", output);
    Ok(())
}

(Example from: WikiNews)

Output:

 "Scientists have found water vapour on K2-18b, a planet 110 light-years from Earth. 
This is the first such discovery in a planet in its star's habitable zone. 
The planet is not too hot and not too cold for liquid water to exist."

4. Dialogue model

Conversation model based on Microsoft's DialoGPT. This pipeline allows the generation of single or multi-turn conversations between a human and a model. The DialoGPT page states that:

The human evaluation results indicate that the response generated from DialoGPT is comparable to human response quality under a single-turn conversation Turing test. (DialoGPT repository)

The model uses a ConversationManager to keep track of active conversations and generate responses to them.

use rust_bert::pipelines::conversation::{ConversationManager, ConversationModel};

fn main() -> anyhow::Result<()> {
    let conversation_model = ConversationModel::new(Default::default())?;
    let mut conversation_manager = ConversationManager::new();

    let conversation_id =
        conversation_manager.create("Going to the movies tonight - any suggestions?");
    let output = conversation_model.generate_responses(&mut conversation_manager);
    println!("{:?} {:?}", conversation_id, output);
    Ok(())
}

Example output:

 "The Big Lebowski."

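The bookkeeping a conversation manager performs, keeping several conversations apart and knowing which ones still await a reply, can be sketched with a map from ids to conversation histories. `Tracker`, `Conversation` and their fields are hypothetical names chosen for this sketch; they do not reproduce rust-bert's actual `ConversationManager` types, and no model is involved: replies are passed in by hand.

```rust
use std::collections::HashMap;

/// Hypothetical conversation state: past turns plus an unanswered input.
#[derive(Debug, Default)]
struct Conversation {
    past_user_inputs: Vec<String>,
    generated_responses: Vec<String>,
    new_user_input: Option<String>,
}

/// Hypothetical, simplified tracker keyed by numeric ids.
#[derive(Default)]
struct Tracker {
    next_id: u64,
    conversations: HashMap<u64, Conversation>,
}

impl Tracker {
    /// Starts a conversation with a first user input and returns its id.
    fn create(&mut self, text: &str) -> u64 {
        let id = self.next_id;
        self.next_id += 1;
        let conversation = Conversation {
            new_user_input: Some(text.to_string()),
            ..Default::default()
        };
        self.conversations.insert(id, conversation);
        id
    }

    /// Ids of conversations with an unanswered user input, in ascending order.
    fn pending(&self) -> Vec<u64> {
        let mut ids: Vec<u64> = self
            .conversations
            .iter()
            .filter(|(_, c)| c.new_user_input.is_some())
            .map(|(id, _)| *id)
            .collect();
        ids.sort_unstable();
        ids
    }

    /// Records a reply: the pending input moves into the history.
    /// Returns false if the id is unknown or nothing was pending.
    fn respond(&mut self, id: u64, reply: &str) -> bool {
        match self.conversations.get_mut(&id) {
            Some(c) => match c.new_user_input.take() {
                Some(input) => {
                    c.past_user_inputs.push(input);
                    c.generated_responses.push(reply.to_string());
                    true
                }
                None => false,
            },
            None => false,
        }
    }
}

fn main() {
    let mut tracker = Tracker::default();
    let id = tracker.create("Going to the movies tonight - any suggestions?");
    println!("awaiting a reply: {:?}", tracker.pending());
    // In the real pipeline the reply would come from the model.
    tracker.respond(id, "The Big Lebowski.");
    println!("awaiting a reply: {:?}", tracker.pending());
}
```

Keeping the unanswered input separate from the history makes it cheap to find the conversations a batched generation step still has to process.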
5. Natural language generation

Generate language based on a prompt. GPT2 and GPT are available as base models. Includes techniques such as beam search, top-k and nucleus sampling, temperature setting and repetition penalty. Supports batch generation of sentences from several prompts. Sequences are left-padded with the model's padding token if present, and with the unknown token otherwise. This may impact the results, so it is recommended to submit prompts of similar length for best results.

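use rust_bert::gpt2::GPT2Generator;
use rust_bert::pipelines::generation_utils::{GenerateOptions, LanguageGenerator};

fn main() -> anyhow::Result<()> {
    let model = GPT2Generator::new(Default::default())?;

    let input_context_1 = "The dog";
    let input_context_2 = "The cat was";

    // Override the maximum length, keep the other generation options at their defaults
    let generate_options = GenerateOptions {
        max_length: 30,
        ..Default::default()
    };

    let output = model.generate(Some(&[input_context_1, input_context_2]), generate_options);
    Ok(())
}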

Example output:

 [
    "The dog's owners, however, did not want to be named. According to the lawsuit, the animal's owner, a 29-year"
    "The dog has always been part of the family. "He was always going to be my dog and he was always looking out for me"
    "The dog has been able to stay in the home for more than three months now. "It's a very good dog. She's"
    "The cat was discovered earlier this month in the home of a relative of the deceased. The cat's owner, who wished to remain anonymous,"
    "The cat was pulled from the street by two-year-old Jazmine."I didn't know what to do," she said"
    "The cat was attacked by two stray dogs and was taken to a hospital. Two other cats were also injured in the attack and are being treated."
]
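The left-padding rule for batched prompts can be sketched as follows: each prompt is prefixed with a fill token until all prompts have the same length, so that generation continues from the same position for each of them. `left_pad` is a hypothetical helper operating on made-up token ids; it only illustrates the rule described in this section (padding token if present, unknown token otherwise) and is not rust-bert's implementation.

```rust
/// Hypothetical helper illustrating left padding for batched generation.
/// `pad_id` is the model's padding token id if it has one; otherwise the
/// unknown token id `unk_id` is used as the fill value.
fn left_pad(batch: &[Vec<i64>], pad_id: Option<i64>, unk_id: i64) -> Vec<Vec<i64>> {
    let fill = pad_id.unwrap_or(unk_id);
    let max_len = batch.iter().map(Vec::len).max().unwrap_or(0);
    batch
        .iter()
        .map(|seq| {
            // Prepend fill tokens so that all prompts end at the same position,
            // which is where generation continues.
            let mut padded = vec![fill; max_len - seq.len()];
            padded.extend_from_slice(seq);
            padded
        })
        .collect()
}

fn main() {
    // Made-up token ids standing in for "The dog" and "The cat was".
    let batch = vec![vec![11, 12], vec![11, 13, 14]];
    // Assume no padding token is defined and the unknown token id is 0.
    let padded = left_pad(&batch, None, 0);
    println!("{:?}", padded); // [[0, 11, 12], [11, 13, 14]]
}
```

The shorter a prompt is relative to the longest one, the more fill tokens it receives, which is why prompts of similar length tend to give better results.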

6. Zero-shot classification

Performs zero-shot classification on input sentences with provided labels, using a model fine-tuned for Natural Language Inference.

use rust_bert::pipelines::zero_shot_classification::ZeroShotClassificationModel;

fn main() -> anyhow::Result<()> {
    let sequence_classification_model = ZeroShotClassificationModel::new(Default::default())?;

    let input_sentence = "Who are you voting for in 2020?";
    let input_sequence_2 = "The prime minister has announced a stimulus package which was widely criticized by the opposition.";
    let candidate_labels = &["politics", "public health", "economics", "sports"];

    // Arguments: inputs, candidate labels, optional hypothesis template, max sequence length
    let output = sequence_classification_model.predict_multilabel(
        &[input_sentence, input_sequence_2],
        candidate_labels,
        None,
        128,
    );
    println!("{:?}", output);
    Ok(())
}

Output:

 [
  [ Label { "politics", score: 0.972 }, Label { "public health", score: 0.032 }, Label {"economics", score: 0.006 }, Label {"sports", score: 0.004 } ],
  [ Label { "politics", score: 0.975 }, Label { "public health", score: 0.0818 }, Label {"economics", score: 0.852 }, Label {"sports", score: 0.001 } ],
]
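Zero-shot classification with an NLI model is commonly done by turning each candidate label into a hypothesis and scoring whether the input sentence entails it. The sketch below shows that scoring step with made-up logits. It describes the general technique as an assumption, not code taken from rust-bert, and the `hypothesis` template wording is hypothetical. In multi-label mode each label is scored independently, which is consistent with the scores in the output above not summing to 1.

```rust
/// Hypothetical hypothesis template; rust-bert's actual default may differ.
fn hypothesis(label: &str) -> String {
    format!("This example is about {}.", label)
}

/// Multi-label scoring: softmax over one label's (contradiction, entailment)
/// logits, i.e. exp(e) / (exp(c) + exp(e)) = 1 / (1 + exp(c - e)).
fn multilabel_score(contradiction: f64, entailment: f64) -> f64 {
    1.0 / (1.0 + (contradiction - entailment).exp())
}

/// Single-label scoring: softmax of the entailment logits across all labels,
/// so the returned scores sum to 1.
fn single_label_scores(entailment_logits: &[f64]) -> Vec<f64> {
    // Subtract the maximum for numerical stability.
    let max = entailment_logits.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let exps: Vec<f64> = entailment_logits.iter().map(|&x| (x - max).exp()).collect();
    let sum: f64 = exps.iter().sum();
    exps.iter().map(|&e| e / sum).collect()
}

fn main() {
    // Made-up logits for illustration only.
    println!("{}", hypothesis("sports"));
    println!("{:.3}", multilabel_score(-1.0, 2.0));
    println!("{:?}", single_label_scores(&[2.0, 0.5, -1.0]));
}
```

The two scoring modes answer different questions: multi-label asks "does this label apply?" for each label separately, while single-label asks "which label applies best?".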

7. Sentiment analysis

Predicts the binary sentiment of a sentence. DistilBERT model fine-tuned on SST-2.

use rust_bert::pipelines::sentiment::SentimentModel;

fn main() -> anyhow::Result<()> {
    let sentiment_classifier = SentimentModel::new(Default::default())?;

    let input = [
        "Probably my all-time favorite movie, a story of selflessness, sacrifice and dedication to a noble cause, but it's not preachy or boring.",
        "This film tried to be too many things all at once: stinging political satire, Hollywood blockbuster, sappy romantic comedy, family values promo...",
        "If you like original gut wrenching laughter you will like this movie. If you are young or old then you will love this movie, hell even my mom liked it.",
    ];

    let output = sentiment_classifier.predict(&input);
    println!("{:?}", output);
    Ok(())
}

(Examples courtesy of IMDb)

Output:

 [
    Sentiment { polarity: Positive, score: 0.9981985493795946 },
    Sentiment { polarity: Negative, score: 0.9927982091903687 },
    Sentiment { polarity: Positive, score: 0.9997248985164333 }
]

8. Named entity recognition

Extracts entities (Person, Location, Organization, Miscellaneous) from text. BERT cased large model fine-tuned on CoNLL03, contributed by the MDZ Digital Library team at the Bavarian State Library. Models are currently available for English, German, Spanish and Dutch.

use rust_bert::pipelines::ner::NERModel;

fn main() -> anyhow::Result<()> {
    let ner_model = NERModel::new(Default::default())?;

    let input = [
        "My name is Amy. I live in Paris.",
        "Paris is a city in France.",
    ];

    let output = ner_model.predict(&input);
    println!("{:?}", output);
    Ok(())
}

Output:

 [
  [
    Entity { word: "Amy", score: 0.9986, label: "I-PER" }
    Entity { word: "Paris", score: 0.9985, label: "I-LOC" }
  ],
  [
    Entity { word: "Paris", score: 0.9988, label: "I-LOC" }
    Entity { word: "France", score: 0.9993, label: "I-LOC" }
  ]
]

9. Keyword/keyphrase extraction

Extracts keywords and keyphrases from input documents.

use rust_bert::pipelines::keywords_extraction::KeywordExtractionModel;

fn main() -> anyhow::Result<()> {
    let keyword_extraction_model = KeywordExtractionModel::new(Default::default())?;

    let input = "Rust is a multi-paradigm, general-purpose programming language. 
       Rust emphasizes performance, type safety, and concurrency. Rust enforces memory safety—that is, 
       that all references point to valid memory—without requiring the use of a garbage collector or 
       reference counting present in other memory-safe languages. To simultaneously enforce 
       memory safety and prevent concurrent data races, Rust's borrow checker tracks the object lifetime 
       and variable scope of all references in a program during compilation. Rust is popular for 
       systems programming but also offers high-level features including functional programming constructs.";

    let output = keyword_extraction_model.predict(&[input])?;
    println!("{:?}", output);
    Ok(())
}

Output:

 "rust" - 0.50910604
"programming" - 0.35731024
"concurrency" - 0.33825397
"concurrent" - 0.31229728
"program" - 0.29115444

10. Part of speech tagging

Extracts part of speech tags (noun, verb, adjective...) from text.

use rust_bert::pipelines::pos_tagging::POSModel;

fn main() -> anyhow::Result<()> {
    let pos_model = POSModel::new(Default::default())?;
    let input = ["My name is Bob"];
    let output = pos_model.predict(&input);
    println!("{:?}", output);
    Ok(())
}

Output:

 [
    Entity { word: "My", score: 0.1560, label: "PRP" }
    Entity { word: "name", score: 0.6565, label: "NN" }
    Entity { word: "is", score: 0.3697, label: "VBZ" }
    Entity { word: "Bob", score: 0.7460, label: "NNP" }
]

11. Sentence embeddings

Generates sentence embeddings (vector representations). These can be used for applications including dense information retrieval.

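use rust_bert::pipelines::sentence_embeddings::{
    SentenceEmbeddingsBuilder, SentenceEmbeddingsModelType,
};

fn main() -> anyhow::Result<()> {
    // Fetches the remote all-MiniLM-L12-v2 model (cached locally after the first download)
    let model = SentenceEmbeddingsBuilder::remote(SentenceEmbeddingsModelType::AllMiniLmL12V2)
        .create_model()?;

    let sentences = ["this is an example sentence", "each sentence is converted"];

    let output = model.encode(&sentences)?;
    println!("{:?}", output);
    Ok(())
}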

Output:

 [
    [-0.000202666, 0.08148022, 0.03136178, 0.002920636 ...],
    [0.064757116, 0.048519745, -0.01786038, -0.0479775 ...]
]
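Sentence embeddings like those above are typically compared with cosine similarity, for example to rank documents against a query in dense retrieval. The sketch below assumes the embeddings are available as plain `f32` vectors; `cosine_similarity` and `rank` are hypothetical helpers, not part of rust-bert, and the 3-dimensional vectors are made up.

```rust
/// Hypothetical helper: cosine similarity of two embeddings of equal length.
/// Returns 0.0 when either vector has zero norm.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "embeddings must have the same dimension");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Hypothetical helper: indices of `docs` with their similarity to `query`,
/// most similar first.
fn rank(query: &[f32], docs: &[Vec<f32>]) -> Vec<(usize, f32)> {
    let mut scored: Vec<(usize, f32)> = docs
        .iter()
        .enumerate()
        .map(|(i, doc)| (i, cosine_similarity(query, doc)))
        .collect();
    // total_cmp gives a total order on f32, so sorting never panics.
    scored.sort_by(|x, y| y.1.total_cmp(&x.1));
    scored
}

fn main() {
    // Made-up 3-dimensional embeddings; real models produce far longer vectors.
    let query: [f32; 3] = [0.1, 0.8, 0.1];
    let docs: Vec<Vec<f32>> = vec![vec![0.9, 0.1, 0.0], vec![0.2, 0.7, 0.1]];
    for (index, score) in rank(&query, &docs) {
        println!("document {}: {:.3}", index, score);
    }
}
```

Cosine similarity ignores vector length, so only the direction of each embedding matters when ranking.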

12. Masked language model

Predicts masked words in input sentences.

use rust_bert::pipelines::masked_language::MaskedLanguageModel;

fn main() -> anyhow::Result<()> {
    let model = MaskedLanguageModel::new(Default::default())?;

    let sentences = [
        "Hello I am a <mask> student",
        "Paris is the <mask> of France. It is <mask> in Europe.",
    ];

    let output = model.predict(&sentences);
    println!("{:?}", output);
    Ok(())
}

Output:

 [
    [MaskedToken { text: "college", id: 2267, score: 8.091}],
    [
        MaskedToken { text: "capital", id: 3007, score: 16.7249}, 
        MaskedToken { text: "located", id: 2284, score: 9.0452}
    ]
]

Benchmarks

For simple pipelines (sequence classification, token classification, question answering), the performance of Python and Rust is expected to be comparable. This is because the most expensive part of these pipelines is the language model itself, which shares a common implementation in the Torch backend. The article "End-to-end NLP Pipelines in Rust" provides a benchmarks section covering all pipelines.

For text generation tasks (summarization, translation, conversation, free text generation), significant benefits can be expected (up to 2 to 4 times faster processing, depending on the input and application). The article "Accelerating text generation with Rust" focuses on these text generation applications and provides more details on the performance comparison with Python.

Loading pretrained and custom model weights

Base models and task-specific heads are also available for users looking to expose their own transformer-based models. Examples on how to prepare the data using a native tokenizers Rust library are available in ./examples for BERT, DistilBERT, RoBERTa, GPT, GPT2 and BART. Note that when importing models from Pytorch, the convention for parameter naming needs to be aligned with the Rust schema. Loading of the pretrained weights will fail if any of the model parameter weights cannot be found in the weight files. If this quality check is to be skipped, an alternative method, load_partial, can be invoked from the variable store.

Pretrained models are available on Hugging Face's model hub and can be loaded using the RemoteResources defined in this library.

A conversion utility script is included in ./utils to convert Pytorch weights to a set of weights compatible with this library. This script requires Python and torch to be set up.
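The difference between the default strict loading and load_partial can be illustrated with plain parameter names. The functions below are hypothetical and only model the behaviour described in this section (strict loading fails when a model parameter is absent from the weights file; partial loading tolerates it); they do not reproduce the actual tch variable store API, and the parameter names are made up.

```rust
use std::collections::HashSet;

/// Hypothetical helper: model parameters that are missing from the weights file.
fn missing_params(expected: &[&str], in_file: &HashSet<&str>) -> Vec<String> {
    expected
        .iter()
        .filter(|name| !in_file.contains(*name))
        .map(|name| name.to_string())
        .collect()
}

/// Strict loading: fails, listing the missing names, if any parameter is absent.
fn load_strict(expected: &[&str], in_file: &HashSet<&str>) -> Result<(), Vec<String>> {
    let missing = missing_params(expected, in_file);
    if missing.is_empty() {
        Ok(())
    } else {
        Err(missing)
    }
}

/// Partial loading: never fails and only reports which names were not found.
fn load_partial(expected: &[&str], in_file: &HashSet<&str>) -> Vec<String> {
    missing_params(expected, in_file)
}

fn main() {
    // Made-up parameter names: the file lacks the classifier head.
    let expected = ["encoder.layer.0.weight", "encoder.layer.0.bias", "classifier.weight"];
    let in_file: HashSet<&str> = ["encoder.layer.0.weight", "encoder.layer.0.bias"]
        .iter()
        .copied()
        .collect();
    println!("strict:  {:?}", load_strict(&expected, &in_file));
    println!("partial: missing {:?}", load_partial(&expected, &in_file));
}
```

A typical case for partial loading is a pretrained base model combined with a freshly initialized task-specific head, whose weights are not in the file yet.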


Additional information

Version: 1.0.0
Type: AI source code
Updated: 2024-12-10
Size: 685.2KB
Source: GitHub
