DeBERTa: Decoding-enhanced BERT with Disentangled Attention

This repository is the official implementation of DeBERTa: Decoding-enhanced BERT with Disentangled Attention and DeBERTa V3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing.

News

18/03/2023

The DeBERTa V3 paper was accepted at ICLR 2023.
Code for pre-training and continual training of DeBERTa V3 has been added. Please check Language Model for details.

12/8/2021

DeBERTa-V3-XSmall has been added. With only 22M backbone parameters, about 1/4 of RoBERTa-Base and XLNet-Base, DeBERTa-V3-XSmall significantly outperforms both on the MNLI and SQuAD v2.0 tasks (by 1.2% on MNLI-m and by 1.5% EM on SQuAD v2.0). This further demonstrates the efficiency of the DeBERTa V3 models.

16/11/2021

The models from our new work, DeBERTa V3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing, are now publicly available on the Hugging Face model hub. The new models are based on the DeBERTa-V2 models, replacing MLM with an ELECTRA-style objective plus gradient-disentangled embedding sharing, which further improves model efficiency.
Scripts for fine-tuning DeBERTa V3 models have been added.
Code for the RTD task head has been added.
Documentation for language model pre-training has been added.

31/3/2021

The masked language model task has been added.
The SuperGLUE tasks have been added.
The SiFT code has been added.

2/03/2021

The DeBERTa v2 code and the 900M and 1.5B models are here now. This includes the 1.5B model used for our single-model SuperGLUE submission, which scored 89.9 versus the human baseline of 89.8. You can find more details about this submission in our blog.

What's new in v2

Vocabulary: In v2 we use a new vocabulary of size 128K built from the training data. Instead of the GPT2 tokenizer, we use a sentencepiece tokenizer.
nGiE (nGram Induced Input Encoding): In v2 we use an additional convolution layer alongside the first transformer layer to better learn the local dependency of input tokens. We will add more ablation studies on this feature.
Sharing the position projection matrix with the content projection matrix in the attention layer: Based on our previous experiments, this saves parameters without affecting performance.
Applying buckets to encode relative positions: In v2 we use log buckets to encode relative positions, similar to T5.
900M model and 1.5B model: In v2 we scale our model size to 900M and 1.5B, which significantly improves the performance of downstream tasks.
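
The log-bucket idea can be illustrated with a minimal sketch. This is not the repository's implementation: the function name and the default `bucket_size` and `max_position` values are assumptions chosen for illustration, and the sketch assumes the absolute distance stays below `max_position`. Nearby tokens keep their exact relative distance, while distant tokens share logarithmically spaced buckets.

```python
import math


def log_bucket_position(relative_pos: int, bucket_size: int = 256,
                        max_position: int = 512) -> int:
    """Map a signed relative distance to a signed bucket index.

    Distances up to half the bucket size keep their exact value; larger
    distances are placed in logarithmically spaced buckets, so resolution
    drops as tokens get farther apart.
    """
    mid = bucket_size // 2
    distance = abs(relative_pos)
    if distance <= mid:
        return relative_pos
    # Scale log(distance) so that distance == max_position - 1 lands on
    # the last bucket (index bucket_size - 1).
    scaled = math.log(distance / mid) / math.log((max_position - 1) / mid) * (mid - 1)
    bucket = mid + math.ceil(scaled)
    return bucket if relative_pos > 0 else -bucket


# Small distances are preserved; large ones are compressed.
buckets = [log_bucket_position(d) for d in (-511, -200, -3, 0, 3, 128, 200, 511)]
print(buckets)
```

With these illustrative defaults, 511 possible distances on each side are compressed into 255 bucket indices, which keeps the relative-position embedding table small.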

29/12/2020

With the DeBERTa 1.5B model, we surpass the T5 11B model and human performance on the SuperGLUE leaderboard. Code and model will be released soon. Please check out our paper for more details.

06/13/2020

We released the pre-trained models, source code, and fine-tuning scripts to reproduce some of the experimental results in the paper. You can follow similar scripts to apply DeBERTa to your own experiments or applications. Pre-training scripts will be released in the next step.

Introduction to DeBERTa

DeBERTa (Decoding-enhanced BERT with disentangled attention) improves the BERT and RoBERTa models using two novel techniques. The first is the disentangled attention mechanism, where each word is represented using two vectors that encode its content and position, respectively, and the attention weights among words are computed using disentangled matrices on their contents and relative positions. Second, an enhanced mask decoder replaces the output softmax layer to predict the masked tokens during model pre-training. We show that these two techniques significantly improve the efficiency of model pre-training and the performance of downstream tasks.
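
For reference, the disentangled attention score can be written out as follows. This follows the formulation in the DeBERTa paper rather than anything stated in this README, so the symbols are introduced here only for illustration: H holds the content vectors, P the relative-position embeddings, the W matrices are learned projections, and δ(i, j) is the clipped relative distance from token i to token j.

```latex
% Content (c) and relative-position (r) projections
Q^c = H W_{q,c}, \quad K^c = H W_{k,c}, \quad V^c = H W_{v,c}, \quad
Q^r = P W_{q,r}, \quad K^r = P W_{k,r}

% Attention score: content-to-content + content-to-position + position-to-content
\tilde{A}_{i,j} = Q^c_i \, {K^c_j}^{\top}
                + Q^c_i \, {K^r_{\delta(i,j)}}^{\top}
                + K^c_j \, {Q^r_{\delta(j,i)}}^{\top}

% Output, scaled by sqrt(3d) because three terms are summed
H_o = \operatorname{softmax}\!\left( \frac{\tilde{A}}{\sqrt{3d}} \right) V^c
```

The three terms are what "disentangled" refers to: content and position interact through separate projection matrices instead of being summed into a single input embedding before attention.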

Pre-trained Models

Our pre-trained models are packaged into zip files. You can download them from our releases, or download an individual model via the links below:

Model	Vocabulary (K)	Backbone Parameters (M)	Hidden Size	Layers	Note
V2-XXLarge ¹	128	1320	1536	48	128K new SPM vocab
V2-XLarge	128	710	1536	24	128K new SPM vocab
XLarge	50	700	1024	48	Same vocab as RoBERTa
Large	50	350	1024	24	Same vocab as RoBERTa
Base	50	100	768	12	Same vocab as RoBERTa
V2-XXLarge-MNLI	128	1320	1536	48	Fine-tuned with MNLI
V2-XLarge-MNLI	128	710	1536	24	Fine-tuned with MNLI
XLarge-MNLI	50	700	1024	48	Fine-tuned with MNLI
Large-MNLI	50	350	1024	24	Fine-tuned with MNLI
Base-MNLI	50	86	768	12	Fine-tuned with MNLI
DeBERTa-V3-Large ²	128	304	1024	24	128K new SPM vocab
DeBERTa-V3-Base ²	128	86	768	12	128K new SPM vocab
DeBERTa-V3-Small ²	128	44	768	6	128K new SPM vocab
DeBERTa-V3-XSmall ²	128	22	384	12	128K new SPM vocab
mDeBERTa-V3-Base ²	250	86	768	12	250K new SPM vocab, multilingual model with 102 languages

Notes

¹ This is the model (89.9) that surpassed T5 11B (89.3) and human performance (89.8) on SuperGLUE for the first time. 128K new SPM vocab.
² These V3 DeBERTa models are DeBERTa models pre-trained with an ELECTRA-style objective plus gradient-disentangled embedding sharing, which significantly improves model efficiency.
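
To put the vocabulary and hidden-size columns in perspective, the sketch below estimates the size of the token-embedding matrix for the V3 rows. It assumes that "backbone parameters" excludes the token embeddings and that "128K" means 128,000 tokens; the function name is hypothetical, and the exact vocabulary sizes of the released models may differ slightly.

```python
def embedding_params_m(vocab_k: int, hidden_size: int) -> float:
    """Token-embedding parameters, in millions, for a vocabulary of
    `vocab_k` thousand tokens and the given hidden size."""
    return vocab_k * 1000 * hidden_size / 1e6


# (vocabulary in K, hidden size) copied from the V3 rows of the table above.
v3_models = {
    "DeBERTa-V3-Large": (128, 1024),
    "DeBERTa-V3-Base": (128, 768),
    "DeBERTa-V3-Small": (128, 768),
    "DeBERTa-V3-XSmall": (128, 384),
    "mDeBERTa-V3-Base": (250, 768),
}

embedding_sizes = {
    name: embedding_params_m(vocab_k, hidden)
    for name, (vocab_k, hidden) in v3_models.items()
}

for name, size in embedding_sizes.items():
    print(f"{name}: ~{size:.1f}M embedding parameters")
```

Under these assumptions the embedding matrix of the smaller models is larger than their backbone, which is worth keeping in mind when comparing total model sizes.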

Try the model

Read our documentation

Requirements

Linux system, e.g. Ubuntu 18.04LTS
CUDA 10.0
pytorch 1.3.0
python 3.6
bash shell 4.0
curl
docker (optional)
nvidia-docker2 (optional)

There are several ways to try our code,

Use docker

Docker is the recommended way to run the code because every dependency is already built into our docker image bagai/deberta; you can follow the official docker site to install docker on your machine.

To run with docker, make sure your system meets the requirements listed above. Here are the steps to try the GLUE experiments: pull the code, run ./run_docker.sh, and then run the bash commands under /DeBERTa/experiments/glue/

Use pip

Pull the code and run pip3 install -r requirements.txt in the root directory of the code, then enter the experiments/glue/ folder and try the bash commands under that folder for the GLUE experiments.

Install as a pip package

pip install DeBERTa

Use DeBERTa in existing code

# To apply DeBERTa to your existing code, you need to make two changes to your code,
# 1. change your model to consume DeBERTa as the encoder
from DeBERTa import deberta
import torch
class MyModel(torch.nn.Module):
  def __init__(self):
    super().__init__()
    # Your existing model code
    self.deberta = deberta.DeBERTa(pre_trained='base') # Or 'large' 'base-mnli' 'large-mnli' 'xlarge' 'xlarge-mnli' 'xlarge-v2' 'xxlarge-v2'
    # Your existing model code
    # do initialization as before
    #
    self.deberta.apply_state() # Apply the pre-trained model of DeBERTa at the end of the constructor
    #
  def forward(self, input_ids):
    # The inputs to DeBERTa forward are
    # `input_ids`: a torch.LongTensor of shape [batch_size, sequence_length] with the word token indices in the vocabulary
    # `token_type_ids`: an optional torch.LongTensor of shape [batch_size, sequence_length] with the token type indices selected in [0, 1].
    #    Type 0 corresponds to a `sentence A` and type 1 corresponds to a `sentence B` token (see BERT paper for more details).
    # `attention_mask`: an optional parameter for input mask or attention mask.
    #   - If it's an input mask, then it will be a torch.LongTensor of shape [batch_size, sequence_length] with indices selected in [0, 1].
    #      It's a mask to be used if the input sequence length is smaller than the max input sequence length in the current batch.
    #      It's the mask that we typically use for attention when a batch has varying length sentences.
    #   - If it's an attention mask, then it will be a torch.LongTensor of shape [batch_size, sequence_length, sequence_length].
    #      In this case, it's a mask indicating which tokens in the sequence should be attended by other tokens in the sequence.
    # `output_all_encoded_layers`: whether to output results of all encoder layers, default, True
    encoding = deberta.bert(input_ids)[-1]

# 2. Change your tokenizer with the tokenizer built into DeBERTa
from DeBERTa import deberta
vocab_path, vocab_type = deberta.load_vocab(pretrained_id='base')
tokenizer = deberta.tokenizers[vocab_type](vocab_path)
# We apply the same schema of special tokens as BERT, e.g. [CLS], [SEP], [MASK]
max_seq_len = 512
tokens = tokenizer.tokenize('Examples input text of DeBERTa')
# Truncate long sequence
tokens = tokens[:max_seq_len - 2]
# Add special tokens to the `tokens`
tokens = ['[CLS]'] + tokens + ['[SEP]']
input_ids = tokenizer.convert_tokens_to_ids(tokens)
input_mask = [1] * len(input_ids)
# padding
paddings = max_seq_len - len(input_ids)
input_ids = input_ids + [0] * paddings
input_mask = input_mask + [0] * paddings
features = {
  'input_ids': torch.tensor(input_ids, dtype=torch.int),
  'input_mask': torch.tensor(input_mask, dtype=torch.int)
}
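
The comments above distinguish an input mask of shape [batch_size, sequence_length] from an attention mask of shape [batch_size, sequence_length, sequence_length]. The sketch below shows one common way to derive the second from the first for a single sequence, in plain Python. The helper name is hypothetical and this is not necessarily how the library converts masks internally.

```python
def expand_input_mask(input_mask):
    """Turn a [sequence_length] 0/1 input mask into a
    [sequence_length, sequence_length] attention mask.

    Position (i, j) is 1 only when both token i and token j are real
    (non-padding) tokens, so padding neither attends nor is attended to.
    """
    return [[query * key for key in input_mask] for query in input_mask]


# Three real tokens followed by two padding positions.
input_mask = [1, 1, 1, 0, 0]
attention_mask = expand_input_mask(input_mask)

for row in attention_mask:
    print(row)
```

For a batch, the same expansion is applied to each sequence independently, which adds the leading batch_size dimension.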

Run DeBERTa experiments from the command line

For GLUE tasks

Get the data

cache_dir=/tmp/DeBERTa/
cd experiments/glue
./download_data.sh $cache_dir/glue_tasks

Run the task

task=STS-B
OUTPUT=/tmp/DeBERTa/exps/$task
export OMP_NUM_THREADS=1
python3 -m DeBERTa.apps.run --task_name $task --do_train \
  --data_dir $cache_dir/glue_tasks/$task \
  --eval_batch_size 128 \
  --predict_batch_size 128 \
  --output_dir $OUTPUT \
  --scale_steps 250 \
  --loss_scale 16384 \
  --accumulative_update 1 \
  --num_train_epochs 6 \
  --warmup 100 \
  --learning_rate 2e-5 \
  --train_batch_size 32 \
  --max_seq_len 128

Notes

1. By default we cache the pre-trained model and tokenizer at $HOME/.~DeBERTa. You may need to clean it up if a download fails unexpectedly.
2. You can also try our models with HF Transformers. When you try the XXLarge model, however, you need to specify the --sharded_ddp argument. Please check our XXLarge model card for more details.

Experiments

Our fine-tuning experiments are carried out on half a DGX-2 node with 8x32G V100 GPU cards. Results may vary with GPU model, driver, CUDA SDK version, the use of FP16 or FP32, and random seed. The numbers reported here are based on multiple runs with different random seeds. Here are the results from the Large model:

Task	Command	Results	Running Time (8x32G V100 GPUs)
MNLI xxlarge v2	`experiments/glue/mnli.sh xxlarge-v2`	91.7/91.9 +/-0.1	4h
MNLI xlarge v2	`experiments/glue/mnli.sh xlarge-v2`	91.7/91.6 +/-0.1	2.5h
MNLI xlarge	`experiments/glue/mnli.sh xlarge`	91.5/91.2 +/-0.1	2.5h
MNLI large	`experiments/glue/mnli.sh large`	91.3/91.1 +/-0.1	2.5h
QQP large	`experiments/glue/qqp.sh large`	92.3 +/-0.1	6h
QNLI large	`experiments/glue/qnli.sh large`	95.3 +/-0.2	2h
MRPC large	`experiments/glue/mrpc.sh large`	91.9 +/-0.5	0.5h
RTE large	`experiments/glue/rte.sh large`	86.6 +/-1.0	0.5h
SST-2 large	`experiments/glue/sst2.sh large`	96.7 +/-0.3	1h
STS-b large	`experiments/glue/stsb.sh large`	92.5 +/-0.3	0.5h
CoLA large	`experiments/glue/cola.sh`	70.5 +/-1.0	0.5h

And here are the results from the Base model:

Task	Command	Results	Running Time (8x32G V100 GPUs)
MNLI base	`experiments/glue/mnli.sh base`	88.8/88.5 +/-0.2	1.5h

Fine-tuning on NLU tasks

We present the dev results on SQuAD 1.1/2.0 and several GLUE benchmark tasks.

Model	SQuAD 1.1	SQuAD 2.0	MNLI-m/mm	SST-2	QNLI	CoLA	RTE	MRPC	QQP	STS-B
	F1/EM	F1/EM	Acc	Acc	Acc	MCC	Acc	Acc/F1	Acc/F1	P/S
BERT-Large	90.9/84.1	81.8/79.0	86.6/-	93.2	92.3	60.6	70.4	88.0/-	91.3/-	90.0/-
RoBERTa-Large	94.6/88.9	89.4/86.5	90.2/-	96.4	93.9	68.0	86.6	90.9/-	92.2/-	92.4/-
XLNet-Large	95.1/89.7	90.6/87.9	90.8/-	97.0	94.9	69.0	85.9	90.8/-	92.3/-	92.5/-
DeBERTa-Large ¹	95.5/90.1	90.7/88.0	91.3/91.1	96.5	95.3	69.5	91.0	92.6/94.6	92.3/-	92.8/92.5
DeBERTa-XLarge ¹	-	-	91.5/91.2	97.0	-	-	93.1	92.1/94.3	-	92.9/92.7
DeBERTa-V2-XLarge ¹	95.8/90.8	91.4/88.9	91.7/91.6	97.5	95.8	71.1	93.9	92.0/94.2	92.3/89.8	92.9/92.9
DeBERTa-V2-XXLarge ¹,²	96.1/91.4	92.2/89.7	91.7/91.9	97.2	96.0	72.0	93.5	93.1/94.9	92.7/90.3	93.2/93.1
DeBERTa-V3-Large	-	91.5/89.0	91.8/91.9	96.9	96.0	75.3	92.7	92.2/-	93.0/-	93.0/-
DeBERTa-V3-Base	-	88.4/85.4	90.6/90.7	-	-	-	-	-	-	-
DeBERTa-V3-Small	-	82.9/80.4	88.3/87.7	-	-	-	-	-	-	-
DeBERTa-V3-XSmall	-	84.8/82.0	88.1/88.3	-	-	-	-	-	-	-

Fine-tuning on XNLI

We present the dev results on XNLI in the zero-shot cross-lingual transfer setting, i.e. training with English data only and testing on other languages.

Model	avg	en	fr	es	de	el	bg	ru	tr	ar	vi	th	zh	hi	sw	ur
XLM-R-base	76.2	85.8	79.7	80.7	78.7	77.5	79.6	78.1	74.2	73.8	76.5	74.6	76.7	72.4	66.5	68.3
mDeBERTa-V3-Base	79.8 +/-0.2	88.2	82.6	84.4	82.7	82.3	82.4	80.8	79.5	78.5	78.1	76.4	79.5	75.9	73.9	72.4
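
As a quick sanity check on the first score column, the sketch below recomputes the average from the 15 per-language scores in each row. The numbers are copied from the table; the assumption is that the average is a plain unweighted mean over all 15 languages, English included.

```python
from statistics import mean

# Per-language XNLI dev scores copied from the table above, in column order.
xnli_dev = {
    "XLM-R-base": [85.8, 79.7, 80.7, 78.7, 77.5, 79.6, 78.1, 74.2,
                   73.8, 76.5, 74.6, 76.7, 72.4, 66.5, 68.3],
    "mDeBERTa-V3-Base": [88.2, 82.6, 84.4, 82.7, 82.3, 82.4, 80.8, 79.5,
                         78.5, 78.1, 76.4, 79.5, 75.9, 73.9, 72.4],
}

# Unweighted mean over the 15 languages, rounded to one decimal place.
averages = {model: round(mean(scores), 1) for model, scores in xnli_dev.items()}

for model, avg in averages.items():
    print(f"{model}: {avg}")
```

Both recomputed values agree with the averages reported in the table.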

Notes

¹ Following RoBERTa, for RTE, MRPC, and STS-B we fine-tune starting from DeBERTa-Large-MNLI, DeBERTa-XLarge-MNLI, DeBERTa-V2-XLarge-MNLI, and DeBERTa-V2-XXLarge-MNLI. The results for SST-2/QQP/QNLI/SQuADv2 also improve slightly when starting from MNLI fine-tuned models; however, for those 4 tasks we only report numbers fine-tuned from the pre-trained base models.

Pre-training with MLM and RTD objectives

To pre-train DeBERTa with MLM and RTD objectives, please check experiments/language_models
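
For readers unfamiliar with the two objectives, here is a deliberately simplified sketch of the data side of each. It is not the code in experiments/language_models: the function names are hypothetical, the masking step omits refinements such as BERT's random or unchanged replacements, and the generator output is hard-coded. MLM hides tokens and trains the model to reconstruct them; RTD (replaced token detection, the ELECTRA-style objective) trains a discriminator to flag which tokens a generator replaced.

```python
import random

SPECIAL_TOKENS = {"[CLS]", "[SEP]"}


def mask_for_mlm(tokens, mask_prob=0.15, seed=0):
    """Replace a random subset of non-special tokens with [MASK].

    Returns the masked sequence and the masked positions, i.e. the
    positions an MLM head is trained to reconstruct.
    """
    rng = random.Random(seed)
    masked, positions = list(tokens), []
    for i, token in enumerate(tokens):
        if token not in SPECIAL_TOKENS and rng.random() < mask_prob:
            masked[i] = "[MASK]"
            positions.append(i)
    return masked, positions


def rtd_labels(original, corrupted):
    """Label each position 1 if the token was replaced, otherwise 0."""
    return [int(o != c) for o, c in zip(original, corrupted)]


tokens = ["[CLS]", "the", "cat", "sat", "[SEP]"]

# mask_prob=1.0 masks every non-special token, which makes the demo deterministic.
masked, positions = mask_for_mlm(tokens, mask_prob=1.0)

# In RTD a small generator fills the masked positions; one replacement is
# hard-coded here to stand in for a generator that got "cat" wrong.
corrupted = ["[CLS]", "the", "dog", "sat", "[SEP]"]
labels = rtd_labels(tokens, corrupted)

print(masked, positions, labels)
```

Because RTD produces a label for every position rather than only the masked ones, each training sequence provides more supervision than MLM alone.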

Contacts

Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen

Citation

@misc{he2021debertav3,
      title={DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing},
      author={Pengcheng He and Jianfeng Gao and Weizhu Chen},
      year={2021},
      eprint={2111.09543},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

@inproceedings{
he2021deberta,
title={DeBERTa: DECODING-ENHANCED BERT WITH DISENTANGLED ATTENTION},
author={Pengcheng He and Xiaodong Liu and Jianfeng Gao and Weizhu Chen},
booktitle={International Conference on Learning Representations},
year={2021},
url={https://openreview.net/forum?id=XPZIaotutsD}
}
