mlm pytorch
0.1.0
This repository lets you quickly set up unsupervised training for your transformer from a corpus of sequence data.
$ pip install mlm-pytorch
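First install x-transformers:
$ pip install x-transformers
Then run the following example to see what one iteration of the unsupervised training looks like: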
import torch
from torch import nn
from torch.optim import Adam
from mlm_pytorch import MLM

# instantiate the language model
from x_transformers import TransformerWrapper, Encoder

transformer = TransformerWrapper(
    num_tokens = 20000,
    max_seq_len = 1024,
    attn_layers = Encoder(
        dim = 512,
        depth = 6,
        heads = 8
    )
)

# plug the language model into the MLM trainer
trainer = MLM(
    transformer,
    mask_token_id = 2,          # the token id reserved for masking
    pad_token_id = 0,           # the token id for padding
    mask_prob = 0.15,           # masking probability for masked language modeling
    replace_prob = 0.90,        # ~10% probability that a token will not be masked but is still included in the loss, as detailed in the BERT paper
    mask_ignore_token_ids = []  # other tokens to exclude from masking; include the [cls] and [sep] ids here
).cuda()

# optimizer
opt = Adam(trainer.parameters(), lr = 3e-4)

# one training step (do this for many steps in a for loop, getting new `data` each time)
data = torch.randint(0, 20000, (8, 1024)).cuda()

loss = trainer(data)
loss.backward()
opt.step()
opt.zero_grad()

# after much training, the model should have improved for downstream tasks
torch.save(transformer, './pretrained-model.pt')
Repeat the step above many times, and your model should improve.
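To make the roles of `mask_prob` and `replace_prob` more concrete, here is a minimal sketch of BERT-style masking in plain PyTorch. It is not the code used inside `MLM`: the helper name `mask_tokens` is hypothetical, and it assumes each eligible position is selected independently, which may differ from how the library picks positions. It only illustrates the idea that selected tokens enter the loss while just a `replace_prob` share of them is actually overwritten with the mask token.

```python
import torch

def mask_tokens(seq, mask_token_id=2, pad_token_id=0, mask_prob=0.15,
                replace_prob=0.90, mask_ignore_token_ids=()):
    """Hypothetical helper (not part of mlm-pytorch): returns (masked_input, labels)."""
    # Positions that must never be selected: padding and any ignored special tokens.
    excluded = seq == pad_token_id
    for token_id in mask_ignore_token_ids:
        excluded = excluded | (seq == token_id)

    # Assumption: select each eligible position independently with probability mask_prob.
    selected = (torch.rand(seq.shape, device=seq.device) < mask_prob) & ~excluded

    # Among selected positions, a replace_prob share is overwritten with the mask token;
    # the rest keep their original token but still count toward the loss.
    replaced = selected & (torch.rand(seq.shape, device=seq.device) < replace_prob)
    masked_input = seq.masked_fill(replaced, mask_token_id)

    # Labels hold the original token at selected positions and pad_token_id elsewhere,
    # so a cross-entropy loss can skip the rest via ignore_index=pad_token_id.
    labels = seq.masked_fill(~selected, pad_token_id)
    return masked_input, labels

torch.manual_seed(0)
# Sample ids from 3 upward so that ids 0 (pad) and 2 (mask) stay reserved in this demo.
seq = torch.randint(3, 20000, (8, 1024))
masked_input, labels = mask_tokens(seq)

selected_fraction = (labels != 0).float().mean().item()
print(f"fraction of positions included in the loss: {selected_fraction:.3f}")
```

The printed fraction should land near `mask_prob`. Passing the `[cls]` and `[sep]` ids through `mask_ignore_token_ids` keeps those positions out of both the masked input and the labels.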
@misc{devlin2018bert,
    title         = {BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding},
    author        = {Jacob Devlin and Ming-Wei Chang and Kenton Lee and Kristina Toutanova},
    year          = {2018},
    eprint        = {1810.04805},
    archivePrefix = {arXiv},
    primaryClass  = {cs.CL}
}