mlm pytorch
This repository lets you quickly set up unsupervised training for your transformer from a corpus of sequence data.
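Concretely, the trainer consumes batches of fixed-length token-id sequences; the usage example below stands in a random `(8, 1024)` batch. The sketch here shows one way to cut a corpus into such batches. It assumes the corpus has already been tokenized into a single stream of integer ids (tokenization is not covered) and that 0 is the padding id, matching `pad_token_id = 0` in the example. `chunk_corpus` is a hypothetical helper, not part of mlm-pytorch or x-transformers.

```python
from typing import Iterable, Iterator, List


def chunk_corpus(
    token_stream: Iterable[int],
    seq_len: int,
    batch_size: int,
    pad_token_id: int = 0,
) -> Iterator[List[List[int]]]:
    """Yield batches of up to `batch_size` sequences, each exactly `seq_len` ids long.

    Hypothetical helper for illustration; not part of mlm-pytorch.
    """
    batch: List[List[int]] = []
    seq: List[int] = []
    for tok in token_stream:
        seq.append(tok)
        if len(seq) == seq_len:
            # a full-length sequence is ready
            batch.append(seq)
            seq = []
            if len(batch) == batch_size:
                yield batch
                batch = []
    if seq:
        # right-pad the final partial sequence with the padding id
        seq.extend([pad_token_id] * (seq_len - len(seq)))
        batch.append(seq)
    if batch:
        # the final batch may contain fewer than `batch_size` sequences
        yield batch
```

Each yielded batch can be turned into a `(batch_size, seq_len)` tensor with `torch.tensor(batch)` and used in place of the random `data`. The token stream itself should avoid the ids reserved for padding and masking.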
$ pip install mlm-pytorch
To try it out, first install x-transformers:
$ pip install x-transformers
Then run the following example to see what one iteration of unsupervised training looks like.
import torch
from torch import nn
from torch.optim import Adam
from mlm_pytorch import MLM

# instantiate the language model
from x_transformers import TransformerWrapper, Encoder

transformer = TransformerWrapper(
    num_tokens = 20000,
    max_seq_len = 1024,
    attn_layers = Encoder(
        dim = 512,
        depth = 6,
        heads = 8
    )
)

# plug the language model into the MLM trainer
trainer = MLM(
    transformer,
    mask_token_id = 2,          # the token id reserved for masking
    pad_token_id = 0,           # the token id for padding
    mask_prob = 0.15,           # masking probability for masked language modeling
    replace_prob = 0.90,        # ~10% of selected tokens are not replaced by the mask token but are still included in the loss, as detailed in the paper
    mask_ignore_token_ids = []  # other tokens to exclude from masking; include the [cls] and [sep] ids here
).cuda()

# optimizer
opt = Adam(trainer.parameters(), lr = 3e-4)

# one training step (do this for many steps in a for loop, getting new `data` each time)
data = torch.randint(0, 20000, (8, 1024)).cuda()

loss = trainer(data)
loss.backward()
opt.step()
opt.zero_grad()

# after much training, the model should have improved for downstream tasks
torch.save(transformer, f'./')
Repeat the step above for many iterations and your model should improve.
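The trainer's arguments (`mask_prob`, `replace_prob`, `pad_token_id`, `mask_ignore_token_ids`) describe BERT-style masking. Below is a minimal pure-Python sketch of that procedure for a single sequence, for illustration only: `mask_tokens` and `IGNORE_INDEX` are hypothetical names, not part of mlm-pytorch; tokens are selected with independent per-token coin flips, which may differ from how the library picks them; and BERT's random-token replacement is not modeled.

```python
import random
from typing import List, Optional, Sequence, Tuple

# Hypothetical label for positions excluded from the loss; -100 is the
# default ignore_index of torch.nn.functional.cross_entropy.
IGNORE_INDEX = -100


def mask_tokens(
    tokens: Sequence[int],
    mask_token_id: int,
    pad_token_id: int,
    mask_prob: float = 0.15,
    replace_prob: float = 0.90,
    mask_ignore_token_ids: Sequence[int] = (),
    rng: Optional[random.Random] = None,
) -> Tuple[List[int], List[int]]:
    """Return (inputs, labels) for one sequence (illustrative sketch only)."""
    if rng is None:
        rng = random.Random()
    # padding and explicitly ignored ids (e.g. [cls], [sep]) are never selected
    never_mask = {pad_token_id, *mask_ignore_token_ids}
    inputs: List[int] = []
    labels: List[int] = []
    for tok in tokens:
        if tok not in never_mask and rng.random() < mask_prob:
            # selected token: the model must predict it, so it enters the loss
            labels.append(tok)
            # replace it with the mask token with probability replace_prob,
            # otherwise leave it visible (but still predicted)
            inputs.append(mask_token_id if rng.random() < replace_prob else tok)
        else:
            # unselected token: passed through unchanged and ignored by the loss
            inputs.append(tok)
            labels.append(IGNORE_INDEX)
    return inputs, labels


# usage: a fixed seed makes the random draw reproducible
inputs, labels = mask_tokens([5, 17, 42, 0, 0], mask_token_id=2, pad_token_id=0,
                             rng=random.Random(0))
```

Leaving some selected tokens unchanged (`replace_prob < 1`) follows the BERT paper's motivation: the mask token never appears during fine-tuning, so the model should not rely on seeing it at every position it must predict.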
@misc{devlin2018bert,
    title         = {BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding},
    author        = {Jacob Devlin and Ming-Wei Chang and Kenton Lee and Kristina Toutanova},
    year          = {2018},
    eprint        = {1810.04805},
    archivePrefix = {arXiv},
    primaryClass  = {cs.CL}
}