mlm pytorch
0.1.0
This repository lets you quickly set up unsupervised training for your transformer from a corpus of sequence data, using masked language modeling (MLM) as in BERT.
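Masked language modeling needs no labels. A fraction of the input tokens is hidden, and the model learns to predict the originals. The following is a simplified, plain-PyTorch sketch of that corruption step, written for illustration under stated assumptions. It is not the library's internal code. Its parameter names mirror the `MLM` arguments used in the example further down (`mask_token_id`, `pad_token_id`, `mask_prob`, `replace_prob`, `mask_ignore_token_ids`). The helper name `bert_style_mask` and the `-100` label convention are hypothetical choices, and the sketch omits BERT's random-token replacement.

```python
import torch


def bert_style_mask(seq, mask_token_id=2, pad_token_id=0, mask_prob=0.15,
                    replace_prob=0.9, mask_ignore_token_ids=()):
    """Hypothetical helper: return (masked_input, labels) for one MLM step.

    labels holds the original token at selected positions and -100 elsewhere,
    so a cross-entropy loss with ignore_index=-100 only scores selected tokens.
    """
    # tokens that may never be selected: padding plus any user-specified ids
    no_mask = seq == pad_token_id
    for tok in mask_ignore_token_ids:
        no_mask |= seq == tok

    # select roughly mask_prob of the eligible positions
    selected = (torch.rand(seq.shape, device=seq.device) < mask_prob) & ~no_mask

    # targets: original token where selected, -100 (ignored) everywhere else
    labels = seq.masked_fill(~selected, -100)

    # of the selected positions, replace about replace_prob with the mask id;
    # the rest keep their original token but are still scored in the loss
    replace = selected & (torch.rand(seq.shape, device=seq.device) < replace_prob)
    masked_input = seq.masked_fill(replace, mask_token_id)
    return masked_input, labels


# usage: a toy batch of 4 sequences of 32 token ids
torch.manual_seed(0)
seq = torch.randint(3, 100, (4, 32))
masked, labels = bert_style_mask(seq)
```

Because unselected positions get the label `-100`, the model is only penalized on the roughly 15% of tokens it was asked to reconstruct. The 10% of selected tokens that stay unchanged keep the model from assuming that every visible token is correct.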
$ pip install mlm-pytorch
First pip install x-transformers, then run the following example to see what one iteration of unsupervised training looks like:
import torch
from torch import nn
from torch.optim import Adam
from mlm_pytorch import MLM

# instantiate the language model

from x_transformers import TransformerWrapper, Encoder

transformer = TransformerWrapper(
    num_tokens = 20000,
    max_seq_len = 1024,
    attn_layers = Encoder(
        dim = 512,
        depth = 6,
        heads = 8
    )
)

# plug the language model into the MLM trainer

trainer = MLM(
    transformer,
    mask_token_id = 2,          # the token id reserved for masking
    pad_token_id = 0,           # the token id for padding
    mask_prob = 0.15,           # masking probability for masked language modeling
    replace_prob = 0.90,        # ~10% probability that token will not be masked, but included in loss, as detailed in the paper
    mask_ignore_token_ids = []  # other tokens to exclude from masking, include the [cls] and [sep] here
).cuda()                        # requires a CUDA GPU; drop the .cuda() calls to run on CPU

# optimizer

opt = Adam(trainer.parameters(), lr = 3e-4)

# one training step (do this for many steps in a for loop, getting new `data` each time)

data = torch.randint(0, 20000, (8, 1024)).cuda()

loss = trainer(data)
loss.backward()
opt.step()
opt.zero_grad()

# after much training, the model should have improved for downstream tasks

torch.save(transformer, './pretrained-model.pt')
Repeat the step above over many iterations and your model should improve.
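The example uses random integers as `data`. With a real corpus, each step needs a fresh batch. The following is a minimal sketch of one way to get it. It assumes the corpus is already tokenized into a single 1-D tensor of token ids and samples random fixed-length windows from it. The helper name `sample_batch` and the windowing strategy are illustrative assumptions and are not part of mlm-pytorch.

```python
import torch


def sample_batch(corpus, batch_size=8, seq_len=1024, generator=None):
    """Hypothetical helper: sample batch_size random windows of length seq_len.

    Assumes corpus is a 1-D tensor of token ids with at least seq_len elements.
    """
    if corpus.dim() != 1 or corpus.numel() < seq_len:
        raise ValueError("corpus must be a 1-D tensor with at least seq_len tokens")
    # random start offsets chosen so that every window fits inside the corpus
    starts = torch.randint(0, corpus.numel() - seq_len + 1, (batch_size,),
                           generator=generator)
    # stack the contiguous windows into a (batch_size, seq_len) tensor
    return torch.stack([corpus[s:s + seq_len] for s in starts.tolist()])


# usage: a stand-in corpus of 100,000 token ids
corpus = torch.randint(3, 20000, (100_000,))
data = sample_batch(corpus)  # shape (8, 1024), like `data` in the example
```

Inside the training loop you would call something like `data = sample_batch(corpus).cuda()` before each `trainer(data)` step. Random windows are simple and avoid padding. A real pipeline might instead respect document boundaries and pad shorter sequences with `pad_token_id`.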
@misc{devlin2018bert,
    title         = {BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding},
    author        = {Jacob Devlin and Ming-Wei Chang and Kenton Lee and Kristina Toutanova},
    year          = {2018},
    eprint        = {1810.04805},
    archivePrefix = {arXiv},
    primaryClass  = {cs.CL}
}