belgpt2
1.0.0
GPT-2 model pre-trained on a very large and heterogeneous French corpus (~60 GB).
You can use BelGPT-2 with the 🤗 Transformers library as follows:
import random

import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel

# Load pretrained model and tokenizer
model = GPT2LMHeadModel.from_pretrained("antoiloui/belgpt2")
tokenizer = GPT2Tokenizer.from_pretrained("antoiloui/belgpt2")

# Generate a sample of text, starting from a randomly chosen BOS token
# and sampling with top-k/top-p filtering
model.eval()
output = model.generate(
    bos_token_id=random.randint(1, 50000),
    do_sample=True,
    top_k=50,
    max_length=100,
    top_p=0.95,
    num_return_sequences=1,
)

# Decode the generated token IDs back into text
decoded_output = []
for sample in output:
    decoded_output.append(tokenizer.decode(sample, skip_special_tokens=True))
print(decoded_output)
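The snippet above samples text unconditionally, starting from a random BOS token. To condition generation on a French prompt instead, you can encode the prompt and pass it to generate() — a minimal sketch reusing the model and tokenizer objects loaded above; the prompt string is just an illustrative example, not from the original documentation:

# Generate text conditioned on a French prompt (hypothetical example prompt)
prompt = "Hier, j'ai visité Bruxelles et"
input_ids = tokenizer.encode(prompt, return_tensors="pt")
output = model.generate(
    input_ids,
    do_sample=True,
    top_k=50,
    top_p=0.95,
    max_length=100,
    num_return_sequences=1,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))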
Detailed documentation on the pre-trained model, its implementation, and the data can be found here.
For attribution in academic contexts, please cite this work as:
@misc{louis2020belgpt2,
  author = {Louis, Antoine},
  title = {{BelGPT-2: a GPT-2 model pre-trained on French corpora.}},
  year = {2020},
  howpublished = {\url{https://github.com/antoiloui/belgpt2}},
}