
v2.1


gpt2-client (archived)

Easy-to-use wrapper for the GPT-2 124M, 345M, 774M, and 1.5B Transformer models

What is it • Installation • Getting started

Made by Rishabh Anand • https://rish-16.github.io

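What is it

GPT-2 is a natural language processing model for text generation developed by OpenAI. It is the successor to the GPT (Generative Pre-trained Transformer) model and was trained on 40 GB of text from the internet. It uses the Transformer architecture introduced in the 2017 paper "Attention Is All You Need". The model comes in four versions (124M, 345M, 774M, and 1558M) that differ in the number of parameters they contain.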
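To make the size differences concrete, here is a rough back-of-the-envelope sketch; it is not part of gpt2-client. It assumes the layer counts and hidden sizes published for GPT-2 (12/768, 24/1024, 36/1280, 48/1600), a 50,257-token vocabulary and a 1,024-token context, and it ignores biases and LayerNorm weights, so the totals are slightly low. The second model's estimate lands near 355M, which is the name the code comments below use for the model the text calls 345M.

```python
# Approximate GPT-2 parameter counts from the architecture hyperparameters.
# Assumption: (n_layer, d_model) pairs as published for GPT-2; biases and
# LayerNorm parameters are ignored, so the results are slight underestimates.

VOCAB_SIZE = 50257  # byte-level BPE vocabulary size
N_CTX = 1024        # maximum context length (number of position embeddings)

GPT2_CONFIGS = {
    "124M": (12, 768),
    "355M": (24, 1024),
    "774M": (36, 1280),
    "1558M": (48, 1600),
}


def approx_params(n_layer: int, d_model: int) -> int:
    """Estimate the parameter count of a GPT-2-style decoder stack."""
    # Per block: attention Q, K, V and output projections (4 * d^2)
    # plus the MLP's two 4x-wide linear layers (8 * d^2).
    per_block = 12 * d_model * d_model
    token_embeddings = VOCAB_SIZE * d_model  # also reused as the output layer
    position_embeddings = N_CTX * d_model
    return n_layer * per_block + token_embeddings + position_embeddings


if __name__ == "__main__":
    for name, (n_layer, d_model) in GPT2_CONFIGS.items():
        print(f"{name}: ~{approx_params(n_layer, d_model) / 1e6:.0f}M parameters")
```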

The 1.5B model is the largest GPT-2 model OpenAI has publicly released.

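gpt2-client is a wrapper around the original gpt-2 repository that offers the same functionality with better accessibility, readability, and utility. You can use any of the four GPT-2 models in fewer than five lines of code.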

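*Note: This client wrapper accepts no liability for any damage caused directly or indirectly. Any names, places, and objects referenced by the model are fictional and bear no resemblance to real-life entities or organizations. Samples are unfiltered and may contain offensive content. User discretion is advised.*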

Installation

Install the client via pip. gpt2-client is best supported on Python >= 3.5 with TensorFlow 1.x. If you are using Python 2.x, you may need to reinstall or upgrade some libraries with pip install --upgrade.

pip install gpt2-client

Note: gpt2-client is not compatible with TensorFlow 2.0; use TensorFlow 1.14.0 instead.
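Since these version constraints are easy to miss, a quick pre-flight check can save a confusing import error later. This is a hypothetical helper, not part of gpt2-client: check_environment only encodes the constraints stated in this section (Python >= 3.5, TensorFlow 1.x) and reads the installed version from tf.__version__.

```python
import sys


def check_environment(tf_version: str, python_version=sys.version_info[:2]) -> list:
    """Return a list of problems; an empty list means the versions look fine."""
    problems = []
    if tuple(python_version) < (3, 5):
        problems.append("Python >= 3.5 is recommended")
    # Only the major version matters here: gpt2-client needs TensorFlow 1.x.
    major = int(tf_version.split(".")[0])
    if major != 1:
        problems.append("TensorFlow 1.x is required (for example 1.14.0)")
    return problems


if __name__ == "__main__":
    try:
        import tensorflow as tf
    except ImportError:
        print("TensorFlow is not installed; try: pip install tensorflow==1.14.0")
    else:
        for problem in check_environment(tf.__version__):
            print("Warning:", problem)
```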

Getting started

1. Download the model weights and checkpoints

from gpt2_client import GPT2Client

gpt2 = GPT2Client('124M', save_dir='models')  # This could also be `355M`, `774M`, or `1558M`. Rename `save_dir` to anything.
gpt2.load_model(force_download=False)  # Use cached versions if available.

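This creates a directory called models in the current working directory and downloads the weights, checkpoints, model JSON, and hyperparameters the model needs. Once the files have finished downloading into the models directory, later calls to load_model() reuse them instead of downloading them again.

Note: set force_download=True to overwrite the existing cached model weights and checkpoints.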
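The caching rule above can be pictured as a simple check. This is a hypothetical sketch, not gpt2-client's code: needs_download is an illustrative name, and it assumes the files live under models/<model name>/ with the file names used by OpenAI's gpt-2 download script.

```python
import os
import tempfile

# File names written by OpenAI's gpt-2 download script (assumed cache layout).
MODEL_FILES = [
    "checkpoint",
    "encoder.json",
    "hparams.json",
    "model.ckpt.data-00000-of-00001",
    "model.ckpt.index",
    "model.ckpt.meta",
    "vocab.bpe",
]


def needs_download(model_name: str, save_dir: str = "models", force_download: bool = False) -> bool:
    """Decide whether the model files must be (re-)downloaded."""
    if force_download:
        return True  # overwrite the cached copy no matter what exists
    model_dir = os.path.join(save_dir, model_name)
    # Download only if any expected file is missing from the cache.
    return not all(os.path.isfile(os.path.join(model_dir, f)) for f in MODEL_FILES)


if __name__ == "__main__":
    cache = tempfile.mkdtemp()
    print(needs_download("124M", save_dir=cache))  # True: nothing cached yet
```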

2. Start generating text!

from gpt2_client import GPT2Client

gpt2 = GPT2Client('124M')  # This could also be `355M`, `774M`, or `1558M`
gpt2.load_model()

gpt2.generate(interactive=True)  # Asks the user for a prompt
gpt2.generate(n_samples=4)  # Generates 4 pieces of text
text = gpt2.generate(return_text=True)  # Generates text and returns it in an array
gpt2.generate(interactive=True, n_samples=3)  # A different prompt each time

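As the examples above show, the generation options are very flexible. You can mix and match them depending on the text you need, whether that is several chunks of text or a single piece generated from a prompt.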

3. Generate text from a batch of prompts

from gpt2_client import GPT2Client

gpt2 = GPT2Client('124M')  # This could also be `355M`, `774M`, or `1558M`
gpt2.load_model()

prompts = [
    "This is a prompt 1",
    "This is a prompt 2",
    "This is a prompt 3",
    "This is a prompt 4",
]

text = gpt2.generate_batch_from_prompts(prompts)  # Returns an array of generated text
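With many prompts, you may prefer to pass them to generate_batch_from_prompts in smaller groups, for example to limit memory use. The helpers below are a hypothetical sketch (chunked and pair_results are not part of gpt2-client); pairing assumes the method returns one generated text per prompt, in the same order.

```python
from typing import Iterator, List, Sequence, Tuple


def chunked(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def pair_results(prompts: Sequence[str], outputs: Sequence[str]) -> List[Tuple[str, str]]:
    """Match each prompt with its generated text (assumes matching order)."""
    if len(prompts) != len(outputs):
        raise ValueError("expected one output per prompt")
    return list(zip(prompts, outputs))


# Hypothetical usage with a loaded GPT2Client instance `gpt2`:
# results = []
# for batch in chunked(prompts, 2):
#     results += pair_results(batch, gpt2.generate_batch_from_prompts(batch))
```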

4. Fine-tune GPT-2 on a custom dataset

from gpt2_client import GPT2Client

gpt2 = GPT2Client('124M')  # This could also be `355M`, `774M`, or `1558M`
gpt2.load_model()

my_corpus = './data/shakespeare.txt'  # Path to the corpus
custom_text = gpt2.finetune(my_corpus, return_text=True)  # Fine-tune on your custom dataset

To fine-tune GPT-2 on your own corpus or dataset, you should ideally have a GPU or TPU available. Google Colab is one tool you can use to retrain or fine-tune your custom model.
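finetune takes the path to a single text file. If your data is spread over several files, one way to prepare it is to concatenate them first. This is an illustrative sketch under stated assumptions: build_corpus is a hypothetical helper, and separating documents with <|endoftext|> (GPT-2's end-of-text token) follows a common convention in GPT-2 fine-tuning scripts; whether gpt2-client's finetune treats that marker specially is not stated here.

```python
import tempfile
from pathlib import Path
from typing import Iterable, Union

PathLike = Union[str, Path]
SEPARATOR = "<|endoftext|>"  # GPT-2 end-of-text token, used as a document separator (assumption)


def build_corpus(paths: Iterable[PathLike], out_path: PathLike) -> int:
    """Concatenate text files into one corpus file; return its length in characters."""
    texts = []
    for p in paths:
        text = Path(p).read_text(encoding="utf-8").strip()
        if text:  # skip empty or whitespace-only files
            texts.append(text)
    if not texts:
        raise ValueError("no non-empty input files")
    corpus = ("\n" + SEPARATOR + "\n").join(texts) + "\n"
    Path(out_path).write_text(corpus, encoding="utf-8")
    return len(corpus)


if __name__ == "__main__":
    # Demo with two throwaway files in a temporary directory.
    tmp = Path(tempfile.mkdtemp())
    (tmp / "act1.txt").write_text("To be, or not to be.\n", encoding="utf-8")
    (tmp / "act2.txt").write_text("All the world's a stage.", encoding="utf-8")
    build_corpus([tmp / "act1.txt", tmp / "act2.txt"], tmp / "corpus.txt")
    print((tmp / "corpus.txt").read_text(encoding="utf-8"))
    # Then, hypothetically: gpt2.finetune(str(tmp / "corpus.txt"), return_text=True)
```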

5. Encode and decode text sequences

from gpt2_client import GPT2Client

gpt2 = GPT2Client('124M')  # This could also be `355M`, `774M`, or `1558M`
gpt2.load_model()

# Encoding a sentence
encs = gpt2.encode_seq("Hello world, this is a sentence")
# [15496, 995, 11, 428, 318, 257, 6827]

# Decoding an encoded sequence
decs = gpt2.decode_seq(encs)
# Hello world, this is a sentence
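The ids come from GPT-2's byte-level BPE, which gpt2-client inherits from OpenAI's gpt-2 repository. Before any merges, every byte of the UTF-8 input is mapped to a printable Unicode character; the function below reproduces that byte-to-character table as defined in the gpt-2 repository's encoder.py. It helps explain why " world" is a single token (995 above): the space byte is rendered as Ġ and stays attached to the word. The sketch does not perform the BPE merges themselves, which require the downloaded encoder.json and vocab.bpe files.

```python
def bytes_to_unicode() -> dict:
    """Map every byte 0-255 to a printable Unicode character (GPT-2 style)."""
    # Printable Latin-1 ranges map to themselves.
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    # Remaining bytes (control characters, space, etc.) are shifted above 255.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


def to_bpe_symbols(text: str) -> str:
    """Render the UTF-8 bytes of `text` using the byte-to-character table."""
    table = bytes_to_unicode()
    return "".join(table[b] for b in text.encode("utf-8"))


def from_bpe_symbols(symbols: str) -> str:
    """Invert to_bpe_symbols and decode back to text."""
    inverse = {c: b for b, c in bytes_to_unicode().items()}
    return bytes(inverse[c] for c in symbols).decode("utf-8")


if __name__ == "__main__":
    s = to_bpe_symbols("Hello world")
    print(s)  # prints: HelloĠworld
    print(from_bpe_symbols(s))  # prints: Hello world
```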