
gpt2-client v2.1 (archived)

An easy-to-use wrapper for the GPT-2 124M, 355M, 774M, and 1.5B Transformer models


Made by Rishabh Anand • https://rish-16.github.io

What is it?

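GPT-2 is a natural language processing model for text generation developed by OpenAI. Trained on 40GB of text from the Internet, it is the successor to the original GPT (Generative Pre-trained Transformer) model. It is built on the Transformer architecture introduced in the 2017 paper "Attention Is All You Need". The model comes in four sizes, named after their parameter counts: 124M, 355M (originally released as 345M), 774M, and 1558M.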

The 1.5B (1558M) model is the largest one OpenAI has released.

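gpt2-client is a wrapper around the original gpt-2 repository that offers the same functionality with better accessibility, readability, and utility. You can use all four GPT-2 models in fewer than five lines of code.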

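*Note: This client wrapper is in no way liable for any damage caused directly or indirectly. Any names, places, and objects referenced by the model are fictitious and are not intended to resemble real-life entities or organizations. Samples are unfiltered and may contain offensive content. User discretion is advised.*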

Installation

Install the client via pip. gpt2-client is best supported on Python >= 3.5 with TensorFlow 1.x. If you are using Python 2.x, you may need to reinstall or upgrade some libraries with pip's --upgrade option.

pip install gpt2-client

Note: gpt2-client is not compatible with TensorFlow 2.0; use TensorFlow 1.14.0 instead.
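Because the TensorFlow version is the most common installation pitfall, a quick check before loading a model can save a confusing error later. The sketch below is a hypothetical helper, not part of gpt2-client; it assumes that any TensorFlow 1.x release is acceptable, whereas the README itself only rules out 2.0 and recommends 1.14.0.

```python
def is_supported_tensorflow(version):
    """Return True if `version` looks like a TensorFlow 1.x release string.

    Assumption: any 1.x release is acceptable. The README only says that
    TensorFlow 2.0 is unsupported and recommends 1.14.0.
    """
    # Keep only the major component: "1.14.0" -> "1", "2.0.0-rc1" -> "2"
    major = version.strip().split(".")[0]
    return major.isdigit() and int(major) == 1


def check_installed_tensorflow():
    """Describe whether the installed TensorFlow (if any) should work."""
    try:
        import tensorflow as tf
    except ImportError:
        return "TensorFlow is not installed; try: pip install tensorflow==1.14.0"
    if is_supported_tensorflow(tf.__version__):
        return "TensorFlow {} should work with gpt2-client".format(tf.__version__)
    return "TensorFlow {} is not supported; try 1.14.0".format(tf.__version__)
```

Call print(check_installed_tensorflow()) once after installing; the version parsing is kept in a separate pure function so it can be checked without TensorFlow present.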

Getting started

1. Download the model weights and checkpoints

from gpt2_client import GPT2Client

gpt2 = GPT2Client('124M', save_dir='models')  # Also accepts '355M', '774M', or '1558M'; save_dir can be any directory name.
gpt2.load_model(force_download=False)  # Reuse cached files if they are available.

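This creates a directory named models in the current working directory and downloads the weights, checkpoints, model JSON, and hyperparameters the model needs. Once you have called load_model(), you do not need to call it again, provided the files have finished downloading into the models directory.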

Note: Set force_download=True to overwrite the existing cached model weights and checkpoints.
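To decide whether the cached files are complete before relying on force_download=False, you can look for them on disk. This is a minimal sketch under two assumptions: that files land in save_dir/<model name>, and that the file list matches what OpenAI's original gpt-2 download script fetches. gpt2-client's exact layout may differ, and missing_model_files is a hypothetical helper.

```python
import os

# Assumption: the per-model files fetched by OpenAI's gpt-2 download script.
# gpt2-client's on-disk layout may differ from this.
EXPECTED_FILES = (
    "checkpoint",
    "encoder.json",
    "hparams.json",
    "model.ckpt.data-00000-of-00001",
    "model.ckpt.index",
    "model.ckpt.meta",
    "vocab.bpe",
)


def missing_model_files(model_name, save_dir="models"):
    """Return the expected files that are absent from save_dir/model_name."""
    model_dir = os.path.join(save_dir, model_name)
    return [
        name
        for name in EXPECTED_FILES
        if not os.path.isfile(os.path.join(model_dir, name))
    ]
```

If missing_model_files('124M') returns an empty list, the cache looks complete; otherwise calling load_model() (or load_model(force_download=True)) should fetch the files again.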

2. Start generating text!

from gpt2_client import GPT2Client

gpt2 = GPT2Client('124M')  # This could also be '355M', '774M', or '1558M'
gpt2.load_model()

gpt2.generate(interactive=True)  # Asks the user for a prompt
gpt2.generate(n_samples=4)  # Generates 4 pieces of text
text = gpt2.generate(return_text=True)  # Generates text and returns it as a list
gpt2.generate(interactive=True, n_samples=3)  # Asks for a different prompt each time

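As the examples above show, the generation options are flexible. You can mix and match them depending on the text you need, whether that is several independent chunks or a single piece of text driven by a prompt.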
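With return_text=True, the README says the generated text comes back in an array, which in Python is a list. A small hypothetical helper such as format_samples can number and separate those samples for display or logging. It assumes each element is a string; the exact return type may vary between gpt2-client versions.

```python
def format_samples(samples, separator="=" * 40):
    """Join generated samples into one printable string, numbering each one.

    Assumes `samples` is a list of strings, such as the value returned by
    gpt2.generate(return_text=True).
    """
    blocks = []
    for i, text in enumerate(samples, start=1):
        # Strip surrounding whitespace so separators line up cleanly
        blocks.append("Sample {}:\n{}".format(i, text.strip()))
    return ("\n" + separator + "\n").join(blocks)
```

For example, print(format_samples(gpt2.generate(n_samples=4, return_text=True))) would print four numbered samples, assuming the two options can be combined as the mix-and-match examples suggest.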

3. Generate text from a batch of prompts

from gpt2_client import GPT2Client

gpt2 = GPT2Client('124M')  # This could also be '355M', '774M', or '1558M'
gpt2.load_model()

prompts = [
    "This is a prompt 1",
    "This is a prompt 2",
    "This is a prompt 3",
    "This is a prompt 4",
]

text = gpt2.generate_batch_from_prompts(prompts)  # Returns a list of generated text
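To keep track of which output belongs to which prompt, you can pair the two lists. This sketch assumes that generate_batch_from_prompts returns exactly one string per prompt, in the same order as the input; that ordering is an assumption, so the hypothetical helper checks the lengths instead of letting zip() silently drop items.

```python
def pair_prompts_with_outputs(prompts, outputs):
    """Return (prompt, generated_text) pairs in input order.

    Assumption: one output per prompt, in the same order. A list of tuples
    is used instead of a dict so duplicate prompts are not collapsed.
    """
    if len(prompts) != len(outputs):
        raise ValueError(
            "expected {} outputs, got {}".format(len(prompts), len(outputs))
        )
    return list(zip(prompts, outputs))
```

For example, pair_prompts_with_outputs(prompts, text) after the batch call above yields four (prompt, text) tuples if the assumption holds.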

4. Fine-tune GPT-2 on a custom dataset

from gpt2_client import GPT2Client

gpt2 = GPT2Client('124M')  # This could also be '355M', '774M', or '1558M'
gpt2.load_model()

my_corpus = './data/shakespeare.txt'  # Path to the corpus
custom_text = gpt2.finetune(my_corpus, return_text=True)  # Fine-tune on your custom dataset

To fine-tune GPT-2 on your own corpus or dataset, it helps to have a GPU or TPU at hand. Google Colab is one tool you can use to retrain or fine-tune your custom model.
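In the example above, finetune() receives a single file path. If your data is spread across several text files, one option is to merge them into one corpus file first. build_corpus is a hypothetical helper, not part of gpt2-client, and it assumes all inputs are UTF-8 plain text.

```python
from pathlib import Path


def build_corpus(input_paths, output_path, separator="\n\n"):
    """Concatenate several UTF-8 text files into a single corpus file.

    Each file is stripped of leading/trailing whitespace and the pieces are
    joined with `separator`. Returns the output path as a string.
    """
    texts = [Path(p).read_text(encoding="utf-8").strip() for p in input_paths]
    out = Path(output_path)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(separator.join(texts) + "\n", encoding="utf-8")
    return str(out)
```

The result can be passed straight to the fine-tuning call, for example gpt2.finetune(build_corpus(['./data/a.txt', './data/b.txt'], './data/corpus.txt'), return_text=True), where the file names are placeholders.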

5. Encode and decode text sequences

from gpt2_client import GPT2Client

gpt2 = GPT2Client('124M')  # This could also be '355M', '774M', or '1558M'
gpt2.load_model()

# Encoding a sentence
encs = gpt2.encode_seq("Hello world, this is a sentence")
# [15496, 995, 11, 428, 318, 257, 6827]

# Decoding an encoded sequence
decs = gpt2.decode_seq(encs)
# Hello world, this is a sentence
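GPT-2's tokenizer is a byte-level BPE. Before any merges are applied, OpenAI's original encoder maps every byte to a printable Unicode character, so that arbitrary text, including spaces and newlines, can be handled without unknown tokens. The sketch below reproduces only that first step, following the scheme in OpenAI's encoder. It is not gpt2-client's code, and the token IDs shown above come from the subsequent BPE merges, which need the downloaded vocab.bpe and encoder.json and are not reproduced here.

```python
def bytes_to_unicode():
    """Map every byte value (0-255) to a printable Unicode character.

    Printable Latin-1 bytes map to themselves; the remaining 68 bytes
    (control characters, space, etc.) are shifted to code points 256 and up,
    so the mapping is reversible and contains no whitespace characters.
    """
    printable = (
        list(range(ord("!"), ord("~") + 1))  # 33..126
        + list(range(161, 173))              # inverted exclamation .. not sign
        + list(range(174, 256))              # registered sign .. y-diaeresis
    )
    byte_values = printable[:]
    code_points = printable[:]
    offset = 0
    for b in range(256):
        if b not in byte_values:
            byte_values.append(b)
            code_points.append(256 + offset)
            offset += 1
    return {b: chr(c) for b, c in zip(byte_values, code_points)}


BYTE_ENCODER = bytes_to_unicode()
BYTE_DECODER = {ch: b for b, ch in BYTE_ENCODER.items()}


def to_byte_symbols(text):
    """Turn text into the byte-level symbol string that BPE merges operate on."""
    return "".join(BYTE_ENCODER[b] for b in text.encode("utf-8"))


def from_byte_symbols(symbols):
    """Invert to_byte_symbols, recovering the original text."""
    return bytes(BYTE_DECODER[ch] for ch in symbols).decode("utf-8")
```

Under this mapping a space becomes "Ġ" (U+0120), so "Hello world" turns into "HelloĠworld" before merging. This is why GPT-2 vocabulary entries for words that follow a space start with "Ġ".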