ctransformers

Python bindings for Transformer models implemented in C/C++ using the GGML library.

Also see ChatDocs

Supported Models
Installation
Usage
- 🤗 Transformers
- LangChain
- GPU
- GPTQ
Documentation
License

Supported Models

Models	Model Type	CUDA	Metal
GPT-2	`gpt2`
GPT-J, GPT4All-J	`gptj`
GPT-NeoX, StableLM	`gpt_neox`
Falcon	`falcon`	✅
LLaMA, LLaMA 2	`llama`	✅	✅
MPT	`mpt`	✅
StarCoder, StarChat	`gpt_bigcode`	✅
Dolly V2	`dolly-v2`
Replit	`replit`

Installation

pip install ctransformers

Usage

It provides a unified interface for all models:

from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained("/path/to/ggml-model.bin", model_type="gpt2")

print(llm("AI is going to"))


To stream the output, set stream=True:

for text in llm("AI is going to", stream=True):
    print(text, end="", flush=True)
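When streaming, it is often handy to keep the full text as well as print it. The following is a minimal sketch, not part of ctransformers: `stream_and_collect` is a hypothetical helper, and `fake_llm` is a stand-in that only mimics the `llm(prompt, stream=True)` call shape so the example runs without a model file.

```python
from typing import Callable, Iterator, List


def stream_and_collect(llm: Callable[..., Iterator[str]], prompt: str) -> str:
    """Print each streamed chunk as it arrives and return the full text."""
    pieces: List[str] = []
    for text in llm(prompt, stream=True):
        print(text, end="", flush=True)
        pieces.append(text)
    print()  # finish the line once the stream ends
    return "".join(pieces)


def fake_llm(prompt: str, stream: bool = False):
    """Stand-in for a loaded model (assumption: same call shape as llm(...))."""
    chunks = [" change", " the", " world."]
    return iter(chunks) if stream else "".join(chunks)


full_text = stream_and_collect(fake_llm, "AI is going to")
```

With a real model, passing the loaded `llm` object instead of `fake_llm` should work the same way, since the helper relies only on the streaming call shown above.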

You can load models from Hugging Face Hub directly:

llm = AutoModelForCausalLM.from_pretrained("marella/gpt-2-ggml")

If a model repo has multiple model files (.bin or .gguf files), specify a model file using:

llm = AutoModelForCausalLM.from_pretrained("marella/gpt-2-ggml", model_file="ggml-model.bin")

🤗 Transformers

Note: This is an experimental feature and may change in the future.

To use it with 🤗 Transformers, create the model and tokenizer using:

from ctransformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("marella/gpt-2-ggml", hf=True)
tokenizer = AutoTokenizer.from_pretrained(model)


You can use the 🤗 Transformers text generation pipeline:

from transformers import pipeline

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
print(pipe("AI is going to", max_new_tokens=256))

You can use 🤗 Transformers generation parameters:

pipe("AI is going to", max_new_tokens=256, do_sample=True, temperature=0.8, repetition_penalty=1.1)

You can use 🤗 Transformers tokenizers:

from ctransformers import AutoModelForCausalLM
from transformers import AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("marella/gpt-2-ggml", hf=True)  # Load model from GGML model repo.
tokenizer = AutoTokenizer.from_pretrained("gpt2")  # Load tokenizer from original model repo.

LangChain

It is integrated into LangChain. See the LangChain docs.

GPU

To run some of the model layers on GPU, set the gpu_layers parameter:

llm = AutoModelForCausalLM.from_pretrained("TheBloke/Llama-2-7B-GGML", gpu_layers=50)


CUDA

Install CUDA libraries using:

pip install ctransformers[cuda]

ROCm

To enable ROCm support, install the ctransformers package using:

CT_HIPBLAS=1 pip install ctransformers --no-binary ctransformers

Metal

To enable Metal support, install the ctransformers package using:

CT_METAL=1 pip install ctransformers --no-binary ctransformers

GPTQ

Note: This is an experimental feature and only LLaMA models are supported using ExLlama.

Install additional dependencies using:

pip install ctransformers[gptq]

Load a GPTQ model using:

llm = AutoModelForCausalLM.from_pretrained("TheBloke/Llama-2-7B-GPTQ")


If the model name or path doesn't contain the word gptq, then specify model_type="gptq".
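The naming rule above can be sketched as a small helper that decides whether `model_type="gptq"` needs to be passed explicitly. `gptq_load_kwargs` is a hypothetical function, not part of ctransformers, the local path is made up for illustration, and treating the match as case-insensitive is an assumption.

```python
from typing import Dict


def gptq_load_kwargs(model_path_or_repo_id: str) -> Dict[str, str]:
    """Return extra keyword arguments needed to load a GPTQ model.

    Assumption: the word "gptq" is matched case-insensitively in the name.
    """
    if "gptq" in model_path_or_repo_id.lower():
        return {}  # the name already identifies the model as GPTQ
    return {"model_type": "gptq"}  # otherwise state the type explicitly


# Repo name from the example above: no extra argument needed.
print(gptq_load_kwargs("TheBloke/Llama-2-7B-GPTQ"))
# Hypothetical local path without "gptq" in it: model_type must be given.
print(gptq_load_kwargs("/models/llama-2-7b-4bit"))
```

The returned dictionary could then be unpacked into the loader call, for example `AutoModelForCausalLM.from_pretrained(name, **gptq_load_kwargs(name))`.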

It can also be used with LangChain. Low-level APIs are not fully supported.

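Documentation

Config

Parameter	Type	Description	Default
`top_k`	`int`	The top-k value to use for sampling.	`40`
`top_p`	`float`	The top-p value to use for sampling.	`0.95`
`temperature`	`float`	The temperature to use for sampling.	`0.8`
`repetition_penalty`	`float`	The repetition penalty to use for sampling.	`1.1`
`last_n_tokens`	`int`	The number of last tokens to use for repetition penalty.	`64`
`seed`	`int`	The seed value to use for sampling tokens.	`-1`
`max_new_tokens`	`int`	The maximum number of new tokens to generate.	`256`
`stop`	`List[str]`	A list of sequences to stop generation when encountered.	`None`
`stream`	`bool`	Whether to stream the generated text.	`False`
`reset`	`bool`	Whether to reset the model state before generating text.	`True`
`batch_size`	`int`	The batch size to use for evaluating tokens in a single prompt.	`8`
`threads`	`int`	The number of threads to use for evaluating tokens.	`-1`
`context_length`	`int`	The maximum context length to use.	`-1`
`gpu_layers`	`int`	The number of layers to run on GPU.	`0`

Note: Currently only LLaMA, MPT and Falcon models support the context_length parameter.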
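To make the defaults concrete, here is a minimal sketch that mirrors the Config table as a plain dictionary and applies overrides on top of it. `CONFIG_DEFAULTS` and `build_config` are hypothetical names used only for illustration; they are not part of the ctransformers API.

```python
from typing import Any, Dict

# Defaults copied from the Config table.
CONFIG_DEFAULTS: Dict[str, Any] = {
    "top_k": 40,
    "top_p": 0.95,
    "temperature": 0.8,
    "repetition_penalty": 1.1,
    "last_n_tokens": 64,
    "seed": -1,
    "max_new_tokens": 256,
    "stop": None,
    "stream": False,
    "reset": True,
    "batch_size": 8,
    "threads": -1,
    "context_length": -1,
    "gpu_layers": 0,
}


def build_config(**overrides: Any) -> Dict[str, Any]:
    """Merge overrides into the defaults, rejecting unknown parameter names."""
    unknown = sorted(set(overrides) - set(CONFIG_DEFAULTS))
    if unknown:
        raise ValueError(f"Unknown config parameter(s): {', '.join(unknown)}")
    return {**CONFIG_DEFAULTS, **overrides}


settings = build_config(temperature=0.2, max_new_tokens=64)
print(settings["temperature"], settings["top_k"])
```

Rejecting unknown names catches typos such as `max_tokens` early. The merged values could then be passed as keyword arguments, in the same way `gpu_layers=50` is passed to `from_pretrained` in the GPU section.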

`class AutoModelForCausalLM`

`classmethod AutoModelForCausalLM.from_pretrained`

from_pretrained(
    model_path_or_repo_id: str,
    model_type: Optional[str] = None,
    model_file: Optional[str] = None,
    config: Optional[ctransformers.hub.AutoConfig] = None,
    lib: Optional[str] = None,
    local_files_only: bool = False,
    revision: Optional[str] = None,
    hf: bool = False,
    **kwargs
) -> LLM

Loads the language model from a local file or remote repo.

Args:

model_path_or_repo_id: The path to a model file or directory, or the name of a Hugging Face Hub model repo.
model_type: The model type.
model_file: The name of the model file in the repo or directory.
config: AutoConfig object.
lib: The path to a shared library or one of avx2, avx, basic.
local_files_only: Whether or not to only look at local files (i.e., do not try to download the model).
revision: The specific model version to use. It can be a branch name, a tag name, or a commit id.
hf: Whether to create a Hugging Face Transformers model.

Returns: LLM object.
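Because `model_path_or_repo_id` accepts either a local path or a Hub repo name, one possible way to choose the remaining arguments is sketched below. `from_pretrained_kwargs` is a hypothetical helper, not part of ctransformers; setting `local_files_only=True` for existing paths is a design choice of this sketch, and the `"main"` branch name is an assumption.

```python
import os
import tempfile
from typing import Any, Dict, Optional


def from_pretrained_kwargs(
    model_path_or_repo_id: str,
    model_type: Optional[str] = None,
    revision: Optional[str] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for from_pretrained (illustrative only)."""
    kwargs: Dict[str, Any] = {}
    if model_type is not None:
        kwargs["model_type"] = model_type
    if os.path.exists(model_path_or_repo_id):
        # Local file or directory: never attempt a download.
        kwargs["local_files_only"] = True
    elif revision is not None:
        # Treated as a Hub repo name: pin a branch, tag, or commit id.
        kwargs["revision"] = revision
    return kwargs


# A temporary file stands in for a real local model file.
with tempfile.NamedTemporaryFile(suffix=".bin") as model_file:
    local_kwargs = from_pretrained_kwargs(model_file.name, model_type="gpt2")

remote_kwargs = from_pretrained_kwargs("marella/gpt-2-ggml", revision="main")
print(local_kwargs)
print(remote_kwargs)
```

The result would be used as `AutoModelForCausalLM.from_pretrained(path_or_repo, **kwargs)`.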

`class LLM`

`method LLM.__init__`

__init__(
    model_path: str,
    model_type: Optional[str] = None,
    config: Optional[ctransformers.llm.Config] = None,
    lib: Optional[str] = None
)

Loads the language model from a local file.

Args:

model_path: The path to a model file.
model_type: The model type.
config: Config object.
lib: The path to a shared library or one of avx2, avx, basic.

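`property LLM.bos_token_id`

The beginning-of-sequence token.

`property LLM.config`

The config object.

`property LLM.context_length`

The context length of the model.

`property LLM.embeddings`

The input embeddings.

`property LLM.eos_token_id`

The end-of-sequence token.

`property LLM.logits`

The unnormalized log probabilities.

`property LLM.model_path`

The path to the model file.

`property LLM.model_type`

The model type.

`property LLM.pad_token_id`

The padding token.

`property LLM.vocab_size`

The number of tokens in the vocabulary.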

`method LLM.detokenize`

detokenize(tokens: Sequence[int], decode: bool = True) -> Union[str, bytes]

Converts a list of tokens to text.

Args:

tokens: The list of tokens.
decode: Whether to decode the text as a UTF-8 string.

Returns: The combined text of all tokens.
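One reason to request raw bytes with `decode=False` is that a multi-byte UTF-8 character may be split across tokens, so decoding token by token can produce broken characters. The sketch below shows one way to handle this with the standard library's incremental decoder. It assumes `detokenize(..., decode=False)` returns `bytes`, as the signature suggests; the byte chunks are simulated rather than produced by a model, and `decode_stream` is a hypothetical helper.

```python
import codecs
from typing import Iterable, Iterator


def decode_stream(chunks: Iterable[bytes]) -> Iterator[str]:
    """Decode byte chunks to text, buffering incomplete UTF-8 sequences."""
    decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
    for chunk in chunks:
        text = decoder.decode(chunk)
        if text:  # empty while the decoder waits for the rest of a character
            yield text
    tail = decoder.decode(b"", final=True)  # flush anything left in the buffer
    if tail:
        yield tail


# Simulated per-token bytes: "é" (0xC3 0xA9) is split across two chunks.
simulated_chunks = [b"caf", b"\xc3", b"\xa9", b"!"]
pieces = list(decode_stream(simulated_chunks))
print(pieces)
```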

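`method LLM.embed`

embed(
    input: Union[str, Sequence[int]],
    batch_size: Optional[int] = None,
    threads: Optional[int] = None
) -> List[float]

Computes embeddings for a text or list of tokens.

Note: Currently only LLaMA and Falcon models support embeddings.

Args:

input: The input text or list of tokens to get embeddings for.
batch_size: The batch size to use for evaluating tokens in a single prompt. Default: 8
threads: The number of threads to use for evaluating tokens. Default: -1

Returns: The input embeddings.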
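A common use of embeddings is comparing two texts by cosine similarity. The sketch below is plain Python and independent of ctransformers: `cosine_similarity` is a hypothetical helper, and the short vectors are toy values standing in for the `List[float]` results that `embed` is documented to return.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("Vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


# Toy vectors standing in for embed("...") outputs.
parallel = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
print(parallel, orthogonal)
```

With a model that supports embeddings, the two arguments would be the results of two `embed` calls.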

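`method LLM.eval`

eval(
    tokens: Sequence[int],
    batch_size: Optional[int] = None,
    threads: Optional[int] = None
) -> None

Evaluates a list of tokens.

Args:

tokens: The list of tokens to evaluate.
batch_size: The batch size to use for evaluating tokens in a single prompt. Default: 8
threads: The number of threads to use for evaluating tokens. Default: -1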

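`method LLM.generate`

generate(
    tokens: Sequence[int],
    top_k: Optional[int] = None,
    top_p: Optional[float] = None,
    temperature: Optional[float] = None,
    repetition_penalty: Optional[float] = None,
    last_n_tokens: Optional[int] = None,
    seed: Optional[int] = None,
    batch_size: Optional[int] = None,
    threads: Optional[int] = None,
    reset: Optional[bool] = None
) -> Generator[int, NoneType, NoneType]

Generates new tokens from a list of tokens.

Args:

tokens: The list of tokens to generate tokens from.
top_k: The top-k value to use for sampling. Default: 40
top_p: The top-p value to use for sampling. Default: 0.95
temperature: The temperature to use for sampling. Default: 0.8
repetition_penalty: The repetition penalty to use for sampling. Default: 1.1
last_n_tokens: The number of last tokens to use for repetition penalty. Default: 64
seed: The seed value to use for sampling tokens. Default: -1
batch_size: The batch size to use for evaluating tokens in a single prompt. Default: 8
threads: The number of threads to use for evaluating tokens. Default: -1
reset: Whether to reset the model state before generating text. Default: True

Returns: The generated tokens.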
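The token-level methods can be combined into a manual generation loop: tokenize the prompt, iterate over `generate`, and detokenize the result. The sketch below uses only the documented method names `tokenize`, `generate`, `is_eos_token` and `detokenize`. `generate_text` is a hypothetical helper, `ToyLLM` is a stand-in with a made-up three-token vocabulary so the example runs without a model file, and the explicit end-of-sequence check is defensive rather than a statement about what `generate` yields.

```python
from typing import Iterator, List, Sequence


def generate_text(llm, prompt: str, max_new_tokens: int = 16) -> str:
    """Generate up to max_new_tokens tokens and return them as text."""
    tokens = llm.tokenize(prompt)
    new_tokens: List[int] = []
    for token in llm.generate(tokens):
        if llm.is_eos_token(token) or len(new_tokens) >= max_new_tokens:
            break
        new_tokens.append(token)
    return llm.detokenize(new_tokens)


class ToyLLM:
    """Stand-in exposing only the four methods used above (not a real model)."""

    eos_token_id = 0
    _vocab = {1: "AI", 2: " is", 3: " fun"}

    def tokenize(self, text: str) -> List[int]:
        return [1]  # pretend every prompt is the single token "AI"

    def generate(self, tokens: Sequence[int]) -> Iterator[int]:
        yield from [2, 3, self.eos_token_id]

    def is_eos_token(self, token: int) -> bool:
        return token == self.eos_token_id

    def detokenize(self, tokens: Sequence[int]) -> str:
        return "".join(self._vocab[t] for t in tokens)


print(generate_text(ToyLLM(), "AI"))
```

Collecting tokens and detokenizing once at the end avoids decoding partial characters. To stream text instead, detokenize inside the loop.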

`method LLM.is_eos_token`

is_eos_token(token: int) -> bool

Checks if a token is an end-of-sequence token.

Args:

token: The token to check.

Returns: True if the token is an end-of-sequence token, else False.

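`method LLM.prepare_inputs_for_generation`

prepare_inputs_for_generation(
    tokens: Sequence[int],
    reset: Optional[bool] = None
) -> Sequence[int]

Removes input tokens that were already evaluated and updates the LLM context.

Args:

tokens: The list of input tokens.
reset: Whether to reset the model state before generating text. Default: True

Returns: The list of tokens to evaluate.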

`method LLM.reset`

reset() -> None

Deprecated since 0.2.27.

`method LLM.sample`

sample(
    top_k: Optional[int] = None,
    top_p: Optional[float] = None,
    temperature: Optional[float] = None,
    repetition_penalty: Optional[float] = None,
    last_n_tokens: Optional[int] = None,
    seed: Optional[int] = None
) -> int

Samples a token from the model.

Args:

top_k: The top-k value to use for sampling. Default: 40
top_p: The top-p value to use for sampling. Default: 0.95
temperature: The temperature to use for sampling. Default: 0.8
repetition_penalty: The repetition penalty to use for sampling. Default: 1.1
last_n_tokens: The number of last tokens to use for repetition penalty. Default: 64
seed: The seed value to use for sampling tokens. Default: -1

Returns: The sampled token.

`method LLM.tokenize`

tokenize(text: str, add_bos_token: Optional[bool] = None) -> List[int]

Converts a text into a list of tokens.

Args:

text: The text to tokenize.
add_bos_token: Whether to add the beginning-of-sequence token.

Returns: The list of tokens.

`method LLM.__call__`

__call__(
    prompt: str,
    max_new_tokens: Optional[int] = None,
    top_k: Optional[int] = None,
    top_p: Optional[float] = None,
    temperature: Optional[float] = None,
    repetition_penalty: Optional[float] = None,
    last_n_tokens: Optional[int] = None,
    seed: Optional[int] = None,
    batch_size: Optional[int] = None,
    threads: Optional[int] = None,
    stop: Optional[Sequence[str]] = None,
    stream: Optional[bool] = None,
    reset: Optional[bool] = None
) -> Union[str, Generator[str, NoneType, NoneType]]

Generates text from a prompt.

Args:

prompt: The prompt to generate text from.
max_new_tokens: The maximum number of new tokens to generate. Default: 256
top_k: The top-k value to use for sampling. Default: 40
top_p: The top-p value to use for sampling. Default: 0.95
temperature: The temperature to use for sampling. Default: 0.8
repetition_penalty: The repetition penalty to use for sampling. Default: 1.1
last_n_tokens: The number of last tokens to use for repetition penalty. Default: 64
seed: The seed value to use for sampling tokens. Default: -1
batch_size: The batch size to use for evaluating tokens in a single prompt. Default: 8
threads: The number of threads to use for evaluating tokens. Default: -1
stop: A list of sequences to stop generation when encountered. Default: None
stream: Whether to stream the generated text. Default: False
reset: Whether to reset the model state before generating text. Default: True

Returns: The generated text.

License

MIT
