quantkit

A tool for easily downloading and converting HuggingFace models.

Installation

If you are on a machine with an NVIDIA/CUDA GPU and need AWQ/GPTQ support:

 pip3 install llm-quantkit[cuda]

Otherwise, the default install works.

 pip3 install llm-quantkit

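Requirements

If you need a device-specific torch, install it first.

This project depends on the torch, awq, exl2, gptq, and hqq libraries.
Some of these dependencies do not support Python 3.12 yet.
Supported Python versions: 3.8, 3.9, 3.10, and 3.11.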
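A quick way to check an interpreter against the supported list above is a small script like the following. It is a minimal sketch, not part of quantkit; the `SUPPORTED` set simply mirrors the versions listed in this README.

```python
import sys

# Versions this README lists as supported; some dependencies do not support 3.12 yet.
SUPPORTED = {(3, 8), (3, 9), (3, 10), (3, 11)}


def is_supported(version_info=sys.version_info) -> bool:
    """Return True if the (major, minor) part of version_info is in SUPPORTED."""
    return tuple(version_info[:2]) in SUPPORTED


if __name__ == "__main__":
    major, minor = sys.version_info[:2]
    status = "supported" if is_supported() else "NOT supported"
    print(f"Python {major}.{minor} is {status} by quantkit")
```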

Usage

 Usage: quantkit [OPTIONS] COMMAND [ARGS]...

Options:
  --help  Show this message and exit.

Commands:
  download    Download model from huggingface.
  safetensor  Download and/or convert a pytorch model to safetensor format.
  awq         Download and/or convert a model to AWQ format.
  exl2        Download and/or convert a model to EXL2 format.
  gguf        Download and/or convert a model to GGUF format.
  gptq        Download and/or convert a model to GPTQ format.
  hqq         Download and/or convert a model to HQQ format.

The first argument after the command should be either an HF repo id (mistralai/Mistral-7B-v0.1) or a local directory that already contains model files.
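The choice between a repo id and a local directory could be made roughly as follows. This is a hypothetical sketch, not quantkit's actual logic: the `classify_source` name and the `owner/name` regular expression are assumptions. Checking for an existing directory first matters, because a local folder such as `mistral7b` (used in the GPTQ example) should never be mistaken for a repo id.

```python
import os
import re

# Assumed shape of a HuggingFace repo id: "owner/name", e.g. "mistralai/Mistral-7B-v0.1".
REPO_ID_PATTERN = re.compile(r"^[A-Za-z0-9][\w.-]*/[\w.-]+$")


def classify_source(arg: str) -> str:
    """Hypothetical helper: decide how the first CLI argument should be interpreted."""
    if os.path.isdir(arg):
        return "local"  # an existing directory that already holds model files
    if REPO_ID_PATTERN.match(arg):
        return "repo"  # looks like an HF repo id, so it would be downloaded
    return "unknown"


if __name__ == "__main__":
    for arg in ("mistralai/Mistral-7B-v0.1", "mistral7b", "."):
        print(arg, "->", classify_source(arg))
```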

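By default, the download command downloads into the HF cache and creates symlinks in the output directory; the --no-cache option instead places the model files directly in the output directory.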
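The two layouts can be illustrated with a small simulation. This sketch uses only the standard library and downloads nothing: `fake_download` and its `files` mapping are hypothetical stand-ins for a real HF download. In the default mode the bytes are written to a cache directory and the output directory only receives symlinks; with `no_cache=True` the files are written straight into the output directory.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def fake_download(files: dict[str, bytes], cache_dir: Path, out_dir: Path, no_cache: bool = False) -> None:
    """Hypothetical stand-in for a model download, showing both output layouts."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for name, data in files.items():
        if no_cache:
            # --no-cache: the real model file lives in the output directory.
            (out_dir / name).write_bytes(data)
        else:
            # Default: the file lives in the cache; the output dir only links to it.
            cache_dir.mkdir(parents=True, exist_ok=True)
            cached = cache_dir / name
            cached.write_bytes(data)
            (out_dir / name).symlink_to(cached.resolve())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        files = {"config.json": b"{}", "model.safetensors": b"weights"}
        fake_download(files, root / "cache", root / "linked")
        fake_download(files, root / "unused-cache", root / "copied", no_cache=True)
        for path in sorted((root / "linked").iterdir()) + sorted((root / "copied").iterdir()):
            print(path.relative_to(root), "symlink" if path.is_symlink() else "regular file")
```

The symlink layout avoids duplicating multi-gigabyte weights that are already cached, at the cost of the output directory depending on the cache staying in place.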

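AWQ defaults to 4 bits, group size 128, zero point True.
GPTQ defaults to 4 bits, group size 128, activation order False.
EXL2 defaults to 8 head bits, but there is no default bitrate.
GGUF defaults to no imatrix, but there is no default quant-type.
HQQ defaults to 4 bits, group size 64, zero_point=True.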
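Group size and zero point determine how much metadata is stored alongside the quantized weights. The sketch below estimates the average bits per weight under a simplified storage model that is an assumption, not the exact layout of any of these formats: every group of `group_size` weights shares one 16-bit scale and, when a zero point is used, one zero point with the same bit width as the weights. Smaller groups typically track the original weights more closely but cost more bits.

```python
def effective_bpw(bits: int, group_size: int, zero_point: bool = True, scale_bits: int = 16) -> float:
    """Average storage bits per weight for group-wise quantization.

    Simplified model (an assumption; real formats differ in packing details):
    each group of `group_size` weights shares one `scale_bits`-wide scale and,
    if zero_point is set, one `bits`-wide zero point.
    """
    per_group_overhead = scale_bits + (bits if zero_point else 0)
    return bits + per_group_overhead / group_size


if __name__ == "__main__":
    # Defaults listed in this README.
    print("AWQ default:", effective_bpw(4, 128))  # → AWQ default: 4.15625
    print("HQQ default:", effective_bpw(4, 64))  # → HQQ default: 4.3125
```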

Examples

Download a model from HF without using the HF cache:

 quantkit download teknium/Hermes-Trismegistus-Mistral-7B --no-cache

Download only the safetensors version of a model (useful for models that ship both torch and safetensors weights):

 quantkit download mistralai/Mistral-7B-v0.1 --no-cache --safetensors-only -out mistral7b
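Which files a "safetensors only" download keeps can be sketched as a filter over the repository's file list. The rule below is an assumption for illustration, not quantkit's documented behavior: PyTorch weight files (`*.bin`, `*.pt`, `*.pth`) are dropped only when `.safetensors` weights are present, and everything else (config, tokenizer) is kept. With `huggingface_hub`, a comparable effect could be achieved by passing such patterns as `ignore_patterns` to `snapshot_download`.

```python
from fnmatch import fnmatch
from typing import List

# Assumed patterns for PyTorch checkpoint files.
TORCH_WEIGHT_PATTERNS = ("*.bin", "*.pt", "*.pth")


def safetensors_only(files: List[str]) -> List[str]:
    """Hypothetical filter: drop PyTorch weights when safetensors weights exist."""
    if not any(name.endswith(".safetensors") for name in files):
        return list(files)  # nothing to prefer, so keep the repo as-is
    return [
        name
        for name in files
        if not any(fnmatch(name, pattern) for pattern in TORCH_WEIGHT_PATTERNS)
    ]


if __name__ == "__main__":
    repo_files = [
        "config.json",
        "model-00001-of-00002.safetensors",
        "model-00002-of-00002.safetensors",
        "pytorch_model-00001-of-00002.bin",
        "pytorch_model-00002-of-00002.bin",
        "tokenizer.json",
    ]
    print(safetensors_only(repo_files))
```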

Download from a specific revision (branch) of a HuggingFace repo:

 quantkit download turboderp/TinyLlama-1B-32k-exl2 --branch 6.0bpw --no-cache -out TinyLlama-1B-32k-exl2-b6

Download a model and convert it to safetensors, deleting the original PyTorch bins:

 quantkit safetensor migtissera/Tess-10.7B-v1.5b --delete-original

Download a model and convert it to GGUF (Q5_K):

 quantkit gguf TinyLlama/TinyLlama-1.1B-Chat-v1.0 -out TinyLlama-1.1B-Q5_K.gguf Q5_K

Download a model and convert it to GGUF using an imatrix, offloading 200 layers to the GPU:

 quantkit gguf TinyLlama/TinyLlama-1.1B-Chat-v1.0 -out TinyLlama-1.1B-IQ4_XS.gguf IQ4_XS --built-in-imatrix -ngl 200

Download a model and convert it to AWQ:

 quantkit awq mistralai/Mistral-7B-v0.1 -out Mistral-7B-v0.1-AWQ

Convert a model to GPTQ (4 bits / group size 32):

 quantkit gptq mistral7b -out Mistral-7B-v0.1-GPTQ -b 4 --group-size 32
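To make the `-b` and `--group-size` knobs concrete, here is plain round-to-nearest group quantization with a per-group scale and zero point. It is deliberately not the GPTQ algorithm, which additionally adjusts the remaining weights to compensate for rounding error; it only shows what bit width and group size control. The function names are illustrative.

```python
from typing import List, Tuple

Group = Tuple[float, int, List[int]]  # (scale, zero point, integer codes)


def quantize_groups(weights: List[float], bits: int = 4, group_size: int = 32) -> List[Group]:
    """Asymmetric round-to-nearest quantization with one (scale, zero) pair per group."""
    qmax = (1 << bits) - 1  # 15 for 4-bit
    groups: List[Group] = []
    for start in range(0, len(weights), group_size):
        chunk = weights[start:start + group_size]
        # Include 0.0 in the range so the zero point always fits in [0, qmax].
        lo, hi = min(min(chunk), 0.0), max(max(chunk), 0.0)
        scale = (hi - lo) / qmax or 1.0  # all-zero group: any non-zero scale works
        zero = round(-lo / scale)
        codes = [min(qmax, max(0, round(w / scale) + zero)) for w in chunk]
        groups.append((scale, zero, codes))
    return groups


def dequantize_groups(groups: List[Group]) -> List[float]:
    """Map codes back to approximate weights: w ~= (code - zero) * scale."""
    return [(code - zero) * scale for scale, zero, codes in groups for code in codes]


if __name__ == "__main__":
    weights = [0.12, -0.40, 0.33, 0.05, -0.27, 0.48, -0.09, 0.21]
    groups = quantize_groups(weights, bits=4, group_size=4)
    restored = dequantize_groups(groups)
    print(max(abs(a - b) for a, b in zip(weights, restored)))
```

Each restored weight is within half a quantization step (`scale / 2`) of the original, and a smaller group size shrinks each group's value range and therefore its step size.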

Convert a model to exllamav2 (EXL2):

 quantkit exl2 mistralai/Mistral-7B-v0.1 -out Mistral-7B-v0.1-exl2-b8-h8 -b 8 -hb 8
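EXL2's `-b` is a target average bitrate rather than a fixed bit width, and `-hb` sets the bits used for the output head. A rough file-size estimate follows from that split. The parameter counts below are hypothetical round numbers for a 7B-class model, and the estimate ignores embeddings, metadata, and per-layer bitrate variation.

```python
def exl2_size_gb(body_params: int, head_params: int, bits: float, head_bits: int) -> float:
    """Approximate EXL2 weight size in GB (10**9 bytes).

    body_params are quantized at the average bitrate `bits` (-b), head_params at
    `head_bits` (-hb). Embeddings, metadata and per-layer variation are ignored.
    """
    total_bits = body_params * bits + head_params * head_bits
    return total_bits / 8 / 1e9


if __name__ == "__main__":
    # Hypothetical round numbers for a 7B-class model.
    body, head = 7_000_000_000, 130_000_000
    for bits in (4.0, 6.0, 8.0):
        print(f"-b {bits} -hb 8: ~{exl2_size_gb(body, head, bits, 8):.2f} GB")
```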

Convert a model to HQQ:

 quantkit hqq mistralai/Mistral-7B-v0.1 -out Mistral-7B-HQQ-w4-gs64

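Hardware Requirements

This is what has worked for me in testing. Please submit a PR or issue with updates on what works with various card sizes.
GGUF conversion does not need a GPU, except when using an imatrix. Exllamav2 requires that the largest layer fit on a single GPU.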

Model Size	Quant	VRAM	Successful
7B	AWQ	24GB	✅
7B	EXL2	24GB	✅
7B	GGUF	24GB	✅
7B	GPTQ	24GB	✅
7B	HQQ	24GB	✅
13B	AWQ	24GB	✅
13B	EXL2	24GB	✅
13B	GGUF	24GB	✅
13B	GPTQ	24GB
13B	HQQ	24GB	?
34B	AWQ	24GB
34B	EXL2	24GB	✅
34B	GGUF	24GB	✅
34B	GPTQ	24GB
34B	HQQ	24GB	?
70B	AWQ	24GB
70B	EXL2	24GB	✅
70B	GGUF	24GB	✅
70B	GPTQ	24GB
70B	HQQ	24GB	?
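The requirement that Exllamav2's largest layer fit on one GPU can be sanity-checked with a back-of-the-envelope estimate. The sketch below assumes a Llama/Mistral-style decoder layer (q/k/v/o attention projections with grouped-query attention, plus a gated MLP with three projections) stored in 16-bit precision, and ignores norms, activations, and framework overhead. The shapes in the example (hidden size 8192, MLP size 28672, 64 attention heads, 8 KV heads) come from Llama-2-70B's public config and are illustrative only.

```python
def decoder_layer_params(hidden: int, intermediate: int, n_heads: int, n_kv_heads: int) -> int:
    """Weight count of one Llama/Mistral-style decoder layer (norms ignored)."""
    head_dim = hidden // n_heads
    kv_dim = n_kv_heads * head_dim
    attention = 2 * hidden * hidden + 2 * hidden * kv_dim  # q_proj + o_proj, k_proj + v_proj
    mlp = 3 * hidden * intermediate  # gate_proj, up_proj, down_proj
    return attention + mlp


def fits_on_gpu(params: int, vram_gib: float, bytes_per_param: int = 2) -> bool:
    """True if `params` weights at `bytes_per_param` bytes each fit in `vram_gib` GiB."""
    return params * bytes_per_param <= vram_gib * 1024**3


if __name__ == "__main__":
    layer = decoder_layer_params(hidden=8192, intermediate=28672, n_heads=64, n_kv_heads=8)
    print(f"one 70B-class layer: {layer:,} params, ~{layer * 2 / 1024**3:.2f} GiB in fp16")
    print("layer fits in 24 GiB:", fits_on_gpu(layer, 24))
    print("whole 70B model fits in 24 GiB:", fits_on_gpu(70_000_000_000, 24))
```

Under these assumptions a single 70B-class layer is well under 2 GiB in fp16, while the full model is far larger than 24 GiB, which is consistent with layer-by-layer conversion working on a single 24GB card.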