
go-huggingface


v0.1.0


go-huggingface: download, tokenize, and convert models from HuggingFace.

Overview

Simple APIs for downloading (`hub`), tokenizing (`tokenizers`) and (future work) model conversion (`models`) of HuggingFace models using GoMLX.

Experimental and in development: while the `hub` package has been stable, `tokenizers` and the future `models` are still under heavy development.

Examples

Preamble: imports and variables

```go
import (
	"fmt"
	"os"

	"github.com/gomlx/go-huggingface/hub"
	"github.com/gomlx/go-huggingface/tokenizers"
)

var (
	// Model IDs used for testing.
	hfModelIDs = []string{
		"google/gemma-2-2b-it",
		"sentence-transformers/all-MiniLM-L6-v2",
		"protectai/deberta-v3-base-zeroshot-v1-onnx",
		"KnightsAnalytics/distilbert-base-uncased-finetuned-sst-2-english",
		"KnightsAnalytics/distilbert-NER",
		"SamLowe/roberta-base-go_emotions-onnx",
	}

	// Create a HuggingFace authentication token at huggingface.co to allow downloading models.
	hfAuthToken = os.Getenv("HF_TOKEN")
)
```
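The token is read from the `HF_TOKEN` environment variable. The Hugging Face Hub HTTP API accepts such a token as a bearer token in the `Authorization` header. The sketch below only illustrates that convention with `net/http`: it is not go-huggingface's internal code, and the helper `newAuthRequest` and the `resolve/main` download URL it builds are assumptions for illustration. It builds the request but never sends it.

```go
package main

import (
	"fmt"
	"net/http"
	"os"
)

// newAuthRequest is a hypothetical helper: it builds (but does not send) a GET
// request for a file in a Hugging Face model repository. If token is
// non-empty, it is attached as a bearer token.
func newAuthRequest(modelID, fileName, token string) (*http.Request, error) {
	url := fmt.Sprintf("https://huggingface.co/%s/resolve/main/%s", modelID, fileName)
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	if token != "" {
		req.Header.Set("Authorization", "Bearer "+token)
	}
	return req, nil
}

func main() {
	req, err := newAuthRequest("google/gemma-2-2b-it", "tokenizer.json", os.Getenv("HF_TOKEN"))
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL)
}
```

Gated models such as `google/gemma-2-2b-it` typically require a token, which is why the examples pass `hfAuthToken` to every repository.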

List the files of each model

```go
for _, modelID := range hfModelIDs {
	fmt.Printf("\n%s:\n", modelID)
	repo := hub.New(modelID).WithAuth(hfAuthToken)
	// IterFileNames yields each file name in the repository, or an error.
	for fileName, err := range repo.IterFileNames() {
		if err != nil {
			panic(err)
		}
		fmt.Printf("\t%s\n", fileName)
	}
}
```

List the tokenizer class of each model

```go
for _, modelID := range hfModelIDs {
	fmt.Printf("\n%s:\n", modelID)
	repo := hub.New(modelID).WithAuth(hfAuthToken)
	config, err := tokenizers.GetConfig(repo)
	if err != nil {
		panic(err)
	}
	fmt.Printf("\ttokenizer_class=%s\n", config.TokenizerClass)
}
```
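On the Hugging Face Hub, a model's tokenizer class is conventionally recorded in the `tokenizer_class` field of its `tokenizer_config.json` file. The sketch below decodes just that field with `encoding/json`. It is not how `tokenizers.GetConfig` is implemented; the type `tokenizerConfig`, the helper `parseTokenizerClass`, and the sample JSON are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// tokenizerConfig is a hypothetical, minimal view of tokenizer_config.json;
// encoding/json ignores any fields not listed here.
type tokenizerConfig struct {
	TokenizerClass string `json:"tokenizer_class"`
}

// parseTokenizerClass extracts the tokenizer_class field from raw JSON.
func parseTokenizerClass(data []byte) (string, error) {
	var cfg tokenizerConfig
	if err := json.Unmarshal(data, &cfg); err != nil {
		return "", err
	}
	return cfg.TokenizerClass, nil
}

func main() {
	sample := []byte(`{"tokenizer_class": "BertTokenizer", "do_lower_case": true}`)
	class, err := parseTokenizerClass(sample)
	if err != nil {
		panic(err)
	}
	fmt.Printf("tokenizer_class=%s\n", class) // tokenizer_class=BertTokenizer
}
```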

Tokenize for `google/gemma-2-2b-it`

The "Downloaded" message is printed only if the tokenizer file is not cached yet, so only on the first run:

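```go
repo := hub.New("google/gemma-2-2b-it").WithAuth(hfAuthToken)
tokenizer, err := tokenizers.New(repo)
if err != nil {
	panic(err)
}

sentence := "The book is on the table."
// Encode converts the sentence into a sequence of token IDs.
tokens := tokenizer.Encode(sentence)
fmt.Printf("Sentence:\t%s\n", sentence)
fmt.Printf("Tokens:  \t%v\n", tokens)
```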

```
Downloaded 1/1 files, 4.2 MB downloaded
Sentence:	The book is on the table.
Tokens:  	[651 2870 603 611 573 3037 235265]
```

Download and execute the ONNX model for `sentence-transformers/all-MiniLM-L6-v2`

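Only the first 3 lines actually demonstrate go-huggingface. The remaining lines use github.com/gomlx/onnx-gomlx to parse the ONNX model and convert it to GoMLX, and then github.com/gomlx/gomlx to execute the converted model on a couple of sentences.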

```go
import (
	"fmt"

	"github.com/gomlx/go-huggingface/hub"
	"github.com/gomlx/gomlx/backends"
	"github.com/gomlx/gomlx/graph"
	"github.com/gomlx/gomlx/ml/context"
	"github.com/gomlx/onnx-gomlx/onnx"
)

// Get the ONNX model.
repo := hub.New("sentence-transformers/all-MiniLM-L6-v2").WithAuth(hfAuthToken)
onnxFilePath, err := repo.DownloadFile("onnx/model.onnx")
if err != nil {
	panic(err)
}
onnxModel, err := onnx.ReadFile(onnxFilePath)
if err != nil {
	panic(err)
}

// Convert the ONNX variables to a GoMLX context (which stores variables).
ctx := context.New()
err = onnxModel.VariablesToContext(ctx)
if err != nil {
	panic(err)
}

// Test input.
sentences := []string{
	"This is an example sentence",
	"Each sentence is converted"}
inputIDs := [][]int64{
	{101, 2023, 2003, 2019, 2742, 6251, 102},
	{101, 2169, 6251, 2003, 4991, 102, 0}}
tokenTypeIDs := [][]int64{
	{0, 0, 0, 0, 0, 0, 0},
	{0, 0, 0, 0, 0, 0, 0}}
attentionMask := [][]int64{
	{1, 1, 1, 1, 1, 1, 1},
	{1, 1, 1, 1, 1, 1, 0}}

// Execute the GoMLX graph with the model.
embeddings := context.ExecOnce(
	backends.New(), ctx,
	func(ctx *context.Context, inputs []*graph.Node) *graph.Node {
		modelOutputs := onnxModel.CallGraph(ctx, inputs[0].Graph(), map[string]*graph.Node{
			"input_ids":      inputs[0],
			"attention_mask": inputs[1],
			"token_type_ids": inputs[2]})
		return modelOutputs[0]
	},
	inputIDs, attentionMask, tokenTypeIDs)

fmt.Printf("Sentences: \t%q\n", sentences)
fmt.Printf("Embeddings:\t%s\n", embeddings)
```

```
Sentences: 	["This is an example sentence" "Each sentence is converted"]
Embeddings:	[2][7][384]float32{
 {{0.0366, -0.0162, 0.1682, ..., 0.0554, -0.1644, -0.2967},
  {0.7239, 0.6399, 0.1888, ..., 0.5946, 0.6206, 0.4897},
  {0.0064, 0.0203, 0.0448, ..., 0.3464, 1.3170, -0.1670},
  ...,
  {0.1479, -0.0643, 0.1457, ..., 0.8837, -0.3316, 0.2975},
  {0.5212, 0.6563, 0.5607, ..., -0.0399, 0.0412, -1.4036},
  {1.0824, 0.7140, 0.3986, ..., -0.2301, 0.3243, -1.0313}},
 {{0.2802, 0.1165, -0.0418, ..., 0.2711, -0.1685, -0.2961},
  {0.8729, 0.4545, -0.1091, ..., 0.1365, 0.4580, -0.2042},
  {0.4752, 0.5731, 0.6304, ..., 0.6526, 0.5612, -1.3268},
  ...,
  {0.6113, 0.7920, -0.4685, ..., 0.0854, 1.0592, -0.2983},
  {0.4115, 1.0946, 0.2385, ..., 0.8984, 0.3684, -0.7333},
  {0.1374, 0.5555, 0.2678, ..., 0.5426, 0.4665, -0.5284}}}
```

