
go huggingface


v0.1.0


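go-huggingface: download, tokenize and convert models from HuggingFace.

Overview

Simple APIs for downloading (hub), tokenizing (tokenizers) and, as future work, converting (models) HuggingFace models using GoMLX.

Experimental and under development: the hub package is stable, but tokenizers and the future models package are still under intensive development.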

Examples

Preamble: imports and variables

import (
	"fmt"
	"os"

	"github.com/gomlx/go-huggingface/hub"
	"github.com/gomlx/go-huggingface/tokenizers"
)

var (
	// Model IDs used for testing.
	hfModelIDs = []string{
		"google/gemma-2-2b-it",
		"sentence-transformers/all-MiniLM-L6-v2",
		"protectai/deberta-v3-base-zeroshot-v1-onnx",
		"KnightsAnalytics/distilbert-base-uncased-finetuned-sst-2-english",
		"KnightsAnalytics/distilbert-NER",
		"SamLowe/roberta-base-go_emotions-onnx",
	}
	// Create a HuggingFace authentication token at huggingface.co to allow downloading models.
	hfAuthToken = os.Getenv("HF_TOKEN")
)
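Every entry in hfModelIDs must be an exact repository ID: a stray space or a translated name produces an ID that does not exist on the hub, and the failure only shows up at download time. A minimal sketch of a sanity check one might run first, assuming every ID has the owner/name form used in this list and that each part contains only letters, digits, "-", "_" and "." (an assumption for illustration, not an official HuggingFace rule). validModelID is a hypothetical helper, not part of go-huggingface:

```go
package main

import (
	"fmt"
	"regexp"
)

// modelIDPattern assumes IDs of the form "owner/name", each part made of
// letters, digits, '-', '_' or '.'. This is an illustrative rule only.
var modelIDPattern = regexp.MustCompile(`^[A-Za-z0-9._-]+/[A-Za-z0-9._-]+$`)

// validModelID is a hypothetical helper (not part of go-huggingface) that
// reports whether id looks like a HuggingFace repository ID.
func validModelID(id string) bool {
	return modelIDPattern.MatchString(id)
}

func main() {
	for _, id := range []string{
		"google/gemma-2-2b-it",
		"google/gemma-2 -2b-it", // stray space: rejected
		"SamLowe/roberta-base-go_emotions-onnx",
	} {
		fmt.Printf("%-42q valid=%v\n", id, validModelID(id))
	}
}
```

Model IDs without an owner prefix would also be rejected by this pattern, so it only fits lists that, like this one, always use owner/name.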

List the files of each model

for _, modelID := range hfModelIDs {
	fmt.Printf("\n%s:\n", modelID)
	repo := hub.New(modelID).WithAuth(hfAuthToken)
	for fileName, err := range repo.IterFileNames() {
		if err != nil {
			panic(err)
		}
		fmt.Printf("\t%s\n", fileName)
	}
}

List the tokenizer class of each model

for _, modelID := range hfModelIDs {
	fmt.Printf("\n%s:\n", modelID)
	repo := hub.New(modelID).WithAuth(hfAuthToken)
	config, err := tokenizers.GetConfig(repo)
	if err != nil {
		panic(err)
	}
	fmt.Printf("\ttokenizer_class=%s\n", config.TokenizerClass)
}
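The tokenizer_class printed above names the tokenizer implementation a model expects. On the hub this value is typically stored in the repository's tokenizer_config.json file. A minimal sketch of reading that field with the standard library, assuming that file layout; tokenizerClass is a hypothetical helper and the JSON is an illustrative sample, not a file copied from any of the models listed:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// tokenizerConfig keeps only the field this sketch needs; encoding/json
// silently ignores the other keys of the file.
type tokenizerConfig struct {
	TokenizerClass string `json:"tokenizer_class"`
}

// tokenizerClass is a hypothetical helper that extracts "tokenizer_class"
// from the contents of a tokenizer_config.json file.
func tokenizerClass(data []byte) (string, error) {
	var cfg tokenizerConfig
	if err := json.Unmarshal(data, &cfg); err != nil {
		return "", err
	}
	if cfg.TokenizerClass == "" {
		return "", fmt.Errorf("tokenizer_class not set")
	}
	return cfg.TokenizerClass, nil
}

func main() {
	// Illustrative sample, not a real repository file.
	sample := []byte(`{"do_lower_case": true, "tokenizer_class": "BertTokenizer"}`)
	class, err := tokenizerClass(sample)
	if err != nil {
		panic(err)
	}
	fmt.Printf("tokenizer_class=%s\n", class) // tokenizer_class=BertTokenizer
}
```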

Tokenizer for `google/gemma-2-2b-it`

The "Downloaded" message in the output appears only when the tokenizer file is not yet cached, that is, only on the first run:

repo := hub.New("google/gemma-2-2b-it").WithAuth(hfAuthToken)
tokenizer, err := tokenizers.New(repo)
if err != nil {
	panic(err)
}
sentence := "The book is on the table."
tokens := tokenizer.Encode(sentence)
fmt.Printf("Sentence:\t%s\n", sentence)
fmt.Printf("Tokens:  \t%v\n", tokens)

Downloaded 1/1 files, 4.2 MB downloaded         
Sentence:	The book is on the table.
Tokens:  	[651 2870 603 611 573 3037 235265]

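Download and execute the ONNX model for `sentence-transformers/all-MiniLM-L6-v2`

Only the first few lines (creating the repo and downloading onnx/model.onnx) actually demonstrate go-huggingface. The remaining lines use github.com/gomlx/onnx-gomlx to parse the ONNX model and convert it to GoMLX, and then github.com/gomlx/gomlx to execute the converted model on a couple of sentences.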

// Get ONNX model.
repo := hub.New("sentence-transformers/all-MiniLM-L6-v2").WithAuth(hfAuthToken)
onnxFilePath, err := repo.DownloadFile("onnx/model.onnx")
if err != nil {
	panic(err)
}
onnxModel, err := onnx.ReadFile(onnxFilePath)
if err != nil {
	panic(err)
}

// Convert ONNX variables to GoMLX context (which stores variables):
ctx := context.New()
err = onnxModel.VariablesToContext(ctx)
if err != nil {
	panic(err)
}

// Test input.
sentences := []string{
	"This is an example sentence",
	"Each sentence is converted"}
inputIDs := [][]int64{
	{101, 2023, 2003, 2019, 2742, 6251, 102},
	{101, 2169, 6251, 2003, 4991, 102, 0}}
tokenTypeIDs := [][]int64{
	{0, 0, 0, 0, 0, 0, 0},
	{0, 0, 0, 0, 0, 0, 0}}
attentionMask := [][]int64{
	{1, 1, 1, 1, 1, 1, 1},
	{1, 1, 1, 1, 1, 1, 0}}

// Execute GoMLX graph with model.
embeddings := context.ExecOnce(
	backends.New(), ctx,
	func(ctx *context.Context, inputs []*graph.Node) *graph.Node {
		modelOutputs := onnxModel.CallGraph(ctx, inputs[0].Graph(), map[string]*graph.Node{
			"input_ids":      inputs[0],
			"attention_mask": inputs[1],
			"token_type_ids": inputs[2]})
		return modelOutputs[0]
	},
	inputIDs, attentionMask, tokenTypeIDs)

fmt.Printf("Sentences: \t%q\n", sentences)
fmt.Printf("Embeddings:\t%s\n", embeddings)

Sentences: 	["This is an example sentence" "Each sentence is converted"]
Embeddings:	[2][7][384]float32{
 {{0.0366, -0.0162, 0.1682, ..., 0.0554, -0.1644, -0.2967},
  {0.7239, 0.6399, 0.1888, ..., 0.5946, 0.6206, 0.4897},
  {0.0064, 0.0203, 0.0448, ..., 0.3464, 1.3170, -0.1670},
  ...,
  {0.1479, -0.0643, 0.1457, ..., 0.8837, -0.3316, 0.2975},
  {0.5212, 0.6563, 0.5607, ..., -0.0399, 0.0412, -1.4036},
  {1.0824, 0.7140, 0.3986, ..., -0.2301, 0.3243, -1.0313}},
 {{0.2802, 0.1165, -0.0418, ..., 0.2711, -0.1685, -0.2961},
  {0.8729, 0.4545, -0.1091, ..., 0.1365, 0.4580, -0.2042},
  {0.4752, 0.5731, 0.6304, ..., 0.6526, 0.5612, -1.3268},
  ...,
  {0.6113, 0.7920, -0.4685, ..., 0.0854, 1.0592, -0.2983},
  {0.4115, 1.0946, 0.2385, ..., 0.8984, 0.3684, -0.7333},
  {0.1374, 0.5555, 0.2678, ..., 0.5426, 0.4665, -0.5284}}}
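In this example the token IDs, token type IDs and attention mask are written by hand: the shorter second sentence is padded with ID 0 and masked with 0, and the model returns one 384-dimensional vector per token, hence the [2][7][384] shape. A minimal sketch of building such padded inputs from variable-length token lists, and of reducing the per-token output to one vector per sentence by averaging only the unmasked positions (mean pooling, which is how sentence-transformers pools this model; it also L2-normalizes the result, which is omitted here). padBatch and meanPool are hypothetical helpers, not part of go-huggingface, onnx-gomlx or GoMLX:

```go
package main

import "fmt"

// padBatch right-pads each token list with padID to the length of the
// longest one, and builds the matching attention mask (1 = real token,
// 0 = padding) and all-zero token type IDs.
func padBatch(batch [][]int64, padID int64) (ids, mask, typeIDs [][]int64) {
	maxLen := 0
	for _, seq := range batch {
		if len(seq) > maxLen {
			maxLen = len(seq)
		}
	}
	for _, seq := range batch {
		rowIDs := make([]int64, maxLen)
		rowMask := make([]int64, maxLen)
		for i := range rowIDs {
			if i < len(seq) {
				rowIDs[i], rowMask[i] = seq[i], 1
			} else {
				rowIDs[i] = padID
			}
		}
		ids = append(ids, rowIDs)
		mask = append(mask, rowMask)
		typeIDs = append(typeIDs, make([]int64, maxLen))
	}
	return ids, mask, typeIDs
}

// meanPool averages the per-token vectors of each sentence, skipping
// positions whose mask is 0. tokens has shape [batch][seqLen][dim].
func meanPool(tokens [][][]float32, mask [][]int64) [][]float32 {
	out := make([][]float32, len(tokens))
	for b, seq := range tokens {
		if len(seq) == 0 {
			continue
		}
		sum := make([]float32, len(seq[0]))
		count := 0
		for t, vec := range seq {
			if mask[b][t] == 0 {
				continue
			}
			for d, v := range vec {
				sum[d] += v
			}
			count++
		}
		if count > 0 {
			for d := range sum {
				sum[d] /= float32(count)
			}
		}
		out[b] = sum
	}
	return out
}

func main() {
	// Token IDs of the two example sentences, before padding.
	ids, mask, _ := padBatch([][]int64{
		{101, 2023, 2003, 2019, 2742, 6251, 102},
		{101, 2169, 6251, 2003, 4991, 102},
	}, 0)
	fmt.Println(ids[1])  // [101 2169 6251 2003 4991 102 0]
	fmt.Println(mask[1]) // [1 1 1 1 1 1 0]

	// Toy 2-dimensional vectors for one 3-token sentence whose last position
	// is padding: only the first two vectors are averaged.
	pooled := meanPool([][][]float32{{{1, 2}, {3, 4}, {100, 100}}}, [][]int64{{1, 1, 0}})
	fmt.Println(pooled[0]) // [2 3]
}
```

The padded rows reproduce the hand-written inputIDs and attentionMask of the example above.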
