llama.go

v1.4: Server Mode

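Finally, some good news!

I have started reimplementing this library here: FastTensors

If you would like to see a GGML-compatible implementation in pure Go, please give it a star.

Looking for LLM debugging and inference with Golang?

Please check out my related project, Booster.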

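Motivation

We dream of a world where fellow ML hackers can tinker with really big GPT models in their home labs, without GPU clusters burning through piles of money.

The code of this project is based on Georgi Gerganov's legendary ggml.cpp framework, written in C++, and shares its focus on performance and elegance.

We hope that using Golang, rather than a powerful but too low-level language, will allow much wider adoption.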

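V0 Roadmap

Tensor math in pure Golang
Implement the LLaMA neural net architecture and model loading
Test with the smaller LLaMA-7B model
Make sure Go inference works exactly the same way as the C++ version
Let Go shine! Enable multi-threading and message passing to boost performance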

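V1 Roadmap - Spring'23

Cross-platform compatibility with Mac, Linux and Windows
Release the first stable version for ML hackers - v1.0
Enable bigger LLaMA models: 13B, 30B, 65B - v1.1
ARM NEON support on Apple Silicon (modern Macs) and ARM servers - v1.2
Performance boost with x64 AVX2 support for Intel and AMD - v1.2
Better memory use and GC optimizations - v1.3
Introduce Server Mode (embedded REST API) for use in real projects - v1.4
Release converted models for free access over the Internet - v1.4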

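V2 Roadmap - Winter'23

V3 Roadmap - Spring'23

Allow plugins and external APIs for complex projects
Allow model training and fine-tuning
Speed up execution on GPU cards and clusters
FP16 and BF16 math where hardware support is available
INT4 and GPTQ quantization
AMD Radeon GPU support with OpenCL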

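How to Run?

First, obtain and convert the original LLaMA models yourself, or just download ready-to-use ones:

LLaMA-7B: llama-7b-fp32.bin

LLaMA-13B: llama-13b-fp32.bin

Both models store FP32 weights, so you will need at least 32 GB of RAM (not VRAM or GPU RAM) for LLaMA-7B. Double that to 64 GB for LLaMA-13B.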
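The RAM figures follow from simple arithmetic: an FP32 weight takes 4 bytes, so the weights alone need roughly 28 GB for 7 billion parameters and 52 GB for 13 billion, before any working memory for the context. The sketch below spells that out. The parameter counts are the nominal ones implied by the model names (an assumption), and `weightBytes` is a helper written for this illustration, not part of llama.go.

```go
package main

import "fmt"

// weightBytes estimates the memory taken by the weights alone:
// bytesPerWeight bytes for each of the params parameters.
func weightBytes(params, bytesPerWeight uint64) uint64 {
	return params * bytesPerWeight
}

func main() {
	const gib = 1 << 30

	// Nominal parameter counts taken from the model names (assumption).
	models := []struct {
		name   string
		params uint64
	}{
		{"LLaMA-7B", 7_000_000_000},
		{"LLaMA-13B", 13_000_000_000},
	}

	for _, m := range models {
		b := weightBytes(m.params, 4) // FP32 = 4 bytes per weight
		fmt.Printf("%s: about %.1f GiB of FP32 weights\n", m.name, float64(b)/gib)
	}
}
```

The recommended 32 GB and 64 GB leave headroom on top of these numbers for the operating system and the inference buffers.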

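Next, build the app binary from source (see the instructions below), or just download an already built one:

Windows: llama-go-v1.4.0.exe

MacOS: llama-go-v1.4.0-macos

Linux: llama-go-v1.4.0-linux

Now that you have both the executable and the model, try it for yourself: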

llama-go-v1.4.0-macos \
    --model ~/models/llama-7b-fp32.bin \
    --prompt "Why Golang is so popular?"

Useful command-line flags:

--prompt   Text prompt from user to feed the model input
--model    Path and file name of converted .bin LLaMA model [ llama-7b-fp32.bin, etc ]
--server   Start in Server Mode acting as REST API endpoint
--host     Host to allow requests from in Server Mode [ localhost by default ]
--port     Port to listen on in Server Mode [ 8080 by default ]
--pods     Maximum pods or units of parallel execution allowed in Server Mode [ 1 by default ]
--threads  Adjust to the number of CPU cores you want to use [ all cores by default ]
--context  Context size in tokens [ 1024 by default ]
--predict  Number of tokens to predict [ 512 by default ]
--temp     Model temperature hyper parameter [ 0.5 by default ]
--silent   Hide welcome logo and other output [ shown by default ]
--chat     Chat with user in interactive mode instead of compute over static prompt
--profile  Profile CPU performance while running and store results to cpu.pprof file
--avx      Enable x64 AVX2 optimizations for Intel and AMD machines
--neon     Enable ARM NEON optimizations for Apple Macs and ARM server

Going to Production

LLaMA.go embeds a standalone HTTP server that exposes a REST API. To enable it, run the app with these flags:

llama-go-v1.4.0-macos \
    --model ~/models/llama-7b-fp32.bin \
    --server \
    --host 127.0.0.1 \
    --port 8080 \
    --pods 4 \
    --threads 4

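Choose the pods and threads parameters wisely, depending on the model size, how many CPU cores are available, how many requests you want to process in parallel, and how fast you would like to get answers.

Pods is the number of inference instances that may run in parallel.

The threads parameter sets how many cores will be used for tensor math within a pod.

For example, if you have a machine with 16 hardware cores capable of running 32 hyper-threads in parallel, you might end up with something like this: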

--server --pods 4 --threads 8
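One simple way to arrive at such numbers is to divide the logical CPUs evenly among the pods. The sketch below is only a rule of thumb written for this illustration; `threadsPerPod` is a hypothetical helper, not something llama.go provides, and the best split for a given model may differ.

```go
package main

import (
	"fmt"
	"runtime"
)

// threadsPerPod splits the logical CPUs evenly across pods,
// with a minimum of one thread per pod.
func threadsPerPod(logicalCPUs, pods int) int {
	if pods < 1 {
		pods = 1
	}
	t := logicalCPUs / pods
	if t < 1 {
		t = 1
	}
	return t
}

func main() {
	// The example from the text: 32 hyper-threads shared by 4 pods.
	fmt.Printf("--pods 4 --threads %d\n", threadsPerPod(32, 4))

	// The same rule applied to the machine running this program.
	fmt.Printf("--pods 4 --threads %d\n", threadsPerPod(runtime.NumCPU(), 4))
}
```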

When there is no free pod to handle an arriving request, the request is placed in a waiting queue and starts as soon as a pod finishes its job.
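The queueing behaviour can be pictured with a small worker pool: a channel plays the role of the waiting queue, and a fixed number of goroutines play the role of pods. This is a minimal sketch of the general pattern, not the actual llama.go server code; `runJobs` and the `infer` callback are names invented for this example.

```go
package main

import (
	"fmt"
	"sync"
)

// runJobs processes prompts with at most `pods` of them running at once.
// Jobs that find no free pod wait in the queue channel until one frees up.
// Results are returned in the same order as the prompts.
func runJobs(prompts []string, pods int, infer func(string) string) []string {
	if pods < 1 {
		pods = 1
	}
	queue := make(chan int, len(prompts)) // waiting queue of job indexes
	results := make([]string, len(prompts))
	var wg sync.WaitGroup

	for p := 0; p < pods; p++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range queue { // a free pod picks up the next waiting job
				results[i] = infer(prompts[i])
			}
		}()
	}

	for i := range prompts {
		queue <- i
	}
	close(queue)
	wg.Wait()
	return results
}

func main() {
	// Stand-in for real inference (hypothetical).
	echo := func(prompt string) string { return "answer to: " + prompt }

	for _, r := range runJobs([]string{"a", "b", "c", "d", "e"}, 2, echo) {
		fmt.Println(r)
	}
}
```

Each worker writes only to its own slot of the results slice, so no extra locking is needed.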

REST API Examples

Place a new job

Send a POST request (with Postman) to your server address, with JSON containing a unique UUID v4 and the prompt:

{
    "id": "5fb8ebd0-e0c9-4759-8f7d-35590f6c9fc3",
    "prompt": "Why Golang is so popular?"
}
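The same request can be sent from Go instead of Postman. The sketch below is an illustration under stated assumptions: the `Job` type and both helpers are invented here, the "/jobs" path is a guess (the text only says "your server address"), and a local stub stands in for a running LLaMA.go server so the program runs on its own.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// Job mirrors the JSON body shown above.
type Job struct {
	ID     string `json:"id"`
	Prompt string `json:"prompt"`
}

// encodeJob renders the job as the JSON text sent to the server.
func encodeJob(job Job) (string, error) {
	b, err := json.Marshal(job)
	return string(b), err
}

// submitJob POSTs the job to url and returns the HTTP status code.
func submitJob(url string, job Job) (int, error) {
	body, err := encodeJob(job)
	if err != nil {
		return 0, err
	}
	resp, err := http.Post(url, "application/json", strings.NewReader(body))
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	return resp.StatusCode, nil
}

func main() {
	// Local stub standing in for the real server; its reply is made up.
	stub := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		var j Job
		if err := json.NewDecoder(r.Body).Decode(&j); err != nil || j.ID == "" {
			http.Error(w, "bad job", http.StatusBadRequest)
			return
		}
		fmt.Fprintln(w, "queued")
	}))
	defer stub.Close()

	// "/jobs" is an assumed path, not confirmed by the text.
	code, err := submitJob(stub.URL+"/jobs", Job{
		ID:     "5fb8ebd0-e0c9-4759-8f7d-35590f6c9fc3",
		Prompt: "Why Golang is so popular?",
	})
	fmt.Println("status:", code, "error:", err)
}
```

To talk to a real instance, replace `stub.URL` with the address given by `--host` and `--port`.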

Check job status

Send a GET request (with Postman or a browser) to a URL like http://host:port/jobs/status/:id

GET http://localhost:8080/jobs/status/5fb8ebd0-e0c9-4759-8f7d-35590f6c9fc3

Get the results

Send a GET request (with Postman or a browser) to a URL like http://host:port/jobs/:id

GET http://localhost:8080/jobs/5fb8ebd0-e0c9-4759-8f7d-35590f6c9fc3
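Both GET endpoints can also be called from Go. In this sketch the two URL helpers follow the patterns given above, while the response bodies are unknown to us, so a local stub returns placeholder text (an assumption made only to keep the example runnable). The helper names are invented for this illustration.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// statusURL and resultURL build the two endpoints described above.
func statusURL(base, id string) string { return base + "/jobs/status/" + id }
func resultURL(base, id string) string { return base + "/jobs/" + id }

// get fetches url and returns the response body as text.
func get(url string) (string, error) {
	resp, err := http.Get(url)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	b, err := io.ReadAll(resp.Body)
	return string(b), err
}

func main() {
	// Local stub standing in for the real server; the bodies are placeholders.
	mux := http.NewServeMux()
	mux.HandleFunc("/jobs/status/", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "stub status")
	})
	mux.HandleFunc("/jobs/", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "stub result")
	})
	srv := httptest.NewServer(mux)
	defer srv.Close()

	// Use the same ID that was sent when the job was placed.
	id := "5fb8ebd0-e0c9-4759-8f7d-35590f6c9fc3"
	for _, u := range []string{statusURL(srv.URL, id), resultURL(srv.URL, id)} {
		body, err := get(u)
		fmt.Println(body, err)
	}
}
```

Against a real instance, pass "http://localhost:8080" (or your own host and port) as the base address.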

How to Build

First, install Golang and git (on Windows, you will need to download the installers).

brew install git
brew install golang

Then clone the repo and enter the project folder:

git clone https://github.com/gotzmann/llama.go.git
cd llama.go

Some Go magic to install external dependencies:

go mod tidy
go mod vendor

Now we are ready to build the binary from the source code:

go build -o llama-go-v1.exe -ldflags "-s -w" main.go

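FAQ

1) Where can I get the original LLaMA models?

Contact Meta directly, or just look for some torrent alternatives.

2) How do I convert the original LLaMA files into a supported format?

Place the original PyTorch FP16 files into the models directory, then convert them with this command:

python3 ./scripts/convert.py ~/models/LLaMA/7B/ 0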

Additional information

Version: v1.4: Server Mode
Type: Other source code
Updated: 2024-11-30
Size: 10.3MB
Source: GitHub