femtoGPT is a pure Rust implementation of a minimal Generative Pretrained Transformer.
It can be used for both inference and training of GPT-style language models, on CPUs and GPUs!
(HEY! I'm also writing a book, which will soon discuss the implementation of an LLM in detail! Check it out here: The Super Programmer)
Training:
cargo run --release -- train
Inference:
cargo run --release -- infer
(Note: add --features gpu to leverage GPU speedups!)
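Everything is implemented from scratch, including the tensor processing logic along with the training/inference code of a minimal GPT architecture.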
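As a rough illustration of what "tensor processing logic" involves at its very simplest, here is a hypothetical tensor type: a flat f32 buffer plus a shape, with multi-dimensional indices mapped to row-major offsets. The names Tensor, zeros, from_vec, get and set are illustrative assumptions and are not taken from femtoGPT's source.

```rust
/// A hypothetical minimal tensor: a flat buffer plus a shape, stored row-major.
/// Illustrative sketch only, not femtoGPT's actual tensor type.
#[derive(Debug, Clone, PartialEq)]
pub struct Tensor {
    shape: Vec<usize>,
    data: Vec<f32>,
}

impl Tensor {
    /// Creates a tensor of the given shape filled with zeros.
    pub fn zeros(shape: &[usize]) -> Self {
        let len: usize = shape.iter().product();
        Tensor { shape: shape.to_vec(), data: vec![0.0; len] }
    }

    /// Wraps an existing buffer, checking that its length matches the shape.
    pub fn from_vec(shape: &[usize], data: Vec<f32>) -> Self {
        assert_eq!(shape.iter().product::<usize>(), data.len(), "shape/data mismatch");
        Tensor { shape: shape.to_vec(), data }
    }

    /// Converts a multi-dimensional index into a flat row-major offset.
    fn offset(&self, index: &[usize]) -> usize {
        assert_eq!(index.len(), self.shape.len(), "rank mismatch");
        let mut off = 0;
        for (&i, &dim) in index.iter().zip(&self.shape) {
            assert!(i < dim, "index out of bounds");
            off = off * dim + i;
        }
        off
    }

    pub fn get(&self, index: &[usize]) -> f32 {
        self.data[self.offset(index)]
    }

    pub fn set(&mut self, index: &[usize], value: f32) {
        let off = self.offset(index);
        self.data[off] = value;
    }

    pub fn shape(&self) -> &[usize] {
        &self.shape
    }
}

fn main() {
    let mut t = Tensor::zeros(&[2, 3]);
    t.set(&[1, 2], 5.0);
    let u = Tensor::from_vec(&[2, 3], vec![0.0, 1.0, 2.0, 3.0, 4.0, 5.0]);
    println!("{:?} {} {}", t.shape(), t.get(&[1, 2]), u.get(&[1, 0]));
}
```

Row-major layout means the last index varies fastest, so element [1, 0] of a 2x3 tensor sits at offset 1 * 3 + 0 = 3.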
The architecture is very similar, almost identical, to the one in Andrej Karpathy's nanoGPT video lecture.
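The core of that architecture is masked (causal) self-attention. Below is a single-head sketch on plain Vec<f32> rows, assuming the standard scaled dot-product formulation softmax(QK^T / sqrt(d)) V, where each position attends only to itself and earlier positions. causal_attention is a hypothetical name; a real implementation would add multiple heads, learned Q/K/V projections and batching.

```rust
/// Single-head causal (masked) scaled dot-product attention.
/// q, k, v each hold `seq_len` rows; q and k rows have `d` features.
/// Illustrative sketch in the spirit of nanoGPT, not femtoGPT's code.
fn causal_attention(q: &[Vec<f32>], k: &[Vec<f32>], v: &[Vec<f32>]) -> Vec<Vec<f32>> {
    let seq_len = q.len();
    if seq_len == 0 {
        return Vec::new();
    }
    let scale = 1.0 / (q[0].len() as f32).sqrt();
    let mut out = vec![vec![0.0f32; v[0].len()]; seq_len];
    for t in 0..seq_len {
        // Causal mask: position t only scores keys 0..=t.
        let scores: Vec<f32> = (0..=t)
            .map(|s| q[t].iter().zip(&k[s]).map(|(a, b)| a * b).sum::<f32>() * scale)
            .collect();
        // Numerically stable softmax over the allowed positions.
        let max = scores.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
        let exps: Vec<f32> = scores.iter().map(|s| (s - max).exp()).collect();
        let sum: f32 = exps.iter().sum();
        // Output is the attention-weighted sum of the value rows.
        for (s, w) in exps.iter().enumerate() {
            for (o, x) in out[t].iter_mut().zip(&v[s]) {
                *o += w / sum * x;
            }
        }
    }
    out
}

fn main() {
    let q: Vec<Vec<f32>> = vec![vec![1.0, 0.0], vec![0.0, 1.0], vec![1.0, 1.0]];
    let k = q.clone();
    let v: Vec<Vec<f32>> = vec![vec![1.0, 0.0], vec![0.0, 1.0], vec![0.5, 0.5]];
    for (t, row) in causal_attention(&q, &k, &v).iter().enumerate() {
        println!("position {t}: {:?}", row);
    }
}
```

Because position 0 can only see itself, its output is always exactly its own value row; later positions mix in earlier rows according to the softmax weights.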
femtoGPT is a great starting point for those who are fascinated by LLMs and would like to understand how these models work at a deep level.
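femtoGPT uses nothing but random-generation libraries (rand / rand-distr), data-serialization libraries (serde / bincode, for saving/loading already trained models) and a parallel-computing library (rayon).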
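femtoGPT is no longer extremely slow but relatively fast on CPU, and most of the primitive operations (e.g. matrix multiplication) are implemented in the simplest way possible.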
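For a sense of what "the simplest way possible" means for matrix multiplication, here is a naive triple-loop sketch over row-major buffers, with no blocking or SIMD. The function name and signature are hypothetical, not femtoGPT's API.

```rust
/// Naive matrix multiplication C = A * B on row-major buffers.
/// A is m x k, B is k x n, and the result C is m x n.
fn matmul(a: &[f32], b: &[f32], m: usize, k: usize, n: usize) -> Vec<f32> {
    assert_eq!(a.len(), m * k, "A has the wrong size");
    assert_eq!(b.len(), k * n, "B has the wrong size");
    let mut c = vec![0.0f32; m * n];
    for i in 0..m {
        for p in 0..k {
            let a_ip = a[i * k + p];
            // Innermost loop walks contiguous rows of both B and C.
            for j in 0..n {
                c[i * n + j] += a_ip * b[p * n + j];
            }
        }
    }
    c
}

fn main() {
    // [[1, 2], [3, 4]] x [[5, 6], [7, 8]]
    let c = matmul(&[1.0, 2.0, 3.0, 4.0], &[5.0, 6.0, 7.0, 8.0], 2, 2, 2);
    println!("{:?}", c); // prints [19.0, 22.0, 43.0, 50.0]
}
```

The i-p-j loop order is a small design choice: it keeps the innermost loop on contiguous memory, which is usually friendlier to the cache than the textbook i-j-p order. Rows of C are independent, so they could also be split across threads, for example with rayon's par_chunks_mut.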
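The correctness of the gradients is verified using the gradient-check method, though it is still quite possible that some layers are implemented incorrectly.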
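A gradient check compares analytic (backpropagated) gradients against a finite-difference estimate. The sketch below uses central differences, (f(x + eps) - f(x - eps)) / (2 eps), on a toy function whose gradient is known. numerical_grad and max_relative_error are hypothetical helpers, not femtoGPT functions; f64 is used because finite differences in f32 tend to lose too much precision to be a reliable check.

```rust
/// Central-difference estimate of the gradient of `f` at `x`.
fn numerical_grad<F: Fn(&[f64]) -> f64>(f: F, x: &[f64], eps: f64) -> Vec<f64> {
    let mut x = x.to_vec();
    let mut grad = Vec::with_capacity(x.len());
    for i in 0..x.len() {
        let orig = x[i];
        x[i] = orig + eps;
        let plus = f(&x);
        x[i] = orig - eps;
        let minus = f(&x);
        x[i] = orig; // restore the coordinate before moving on
        grad.push((plus - minus) / (2.0 * eps));
    }
    grad
}

/// Largest relative error between two gradient vectors.
fn max_relative_error(analytic: &[f64], numeric: &[f64]) -> f64 {
    analytic
        .iter()
        .zip(numeric)
        .map(|(a, n)| {
            let denom = a.abs().max(n.abs()).max(1e-12);
            (a - n).abs() / denom
        })
        .fold(0.0, f64::max)
}

fn main() {
    // f(x) = sum of x_i^2, whose analytic gradient is 2 * x_i.
    let f = |x: &[f64]| x.iter().map(|v| v * v).sum::<f64>();
    let x = [0.5, -1.5, 2.0];
    let analytic: Vec<f64> = x.iter().map(|v| 2.0 * v).collect();
    let numeric = numerical_grad(f, &x, 1e-5);
    println!("max relative error: {:e}", max_relative_error(&analytic, &numeric));
}
```

In a real check, the analytic gradient would come from the layer's backward pass instead of a hand-written formula; a large relative error points at a buggy layer.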
(Discord server for discussions around the project!)
Make sure you have the Rust toolchain on your system in order to compile and run the project:
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
If you want to train on a GPU, first make sure your GPU drivers are correctly installed on your system and that their OpenCL runtimes are available.
On Debian systems, you can set up the OpenCL runtime by installing the package ocl-icd-opencl-dev:
sudo apt install ocl-icd-opencl-dev
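GOOD NEWS! Since femtoGPT's GPU implementation is based on OpenCL, it can run on both NVIDIA and AMD cards, and you won't need to install heavyweight CUDA toolkits on your system. The OpenCL runtime is enough!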
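Now you just need to put the text you want to train your GPT model on inside dataset.txt. Make sure it has a small number of unique characters! (E.g. the current dataset uses only 65 different unique characters!)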
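A minimal sketch of why the character count matters, assuming a character-level model: each distinct character becomes one token id, so the vocabulary size equals the number of unique characters. In a GPT-style model both the token-embedding table and the output layer grow with it (vocab_size x embedding size each), so a small character set keeps the model small. CharTokenizer and its methods are hypothetical names, not femtoGPT's API.

```rust
use std::collections::BTreeSet;

/// Hypothetical character-level tokenizer: every distinct character in the
/// training text becomes one token id. A BTreeSet keeps ids deterministic.
struct CharTokenizer {
    vocab: Vec<char>, // sorted, so binary_search can map char -> id
}

impl CharTokenizer {
    fn new(text: &str) -> Self {
        let set: BTreeSet<char> = text.chars().collect();
        CharTokenizer { vocab: set.into_iter().collect() }
    }

    fn vocab_size(&self) -> usize {
        self.vocab.len()
    }

    /// Maps each character to its index in the sorted vocabulary.
    fn encode(&self, text: &str) -> Vec<usize> {
        text.chars()
            .map(|c| self.vocab.binary_search(&c).expect("character not in vocabulary"))
            .collect()
    }

    /// Maps token ids back to characters.
    fn decode(&self, ids: &[usize]) -> String {
        ids.iter().map(|&i| self.vocab[i]).collect()
    }
}

fn main() {
    // A real run would read the text with std::fs::read_to_string("dataset.txt").
    let tok = CharTokenizer::new("hello world");
    let ids = tok.encode("hold");
    println!("vocab size = {}, ids = {:?}, decoded = {}", tok.vocab_size(), ids, tok.decode(&ids));
}
```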
Then run:
cargo run --release -- train
It will start training the model and put the training data in the train_data directory. You can stop the training and continue later!
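femtoGPT saves its state to train_data using serde/bincode. The std-only sketch below illustrates just the resume-from-checkpoint idea with a made-up format (a u64 parameter count followed by little-endian f32 values); save_params, load_params and the checkpoint file name are hypothetical and do not match femtoGPT's actual files.

```rust
use std::fs::File;
use std::io::{self, BufReader, BufWriter, Read, Write};
use std::path::Path;

/// Writes the parameter count, then every parameter as little-endian f32.
fn save_params(path: &Path, params: &[f32]) -> io::Result<()> {
    let mut w = BufWriter::new(File::create(path)?);
    w.write_all(&(params.len() as u64).to_le_bytes())?;
    for p in params {
        w.write_all(&p.to_le_bytes())?;
    }
    w.flush()
}

/// Reads back a file written by `save_params`.
fn load_params(path: &Path) -> io::Result<Vec<f32>> {
    let mut r = BufReader::new(File::open(path)?);
    let mut len_buf = [0u8; 8];
    r.read_exact(&mut len_buf)?;
    let len = u64::from_le_bytes(len_buf) as usize;
    let mut params = Vec::with_capacity(len);
    let mut buf = [0u8; 4];
    for _ in 0..len {
        r.read_exact(&mut buf)?;
        params.push(f32::from_le_bytes(buf));
    }
    Ok(params)
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join("femtogpt_sketch_ckpt.bin");
    // Resume from the checkpoint if one exists, otherwise start fresh.
    let mut params = if path.exists() { load_params(&path)? } else { vec![0.1, -0.2, 0.3] };
    params.iter_mut().for_each(|p| *p *= 0.9); // stand-in for one training step
    save_params(&path, &params)?;
    println!("saved {} parameters to {}", params.len(), path.display());
    Ok(())
}
```

Running the program repeatedly continues from the previous run's values, which is the behavior the text describes for stopping and resuming training.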
Output samples after hours of training on the Shakespeare dataset, with a 300k-parameter model:
LIS:
Tore hend shater sorerds tougeng an herdofed seng he borind,
Ound ourere sthe, a sou so tousthe ashtherd, m se a man stousshan here hat mend serthe fo witownderstesther s ars at atheno sel theas,
thisth t are sorind bour win soutinds mater horengher
This is embarrassingly bad, but on the bright side, it already seems able to generate words that are easy to pronounce.
I'm currently training a 10M-parameter model to further examine the correctness of my implementation.
UPDATE 5th June 2023:
Here is a new output, after more hours of training on a model of similar scale:
What like but wore pad wo me che nogns yous dares,
As supt it nind bupart 'the reed:
And hils not es
Clearly the model has started to learn some words and punctuation rules!
UPDATE 9th June 2023:
The model was able to reach a loss value of ~1.4.
Here is an example output:
Adistition gone; true; schistoes for mine souls!
Before your home, bariechts should be
Carlam on that's a worf quirer of him so.
What look'd lack away more
To him foot; one hour fortious of saves:
Son;
'Tis all Earl mmistling me.
HARSARTIO:
Why, idless of my mocks fan that were percious.
Having I will thou should and the muour ne'er shor
To purple, when duke thy are out him.
But will bid you doth remember nature.
Even OF hencomey, carniffeit, I joy
Warming; my deed, but it on of mortard,
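To put the ~1.4 loss in perspective, assume it is the mean per-character cross-entropy in nats (the usual convention in nanoGPT-style training; the README does not say) and that the 65-character Shakespeare dataset is still being used. A model that assigns uniform probability to all 65 characters scores ln 65 ≈ 4.17, while a loss of 1.4 corresponds to a perplexity of e^1.4 ≈ 4.06: on average the model is about as uncertain as a choice among roughly four characters. The hypothetical cross_entropy helper below computes the loss with the log-sum-exp trick.

```rust
/// Cross-entropy (in nats) of the `target` token under softmax(logits),
/// computed via log-sum-exp for numerical stability.
fn cross_entropy(logits: &[f32], target: usize) -> f32 {
    let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let log_sum_exp = max + logits.iter().map(|&l| (l - max).exp()).sum::<f32>().ln();
    log_sum_exp - logits[target]
}

fn main() {
    // Uniform logits over a 65-character vocabulary give loss = ln(65).
    let uniform = vec![0.0f32; 65];
    println!("uniform-baseline loss: {:.3}", cross_entropy(&uniform, 0));
    // Perplexity is exp(loss) when the loss is measured in nats.
    println!("perplexity at loss 1.4: {:.2}", 1.4f32.exp());
}
```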
UPDATE 10th June 2023, sparks of AGI? LOL
Having trained a 5-layer model for about 10 hours on a 32-core CPU, on a dataset gathered from Reddit and tokenized with a SentencePiece tokenizer with a vocab size of 500:
Prompt "I think":
I think it s not a scientific method of atheism
I think it s not a fairly simple possible to have a
I think that s a possible point, but I m not sure if you re not
I think that s a great idea, but I m not sure how these are you talking
I think it s a simple thing to do, but I m not sure how thes
I think that s a good thing, but I don t think that s what I mean, I think I
I think that s the only way to be able to use it to be in the same way, but
I think that the police are relatively relatively relative
I think that s a simple moron is a simple character
I think that s the only way to go to the second side of the pol
Prompt "Life is":
Life is not a good idea, but it s not a perfection of the opp
Life is a good thing to say that I don t know what you re talking about, but I don
Life is not the same as a religion, but I m not sure if you re a
Life is a perfectly good job of arguing that you are alm
Life is a perfectly good job of the opposite of the f
Life is a fundamentalist, and then I m not sure how the h
Life is not a good idea, and it s not a perfectly good job, but I
Life is not the same as atheists, but that s the only way to be ac
Life is a bit of a single one of these industry is a f
Life is a good idea to get the opposite of the police offic
Prompt "So sad that":
So sad that you can tell you what? I think I ve been using it on the scre
So sad that I don t know about it, but I don t think I m not afraid to
So sad that I m not sure if you re not arguing with the fact that you
So sad that I was involved in the future, and I have a few we
So sad that s what I said, I m sure you are almost everything you
So sad that you can do it, and I don t think that the fact that it s a po
So sad that I m not sure if you re arguing with the fact that they are
So sad that s the one too much time, but I m not sure if you re arg
So sad that you are sadly supposed to be a big deal in the world
So sad that I don t know about this, but I m not sure how you can do it, but
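The completions for each prompt differ from one another, which suggests the next token is sampled from the model's predicted distribution rather than always taking the most likely one. The README does not say which sampling scheme femtoGPT uses, so the temperature sampling below is an assumption. sample_with_temperature is a hypothetical helper, and the uniform random number u is passed in explicitly so the sketch needs no rand dependency.

```rust
/// Samples a token id from softmax(logits / temperature), given a uniform
/// random number `u` in [0, 1). Lower temperatures sharpen the distribution.
fn sample_with_temperature(logits: &[f32], temperature: f32, u: f32) -> usize {
    assert!(temperature > 0.0, "temperature must be positive");
    let scaled: Vec<f32> = logits.iter().map(|l| l / temperature).collect();
    let max = scaled.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = scaled.iter().map(|s| (s - max).exp()).collect();
    let total: f32 = exps.iter().sum();
    // Walk the cumulative distribution until it exceeds u.
    let mut cumulative = 0.0f32;
    for (i, e) in exps.iter().enumerate() {
        cumulative += e / total;
        if u < cumulative {
            return i;
        }
    }
    exps.len() - 1 // guard against floating-point round-off
}

fn main() {
    let logits = [2.0, 1.0, 0.1];
    for u in [0.05, 0.5, 0.95] {
        println!("u = {u}: token {}", sample_with_temperature(&logits, 1.0, u));
    }
}
```

Repeating generation with fresh random values of u yields a different continuation each time, while a temperature close to zero approaches greedy (argmax) decoding.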
UPDATE 29th June 2023:
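After implementing the GPU trainer, we were able to train larger models. Here are some samples from an 8-layer, 8-head, 128-embedding-degree model trained on the TinyStories dataset with a vocab size of 1000: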
Once upon a time, there was a little girl named Lily.
She loved to play with her toys and she had a lot of fun.
One day, Lily saw a big chicky playing with her toys.
She asked her mom, "Can I play with her toys?" Her mom said,
"Sure, Lily. But we have to clean the pales. Let's suet some candy, Lily."
Lily nodded and went to her mom. They played with the mots and staugning her toys.
Once upon a time, there was a little girl named Lily.
She loved to play outside and explore. One day, she found a jung on the ground.
She picked it up and tecked it. She ran around and saw it. She was very sad.
She asked her mom for her mom. Her mom said, "Lily, I'm going to find it!" Lily said.
She ran to the slock and took her to the teplace. She went to the park and found a molla.
There was a boy named Tim. Tim loved to play with his toys.
One day, Tim's mom came to the park. Tim saw a big, red ball and wanted to play with it.
Tim wanted to play with the ball. Tim was very excited. He wanted to play with the ball.
But the ball was too fast. Tim wanted to play with the ball. But the ball was too fast.
Tim tried to catch it, but it was too fast. Tim was sad. He tried to run away,
but he did not want to play. Tim was sad. He did not want to play with the ball.
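For a rough sense of the size of that 8-layer, 128-dimension model, the sketch below counts parameters for a nanoGPT-style decoder. It assumes biases on every linear layer inside a block, a 4x MLP expansion, two LayerNorms per block plus a final one, learned position embeddings and an output head without bias; femtoGPT's exact layers may differ. The context length is not stated in the text, so the 64 used below is a made-up placeholder, and estimate_params is a hypothetical helper. Note that the head count does not appear: splitting 128 dimensions across 8 heads changes how the weights are used, not how many there are.

```rust
/// Approximate parameter count of a nanoGPT-style decoder (see assumptions above).
fn estimate_params(n_layers: usize, d: usize, vocab: usize, block_size: usize, tied_head: bool) -> usize {
    let attention = (d * 3 * d + 3 * d) + (d * d + d); // fused QKV projection + output projection
    let mlp = (d * 4 * d + 4 * d) + (4 * d * d + d); // up-projection + down-projection
    let layer_norms = 2 * (2 * d); // gain and bias for two LayerNorms
    let per_layer = attention + mlp + layer_norms; // = 12 d^2 + 13 d
    let embeddings = vocab * d + block_size * d; // token + position embeddings
    let final_norm = 2 * d;
    let head = if tied_head { 0 } else { d * vocab }; // output projection to the vocabulary
    n_layers * per_layer + embeddings + final_norm + head
}

fn main() {
    // 8 layers, 128-dim embeddings, vocab 1000 as in the text;
    // block_size = 64 is a hypothetical placeholder.
    for tied in [false, true] {
        println!("tied_head = {tied}: ~{} parameters", estimate_params(8, 128, 1000, 64, tied));
    }
}
```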