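femtoGPT is a pure Rust implementation of a minimal Generative Pretrained Transformer.
It can be used for both inference and training of GPT-style language models, on CPUs and GPUs!
(Hey! I'm also writing a book that will soon discuss the implementation of LLMs in detail! Check it out here: The Super Programmer)
Training: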
cargo run --release -- train
Inference:
cargo run --release -- infer
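(Note: add --features gpu to take advantage of GPU acceleration!)
Everything is implemented from scratch, including the tensor processing logic and the training/inference code of a minimal GPT architecture.
The architecture is very similar, almost identical, to the one in Andrej Karpathy's nanoGPT video lecture.
If you are fascinated by LLMs (large language models) and want to understand how these models work at a deep level, femtoGPT is a great place to start.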
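femtoGPT depends only on random generation libraries (rand / rand-distr), data serialization libraries (serde / bincode, for saving and loading trained models), and a parallel computing library (rayon).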
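femtoGPT is relatively fast on CPU, even though most of the primitive operations (e.g. matrix multiplication) are implemented in the simplest way possible.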
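To give a feel for what "the simplest way possible" can mean for matrix multiplication, here is a minimal sketch of the textbook triple loop over row-major flat buffers. The function name `matmul_naive` and the flat `f32` layout are assumptions made for illustration; this is not taken from femtoGPT's tensor code.

```rust
/// Multiply an `m x k` matrix `a` by a `k x n` matrix `b`.
/// Both inputs are row-major flat slices; the result is an `m x n`
/// row-major vector. (Hypothetical helper, not femtoGPT's API.)
fn matmul_naive(a: &[f32], b: &[f32], m: usize, k: usize, n: usize) -> Vec<f32> {
    assert_eq!(a.len(), m * k, "a must hold m * k elements");
    assert_eq!(b.len(), k * n, "b must hold k * n elements");
    let mut out = vec![0.0f32; m * n];
    for i in 0..m {
        for j in 0..n {
            // Dot product of row i of `a` with column j of `b`.
            let mut acc = 0.0f32;
            for p in 0..k {
                acc += a[i * k + p] * b[p * n + j];
            }
            out[i * n + j] = acc;
        }
    }
    out
}

fn main() {
    // A 2x3 matrix times a 3x2 matrix gives a 2x2 matrix.
    let a: [f32; 6] = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0];
    let b: [f32; 6] = [7.0, 8.0, 9.0, 10.0, 11.0, 12.0];
    let c = matmul_naive(&a, &b, 2, 3, 2);
    println!("{:?}", c); // [58.0, 64.0, 139.0, 154.0]
}
```

A loop like this does no blocking or vectorization by hand, so any speed comes from the compiler and, where rows are independent, from splitting the outer loop across threads.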
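The correctness of the gradients is checked using the gradient check method, though it is still quite possible that some layers are implemented incorrectly.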
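For readers unfamiliar with gradient checking, the sketch below compares an analytic gradient against a central finite-difference estimate on a toy loss. All names here (`numerical_gradient`, `max_relative_error`, `toy_loss`, `toy_loss_grad`), the step size, and the tolerance are assumptions chosen for illustration; femtoGPT's own check may be organized differently.

```rust
/// Estimate the gradient of `f` at `x` with central finite differences:
/// df/dx_i ~ (f(x + eps * e_i) - f(x - eps * e_i)) / (2 * eps).
fn numerical_gradient<F: Fn(&[f64]) -> f64>(f: F, x: &[f64], eps: f64) -> Vec<f64> {
    let mut probe = x.to_vec();
    let mut grad = Vec::with_capacity(x.len());
    for i in 0..x.len() {
        let original = probe[i];
        probe[i] = original + eps;
        let plus = f(&probe);
        probe[i] = original - eps;
        let minus = f(&probe);
        probe[i] = original; // restore before perturbing the next coordinate
        grad.push((plus - minus) / (2.0 * eps));
    }
    grad
}

/// Largest element-wise relative error between two gradients.
fn max_relative_error(analytic: &[f64], numeric: &[f64]) -> f64 {
    analytic
        .iter()
        .zip(numeric)
        .map(|(a, n)| {
            // Clamp the scale so a zero gradient does not divide by zero.
            let scale = a.abs().max(n.abs()).max(1e-12);
            (a - n).abs() / scale
        })
        .fold(0.0, f64::max)
}

/// Toy loss f(x) = sum of x_i^2, standing in for a real layer's loss.
fn toy_loss(x: &[f64]) -> f64 {
    x.iter().map(|v| v * v).sum()
}

/// Hand-derived gradient of `toy_loss`: 2 * x_i.
fn toy_loss_grad(x: &[f64]) -> Vec<f64> {
    x.iter().map(|v| 2.0 * v).collect()
}

fn main() {
    let x: [f64; 3] = [0.5, -1.25, 2.0];
    let numeric = numerical_gradient(toy_loss, &x, 1e-5);
    let analytic = toy_loss_grad(&x);
    let err = max_relative_error(&analytic, &numeric);
    println!("max relative error: {:e}", err);
    assert!(err < 1e-6, "analytic gradient disagrees with the numeric estimate");
}
```

A check like this only covers the inputs it is run on, which is one reason a passing gradient check does not rule out a wrongly implemented layer.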
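(There is a Discord server for discussions around the project!)
Make sure you have the Rust toolchain on your system in order to compile and run the project: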
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
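If you want to train using a GPU, first make sure your GPU drivers are correctly installed on your system and that their OpenCL runtimes are available.
On Debian systems, you can set up the OpenCL runtime by installing the ocl-icd-opencl-dev package: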
sudo apt install ocl-icd-opencl-dev
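Good news! Since femtoGPT's GPU implementation is based on OpenCL, it can run on both NVIDIA and AMD cards, and you don't need to install the heavyweight CUDA toolkit on your system. The OpenCL runtime is enough!
Now just put the text you want to train your GPT model on inside dataset.txt. Make sure it has a small number of unique characters! (For example, the current dataset uses only 65 different unique characters!)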
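The number of unique characters matters because a character-level model needs one token per distinct character. As a rough sketch (not femtoGPT's actual tokenizer), the code below builds such a vocabulary and round-trips a string through it. The helper names `build_vocab`, `encode`, and `decode` are hypothetical, and the sample text is inlined; for a real dataset you would read dataset.txt with `std::fs::read_to_string` instead.

```rust
use std::collections::BTreeSet;

/// Collect the distinct characters of `text`, sorted, so that each
/// character's index in the returned vector can serve as its token id.
fn build_vocab(text: &str) -> Vec<char> {
    text.chars().collect::<BTreeSet<char>>().into_iter().collect()
}

/// Map each character to its index in `vocab`.
/// Returns `None` if the text contains a character outside the vocabulary.
fn encode(text: &str, vocab: &[char]) -> Option<Vec<usize>> {
    // `vocab` is sorted, so a binary search finds each id.
    text.chars().map(|c| vocab.binary_search(&c).ok()).collect()
}

/// Map token ids back to characters. Panics on an out-of-range id.
fn decode(ids: &[usize], vocab: &[char]) -> String {
    ids.iter().map(|&i| vocab[i]).collect()
}

fn main() {
    // Stand-in for the contents of dataset.txt.
    let text = "hello world";
    let vocab = build_vocab(text);
    println!("{} unique characters: {:?}", vocab.len(), vocab);

    let ids = encode(text, &vocab).expect("every character is in the vocabulary");
    assert_eq!(decode(&ids, &vocab), text);
}
```

Running `build_vocab` over your own text is a quick way to see how large the character set is before starting a training run.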
Then run:
cargo run --release
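It will start training the model and put the training data in the train_data directory. You can stop the training and continue later!
After hours of training on the Shakespeare dataset with a 300k-parameter model, this was the output: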
LIS:
Tore hend shater sorerds tougeng an herdofed seng he borind,
Ound ourere sthe, a sou so tousthe ashtherd, m se a man stousshan here hat mend serthe fo witownderstesther s ars at atheno sel theas,
thisth t are sorind bour win soutinds mater horengher
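This is embarrassingly bad, but on the bright side, it seems to have learned to generate words that are easy to pronounce.
I'm currently training a 10M-parameter model to further examine the correctness of my implementation.
Update, June 5th, 2023:
This is the new output, after more hours of training on a model of similar scale: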
What like but wore pad wo me che nogns yous dares,
As supt it nind bupart 'the reed:
And hils not es
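The model has clearly started to learn some words and punctuation rules!
Update, June 9th, 2023:
The model was able to reach a loss value of about 1.4.
Here is an example output: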
Adistition gone; true; schistoes for mine souls!
Before your home, bariechts should be
Carlam on that's a worf quirer of him so.
What look'd lack away more
To him foot; one hour fortious of saves:
Son;
'Tis all Earl mmistling me.
HARSARTIO:
Why, idless of my mocks fan that were percious.
Having I will thou should and the muour ne'er shor
To purple, when duke thy are out him.
But will bid you doth remember nature.
Even OF hencomey, carniffeit, I joy
Warming; my deed, but it on of mortard,
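Update, June 10th, 2023: sparks of AGI? LOL
A 5-layer model was trained on a dataset gathered from Reddit, tokenized with a SentencePiece tokenizer with a vocabulary size of 500, for about 10 hours on a 32-core CPU:
Prompt "I think":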
I think it s not a scientific method of atheism
I think it s not a fairly simple possible to have a
I think that s a possible point, but I m not sure if you re not
I think that s a great idea, but I m not sure how these are you talking
I think it s a simple thing to do, but I m not sure how thes
I think that s a good thing, but I don t think that s what I mean, I think I
I think that s the only way to be able to use it to be in the same way, but
I think that the police are relatively relatively relative
I think that s a simple moron is a simple character
I think that s the only way to go to the second side of the pol
Prompt "Life is":
Life is not a good idea, but it s not a perfection of the opp
Life is a good thing to say that I don t know what you re talking about, but I don
Life is not the same as a religion, but I m not sure if you re a
Life is a perfectly good job of arguing that you are alm
Life is a perfectly good job of the opposite of the f
Life is a fundamentalist, and then I m not sure how the h
Life is not a good idea, and it s not a perfectly good job, but I
Life is not the same as atheists, but that s the only way to be ac
Life is a bit of a single one of these industry is a f
Life is a good idea to get the opposite of the police offic
Prompt "So sad that":
So sad that you can tell you what? I think I ve been using it on the scre
So sad that I don t know about it, but I don t think I m not afraid to
So sad that I m not sure if you re not arguing with the fact that you
So sad that I was involved in the future, and I have a few we
So sad that s what I said, I m sure you are almost everything you
So sad that you can do it, and I don t think that the fact that it s a po
So sad that I m not sure if you re arguing with the fact that they are
So sad that s the one too much time, but I m not sure if you re arg
So sad that you are sadly supposed to be a big deal in the world
So sad that I don t know about this, but I m not sure how you can do it, but
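Update, June 29th, 2023:
After implementing the GPU trainer, we were able to train larger models. Here are some samples from an 8-layer, 8-head model with an embedding degree of 128, trained on the TinyStories dataset with a vocabulary size of 1000: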
Once upon a time, there was a little girl named Lily.
She loved to play with her toys and she had a lot of fun.
One day, Lily saw a big chicky playing with her toys.
She asked her mom, "Can I play with her toys?" Her mom said,
"Sure, Lily. But we have to clean the pales. Let's suet some candy, Lily."
Lily nodded and went to her mom. They played with the mots and staugning her toys.
Once upon a time, there was a little girl named Lily.
She loved to play outside and explore. One day, she found a jung on the ground.
She picked it up and tecked it. She ran around and saw it. She was very sad.
She asked her mom for her mom. Her mom said, "Lily, I'm going to find it!" Lily said.
She ran to the slock and took her to the teplace. She went to the park and found a molla.
There was a boy named Tim. Tim loved to play with his toys.
One day, Tim's mom came to the park. Tim saw a big, red ball and wanted to play with it.
Tim wanted to play with the ball. Tim was very excited. He wanted to play with the ball.
But the ball was too fast. Tim wanted to play with the ball. But the ball was too fast.
Tim tried to catch it, but it was too fast. Tim was sad. He tried to run away,
but he did not want to play. Tim was sad. He did not want to play with the ball.