build nanogpt

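This repo holds a from-scratch reproduction of nanoGPT. The git commits were deliberately kept clean and step by step, so you can easily walk through the commit history to watch it being built up slowly. There is also an accompanying video lecture on YouTube, where you can watch me introduce each commit and explain the pieces along the way.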

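We basically start from an empty file and work our way up to a reproduction of the GPT-2 (124M) model. If you have more patience or money, the code can also reproduce the GPT-3 models. While the GPT-2 (124M) model probably trained for quite some time back in the day (2019, ~5 years ago), today reproducing it is a matter of ~1 hour and ~$10. If you don't have enough GPUs, you'll need a cloud GPU box; for that I recommend Lambda.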
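The "124M" in the model name can be checked by hand. A minimal sketch, assuming the standard GPT-2 small configuration (12 layers, 768-dim embeddings, a 50,257-token vocabulary, a 1,024-token context), biases on every linear layer, and an output head tied to the token embedding. `gpt2_param_count` is a hypothetical helper, not a function from the repo:

```python
def gpt2_param_count(vocab_size=50257, block_size=1024, n_layer=12, n_embd=768):
    """Count parameters of a GPT-2-style model with tied input/output embeddings."""
    d = n_embd
    wte = vocab_size * d                          # token embeddings (shared with lm_head)
    wpe = block_size * d                          # learned position embeddings
    ln = 2 * d                                    # one LayerNorm: weight + bias
    attn = (d * 3 * d + 3 * d) + (d * d + d)      # c_attn (q, k, v) + c_proj, with biases
    mlp = (d * 4 * d + 4 * d) + (4 * d * d + d)   # c_fc + c_proj, with biases
    per_block = 2 * ln + attn + mlp               # ln_1 + attention + ln_2 + MLP
    return wte + wpe + n_layer * per_block + ln   # plus the final LayerNorm (ln_f)

print(gpt2_param_count())  # 124439808
```

Under these assumptions the token and position embeddings account for about 39.4M parameters, and each of the 12 transformer blocks adds about 7.1M.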

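Note that GPT-2 and GPT-3 are both simple language models, trained on internet documents, and all they do is "dream" internet documents. So this repo/video does not cover chat finetuning, and you can't talk to it the way you can talk to ChatGPT. The finetuning process (while conceptually quite simple: SFT is just about swapping out the dataset and continuing the training) comes after this part and will be covered at a later time. For now, this is the kind of stuff the 124M model says if you prompt it with "Hello, I'm a language model," after 10B tokens of training: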

Hello, I'm a language model, and my goal is to make English as easy and fun as possible for everyone, and to find out the different grammar rules
Hello, I'm a language model, so the next time I go, I'll just say, I like this stuff.
Hello, I'm a language model, and the question is, what should I do if I want to be a teacher?
Hello, I'm a language model, and I'm an English person. In languages, "speak" is really speaking. Because for most people, there's
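Samples like these come from repeatedly asking the model for a score over every next token and drawing one; that loop is all "dreaming" a document means. A minimal pure-Python sketch, assuming top-k sampling (a common decoding choice; this README does not state the decoding settings) and a hypothetical `toy_next_logits` function standing in for the trained model:

```python
import math
import random

def sample_top_k(logits, k, rng):
    """Sample one token id from among the k highest-scoring logits."""
    # Indices of the k largest logits (stable sort keeps index order on ties).
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax over the surviving logits only; subtract the max for stability.
    m = max(logits[i] for i in top)
    weights = [math.exp(logits[i] - m) for i in top]
    # random.Random.choices draws proportionally to the (unnormalized) weights.
    return rng.choices(top, weights=weights, k=1)[0]

def generate(next_logits, prompt, steps, k=50, seed=42):
    """Autoregressively extend `prompt` (a list of token ids) by `steps` tokens."""
    rng = random.Random(seed)
    tokens = list(prompt)
    for _ in range(steps):
        logits = next_logits(tokens)  # the "model": one score per vocab entry
        tokens.append(sample_top_k(logits, k, rng))
    return tokens

VOCAB = 5

def toy_next_logits(tokens):
    # Hypothetical stand-in for the model: strongly favors (last token + 1) mod VOCAB.
    nxt = (tokens[-1] + 1) % VOCAB
    return [5.0 if i == nxt else 0.0 for i in range(VOCAB)]

print(generate(toy_next_logits, [0], steps=6, k=1))  # [0, 1, 2, 3, 4, 0, 1]
```

With `k=1` this reduces to greedy decoding; larger `k` lets lower-ranked tokens through, which is where the variety across the samples above comes from.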

After 40B tokens of training:

Hello, I'm a language model, a model of computer science, and it's a way (in mathematics) to program computer programs to do things like write
Hello, I'm a language model, not a human. This means that I believe in my language model, as I have no experience with it yet.
Hello, I'm a language model, but I'm talking about data. You've got to create an array of data: you've got to create that.
Hello, I'm a language model, and all of this is about modeling and learning Python. I'm very good in syntax, however I struggle with Python due
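To relate the 10B and 40B token budgets to training length, here is a rough worked example. It assumes each optimizer step consumes 2**19 = 524,288 tokens (about 0.5M, in line with the batch size the GPT-3 paper lists for its smallest model; this README does not state a batch size). `steps_for` is a hypothetical helper:

```python
TOKENS_PER_STEP = 2**19  # assumption: ~0.5M tokens per optimizer step

def steps_for(total_tokens: int, tokens_per_step: int = TOKENS_PER_STEP) -> int:
    """Number of full optimizer steps that fit in a token budget."""
    return total_tokens // tokens_per_step

for budget in (10_000_000_000, 40_000_000_000):
    print(f"{budget // 10**9}B tokens -> {steps_for(budget):,} steps")
# 10B tokens -> 19,073 steps
# 40B tokens -> 76,293 steps
```

If a GPU cannot fit the whole step at once, the same step size is reached by gradient accumulation; with illustrative numbers of 64 sequences of 1,024 tokens on a single GPU, that is 524,288 // (64 * 1024) = 8 micro-steps per optimizer step.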

Lol. Anyway, once the video is out, this will also be the place for FAQs, fixes, and errata, of which I'm sure there will be a number :)

For discussions and questions, please use the Discussions tab. For faster communication, have a look at my Zero To Hero Discord, channel #nanoGPT.

Video

Let's reproduce GPT-2 (124M) YouTube lecture

Errata

Minor cleanup: we forgot to delete the register_buffer of the bias once we switched to Flash Attention; this was fixed with a recent PR.
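For context, the leftover buffer held the lower-triangular causal mask that a manual attention implementation needs. The minimal numpy sketch below (single head, no batch dimension; `causal_attention` is a hypothetical helper) shows what that mask did. Once attention goes through `torch.nn.functional.scaled_dot_product_attention(q, k, v, is_causal=True)`, which can dispatch to a Flash Attention kernel, PyTorch enforces causality itself, so a stored mask is no longer read:

```python
import numpy as np

def causal_attention(q, k, v):
    """Single-head scaled dot-product attention with an explicit causal mask."""
    T, d = q.shape
    att = q @ k.T / np.sqrt(d)                   # (T, T) attention scores
    mask = np.tril(np.ones((T, T), dtype=bool))  # what the old `bias` buffer stored
    att = np.where(mask, att, -np.inf)           # block attention to future positions
    att = att - att.max(axis=-1, keepdims=True)  # numerically stable softmax
    w = np.exp(att)
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v                                 # (T, d) weighted sum of values

rng = np.random.default_rng(0)
q, k, v = rng.standard_normal((3, 5, 8))
out = causal_attention(q, k, v)
print(np.allclose(out[0], v[0]))  # True: position 0 can only attend to itself
```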

Earlier versions of PyTorch may have difficulty converting from uint16 to long. Inside load_tokens, we added npt = npt.astype(np.int32) so that numpy converts uint16 to int32 first, before converting to a torch tensor and then to long.
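A minimal sketch of that conversion chain, assuming token shards saved with `np.save` as uint16 arrays (GPT-2 token ids are all below 65,536). `load_tokens_np` is a hypothetical, torch-free variant: where the repo's load_tokens builds a torch long tensor, this sketch stops at a numpy int64 array, which has the same width as `torch.long`:

```python
import os
import tempfile

import numpy as np

def load_tokens_np(filename):
    """Load a uint16 token shard and widen it without relying on torch."""
    npt = np.load(filename)     # stored as uint16 to halve disk usage
    npt = npt.astype(np.int32)  # the erratum's fix: widen in numpy first
    # The repo would now call torch.tensor(npt, dtype=torch.long);
    # int64 is returned here to keep the sketch dependency-light.
    return npt.astype(np.int64)

path = os.path.join(tempfile.mkdtemp(), "shard.npy")
np.save(path, np.array([50256, 65535, 0], dtype=np.uint16))
toks = load_tokens_np(path)
print(toks.dtype, toks.tolist())  # int64 [50256, 65535, 0]
```

Values above 32,767 (such as 50256, GPT-2's end-of-text token) survive intact because int32 can represent the full uint16 range.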

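torch.autocast takes an argument device_type, to which I was stubbornly trying to pass device in the hope that it would work, but PyTorch really wants just the type and raises errors in some versions of PyTorch. So we want a device such as cuda:3 to be stripped down to cuda. Currently, the device mps (Apple Silicon) would become device_type cpu; I'm not 100% sure this is the intended PyTorch way.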
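The stripping rule fits in one line. A minimal sketch, assuming the device is held as a string such as "cuda:3"; `autocast_device_type` is a hypothetical helper name. Its result is what you would pass as `torch.autocast(device_type=..., dtype=torch.bfloat16)`:

```python
def autocast_device_type(device: str) -> str:
    """Map a full device string (e.g. "cuda:3") to the bare type autocast expects."""
    # Same rule as described above: any CUDA device -> "cuda",
    # everything else (including "mps") -> "cpu".
    return "cuda" if device.startswith("cuda") else "cpu"

for dev in ("cuda:3", "cuda", "cpu", "mps"):
    print(f"{dev} -> {autocast_device_type(dev)}")
# cuda:3 -> cuda
# cuda -> cuda
# cpu -> cpu
# mps -> cpu
```

If you would rather keep mps as its own type, `device.split(":")[0]` returns "mps"; whether autocast accepts that depends on your PyTorch version, which is the same uncertainty noted above.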

Confusingly, model.require_backward_grad_sync is actually used by both the forward and the backward pass. We moved this line up so that it also applies to the forward pass.
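The ordering fix can be illustrated without GPUs. The sketch below uses a hypothetical `StandInDDP` class that only records the flag value it sees during forward; in real code the model would be wrapped in `torch.nn.parallel.DistributedDataParallel`, and `loss.backward()` would follow each forward. The point is that the flag must already be set when forward runs, and that it is `True` only on the last micro-step, so gradients are all-reduced once per optimizer step. (`DistributedDataParallel.no_sync()` is the documented context-manager alternative to toggling this attribute directly.)

```python
class StandInDDP:
    """Hypothetical stand-in for a DDP-wrapped model; it only records the flag."""

    def __init__(self):
        self.require_backward_grad_sync = True
        self.flags_seen_in_forward = []

    def forward(self, x):
        # Per the erratum, DDP reads this flag during forward as well as backward.
        self.flags_seen_in_forward.append(self.require_backward_grad_sync)
        return x

def train_step(model, micro_batches):
    grad_accum_steps = len(micro_batches)
    for micro_step, x in enumerate(micro_batches):
        # Set the flag BEFORE forward: sync gradients only on the last micro-step.
        model.require_backward_grad_sync = (micro_step == grad_accum_steps - 1)
        model.forward(x)
        # Real code: loss = ...; loss.backward()

model = StandInDDP()
train_step(model, [1, 2, 3])
print(model.flags_seen_in_forward)  # [False, False, True]
```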