build nanoGPT

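This repo holds the from-scratch reproduction of nanoGPT. The git commits were deliberately kept clean and step by step, so that you can walk through the commit history and watch it being built up slowly. There is also an accompanying video lecture on YouTube where I introduce each commit and explain the pieces along the way.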

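We basically start from an empty file and work our way to a reproduction of the GPT-2 (124M) model. If you have more patience or money, the code can also reproduce the GPT-3 models. While the GPT-2 (124M) model probably trained for quite some time back in the day (2019, ~5 years ago), today reproducing it is a matter of ~1 hour and ~$10. You'll need a cloud GPU box if you don't have enough GPUs of your own; for that I recommend Lambda.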

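Note that GPT-2 and GPT-3 are both simple language models trained on internet documents, and all they do is "dream" internet documents. So this repo/video does not cover chat finetuning, and you can't talk to the model like you can talk to ChatGPT. The finetuning process (while quite simple conceptually - SFT is just about swapping out the dataset and continuing the training) comes after this part and will be covered at a later time. For now, this is the kind of stuff the 124M model says if you prompt it with "Hello, I'm a language model," after 10B tokens of training: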

 Hello, I'm a language model, and my goal is to make English as easy and fun as possible for everyone, and to find out the different grammar rules
Hello, I'm a language model, so the next time I go, I'll just say, I like this stuff.
Hello, I'm a language model, and the question is, what should I do if I want to be a teacher?
Hello, I'm a language model, and I'm an English person. In languages, "speak" is really speaking. Because for most people, there's

And after 40B tokens of training:

 Hello, I'm a language model, a model of computer science, and it's a way (in mathematics) to program computer programs to do things like write
Hello, I'm a language model, not a human. This means that I believe in my language model, as I have no experience with it yet.
Hello, I'm a language model, but I'm talking about data. You've got to create an array of data: you've got to create that.
Hello, I'm a language model, and all of this is about modeling and learning Python. I'm very good in syntax, however I struggle with Python due

Lol. Anyway, once the video comes out, this will also be a place for FAQ, and a place for fixes and errata, of which I am sure there will be a number :)

For discussions and questions, please use the Discussions tab, and for faster communication, have a look at my Zero To Hero Discord, channel #nanoGPT.

Video

Let's reproduce GPT-2 (124M) YouTube lecture

Errata

Minor cleanup: we forgot to delete the register_buffer of the bias once we switched to flash attention. Fixed with a recent PR.

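Earlier versions of PyTorch may have difficulty converting from uint16 to long. Inside load_tokens, we added npt = npt.astype(np.int32) so that numpy converts uint16 to int32 before the array is turned into a torch tensor and then converted to long.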
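A minimal sketch of what that conversion might look like. Only load_tokens and the astype line come from the note above; the helper name widen_tokens, the use of np.load, and the assumption that each shard is a .npy array of uint16 token ids are illustrative guesses.

```python
import numpy as np


def widen_tokens(npt: np.ndarray) -> np.ndarray:
    """Cast uint16 token ids to int32 in NumPy, before PyTorch sees them."""
    # Hypothetical helper: isolates the cast described in the erratum.
    return npt.astype(np.int32)


def load_tokens(filename: str):
    # torch is imported lazily so widen_tokens stays usable without PyTorch.
    import torch

    npt = np.load(filename)   # assumed: shard stored as a .npy array of uint16 ids
    npt = widen_tokens(npt)   # uint16 -> int32, done by NumPy
    return torch.tensor(npt, dtype=torch.long)  # int32 -> long, done by PyTorch
```

Every uint16 value (0 to 65535) fits in int32, so the cast loses nothing; it just keeps PyTorch from ever having to handle a uint16 array directly.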

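The torch.autocast function takes an arg device_type, to which I stubbornly tried to just pass device, hoping it would work ok. But PyTorch really wants just the type, and passing the full device creates errors in some versions of PyTorch. So we want e.g. the device cuda:3 to get stripped to cuda. Currently, device mps (Apple Silicon) would become device_type cpu; I'm not 100% sure this is the intended PyTorch way.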
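The stripping described above could be sketched as follows. The helper name autocast_device_type is hypothetical, and the rule (anything that does not start with "cuda" falls back to "cpu") is inferred from the two examples in the note, so treat it as one possible reading rather than the exact code in the repo.

```python
def autocast_device_type(device: str) -> str:
    """Reduce a device string such as "cuda:3" to the bare type autocast expects."""
    # Any CUDA device, with or without an index, maps to "cuda".
    # Everything else, including "mps", falls back to "cpu" as described above.
    return "cuda" if device.startswith("cuda") else "cpu"


for device in ("cuda", "cuda:3", "mps", "cpu"):
    print(device, "->", autocast_device_type(device))
```

The result would then be passed as the device_type argument of torch.autocast in place of the full device string.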

Confusingly, model.require_backward_grad_sync is actually used by both the forward and backward pass. Moved the line up so that it also gets applied to the forward pass.
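To illustrate the ordering, here is a rough sketch of a gradient-accumulation inner loop. It assumes the flag is toggled so that gradients are only synchronized on the last micro-step, and that the model returns a (logits, loss) pair. The function name accumulate_micro_steps and its parameters are hypothetical, not taken from the repo.

```python
def accumulate_micro_steps(model, micro_batches, ddp: bool) -> None:
    """Run forward/backward over micro-batches, syncing gradients only on the last."""
    last = len(micro_batches) - 1
    for micro_step, (x, y) in enumerate(micro_batches):
        if ddp:
            # Set the flag BEFORE the forward pass: the forward pass reads it too,
            # not only the backward pass.
            model.require_backward_grad_sync = (micro_step == last)
        logits, loss = model(x, y)            # forward pass
        loss = loss / len(micro_batches)      # average the loss over micro-steps
        loss.backward()                       # backward pass
```

Setting the flag only just before loss.backward() would leave the forward pass running with the previous micro-step's value, which is the situation the erratum fixes.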