femtoGPT is a pure Rust implementation of a minimal Generative Pretrained Transformer.
It can be used for both inference and training of GPT-style language models, using both CPUs and GPUs!
(HEY! I'm also writing a book that will soon discuss the implementation of an LLM in detail! Check it out here: The Super Programmer)
Training:
cargo run --release -- train
Inference:
cargo run --release -- infer
(Note: Add --features gpu to leverage GPU speedups!)
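For example, combining the feature flag with the train command above, GPU training becomes:
cargo run --release --features gpu -- train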
Everything is implemented from scratch, including the tensor processing logic along with the training/inference code of a minimal GPT architecture.
The architecture is very similar to (almost identical with) the one in Andrej Karpathy's nanoGPT video lecture.
femtoGPT is a great starting point for those who are fascinated by LLMs and would like to understand how these models work at a very deep level.
femtoGPT uses nothing but random-number generation libraries (rand / rand-distr), data-serialization libraries (serde / bincode, for saving/loading already-trained models) and a parallel computing library (rayon).
femtoGPT is ~~EXTREMELY SLOW~~ relatively fast on CPU, and most of the primitive operations (e.g. matrix multiplication) are implemented in the simplest way possible.
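To illustrate what "the simplest way possible" looks like, here is a minimal sketch (not femtoGPT's actual code) of a naive triple-loop matrix multiplication over row-major f32 buffers:

```rust
// Naive O(n*m*p) matrix multiplication: c[i][j] = sum_k a[i][k] * b[k][j].
// No blocking, no SIMD, no parallelism -- just the textbook definition.
fn matmul(a: &[f32], b: &[f32], n: usize, m: usize, p: usize) -> Vec<f32> {
    let mut c = vec![0.0f32; n * p]; // a is n x m, b is m x p, c is n x p
    for i in 0..n {
        for k in 0..m {
            let aik = a[i * m + k];
            for j in 0..p {
                c[i * p + j] += aik * b[k * p + j];
            }
        }
    }
    c
}

fn main() {
    // [[1,2],[3,4]] * [[5,6],[7,8]] = [[19,22],[43,50]]
    let a = vec![1.0, 2.0, 3.0, 4.0];
    let b = vec![5.0, 6.0, 7.0, 8.0];
    assert_eq!(matmul(&a, &b, 2, 2, 2), vec![19.0, 22.0, 43.0, 50.0]);
}
```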
Correctness of the gradients is checked using the gradient-checking method, though it is still quite possible that some layers are implemented incorrectly.
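Gradient checking means comparing the analytically computed gradient against a central finite-difference estimate, dL/dw ≈ (L(w+ε) − L(w−ε)) / 2ε. A minimal sketch of the idea on a toy scalar loss (independent of femtoGPT's internals):

```rust
// Central finite-difference estimate of dL/dw for a scalar parameter.
fn numerical_grad(loss: impl Fn(f32) -> f32, w: f32, eps: f32) -> f32 {
    (loss(w + eps) - loss(w - eps)) / (2.0 * eps)
}

fn main() {
    // Toy loss L(w) = w^2, whose analytic gradient is 2w.
    let loss = |w: f32| w * w;
    let w = 3.0f32;
    let analytic = 2.0 * w;
    let numeric = numerical_grad(loss, w, 1e-3);
    // A large mismatch here would indicate a buggy backward pass.
    assert!((analytic - numeric).abs() < 1e-2);
}
```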
(There is a Discord server for discussions around the project!)
Make sure you have the Rust toolchain on your system in order to compile and run the project:
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
If you want to train using a GPU, you will first need to make sure your GPU drivers are correctly installed on your system and that their OpenCL runtimes are available.
On Debian systems, you can set up OpenCL runtimes by installing the package ocl-icd-opencl-dev:
sudo apt install ocl-icd-opencl-dev
GOOD NEWS! Since femtoGPT's GPU implementation is based on OpenCL, it can run on both NVIDIA and AMD cards, and you won't need to install heavy CUDA toolkits on your system. OpenCL runtimes are enough!
Now you just need to put the text you want to train your GPT model on inside dataset.txt. Make sure it has a small number of unique characters! (E.g. the current dataset only uses 65 different unique characters!)
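The unique-character count matters because the vocabulary here is character-level: every distinct character in dataset.txt becomes one token. A quick way to check your dataset (a standalone sketch, not part of femtoGPT):

```rust
use std::collections::BTreeSet;
use std::fs;

fn main() -> std::io::Result<()> {
    // The number of distinct characters is the vocabulary size
    // of a character-level model.
    let text = fs::read_to_string("dataset.txt")?;
    let vocab: BTreeSet<char> = text.chars().collect();
    println!("unique characters (vocab size): {}", vocab.len());
    Ok(())
}
```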
Then you'll need to run:
cargo run --release
It will start training the model and will put the training data in the train_data directory. You can stop the training and continue later!
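The stop/continue workflow relies on the model state being serialized to disk — this is what the serde/bincode dependencies mentioned above are for. A hypothetical sketch of such checkpointing, assuming serde and bincode 1.x as dependencies and an invented train_data/model.dat file name:

```rust
use std::collections::HashMap;

// Hypothetical checkpoint format: parameter name -> flat f32 weights.
fn save_checkpoint(path: &str, params: &HashMap<String, Vec<f32>>) -> std::io::Result<()> {
    let bytes = bincode::serialize(params).expect("serialize failed");
    std::fs::write(path, bytes)
}

fn load_checkpoint(path: &str) -> std::io::Result<HashMap<String, Vec<f32>>> {
    let bytes = std::fs::read(path)?;
    Ok(bincode::deserialize(&bytes).expect("deserialize failed"))
}

fn main() -> std::io::Result<()> {
    let mut params = HashMap::new();
    params.insert("wte".to_string(), vec![0.1f32, 0.2, 0.3]);
    save_checkpoint("train_data/model.dat", &params)?;
    assert_eq!(load_checkpoint("train_data/model.dat")?, params);
    Ok(())
}
```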
After hours of training on the Shakespeare database, on a 300k-parameter model, this has been the output:
LIS:
Tore hend shater sorerds tougeng an herdofed seng he borind,
Ound ourere sthe, a sou so tousthe ashtherd, m se a man stousshan here hat mend serthe fo witownderstesther s ars at atheno sel theas,
thisth t are sorind bour win soutinds mater horengher
It's horribly bad, but on the bright side, it seems it has been able to generate words that are easy to pronounce.
I'm currently training a 10M-parameter model to further check the correctness of my implementation.
UPDATE June 5, 2023:
This is new output, after more hours of training on a model of similar scale:
What like but wore pad wo me che nogns yous dares,
As supt it nind bupart 'the reed:
And hils not es
Clearly, the model has started to learn some words and punctuation rules!
UPDATE June 9, 2023:
The model was able to reach a loss value of ~1.4.
Here is an example of its output:
Adistition gone; true; schistoes for mine souls!
Before your home, bariechts should be
Carlam on that's a worf quirer of him so.
What look'd lack away more
To him foot; one hour fortious of saves:
Son;
'Tis all Earl mmistling me.
HARSARTIO:
Why, idless of my mocks fan that were percious.
Having I will thou should and the muour ne'er shor
To purple, when duke thy are out him.
But will bid you doth remember nature.
Even OF hencomey, carniffeit, I joy
Warming; my deed, but it on of mortard,
UPDATE June 10, 2023, sparks of AGI? LOL
After training a 5-layer model on a dataset gathered from Reddit, tokenized with a sentencepiece tokenizer with a vocab size of 500, for about ~10 hours on a 32-core CPU:
Prompt "Saya pikir" :
I think it s not a scientific method of atheism
I think it s not a fairly simple possible to have a
I think that s a possible point, but I m not sure if you re not
I think that s a great idea, but I m not sure how these are you talking
I think it s a simple thing to do, but I m not sure how thes
I think that s a good thing, but I don t think that s what I mean, I think I
I think that s the only way to be able to use it to be in the same way, but
I think that the police are relatively relatively relative
I think that s a simple moron is a simple character
I think that s the only way to go to the second side of the pol
Prompt "Hidup adalah" :
Life is not a good idea, but it s not a perfection of the opp
Life is a good thing to say that I don t know what you re talking about, but I don
Life is not the same as a religion, but I m not sure if you re a
Life is a perfectly good job of arguing that you are alm
Life is a perfectly good job of the opposite of the f
Life is a fundamentalist, and then I m not sure how the h
Life is not a good idea, and it s not a perfectly good job, but I
Life is not the same as atheists, but that s the only way to be ac
Life is a bit of a single one of these industry is a f
Life is a good idea to get the opposite of the police offic
Prompt "Sedih sekali" :
So sad that you can tell you what? I think I ve been using it on the scre
So sad that I don t know about it, but I don t think I m not afraid to
So sad that I m not sure if you re not arguing with the fact that you
So sad that I was involved in the future, and I have a few we
So sad that s what I said, I m sure you are almost everything you
So sad that you can do it, and I don t think that the fact that it s a po
So sad that I m not sure if you re arguing with the fact that they are
So sad that s the one too much time, but I m not sure if you re arg
So sad that you are sadly supposed to be a big deal in the world
So sad that I don t know about this, but I m not sure how you can do it, but
UPDATE June 29, 2023
After the implementation of the GPU trainer, we were able to train larger models. Here are some samples from an 8-layer, 8-head, 128-embedding-degree model, trained on the TinyStories dataset with a vocab size of 1000:
Once upon a time, there was a little girl named Lily.
She loved to play with her toys and she had a lot of fun.
One day, Lily saw a big chicky playing with her toys.
She asked her mom, "Can I play with her toys?" Her mom said,
"Sure, Lily. But we have to clean the pales. Let's suet some candy, Lily."
Lily nodded and went to her mom. They played with the mots and staugning her toys.
Once upon a time, there was a little girl named Lily.
She loved to play outside and explore. One day, she found a jung on the ground.
She picked it up and tecked it. She ran around and saw it. She was very sad.
She asked her mom for her mom. Her mom said, "Lily, I'm going to find it!" Lily said.
She ran to the slock and took her to the teplace. She went to the park and found a molla.
There was a boy named Tim. Tim loved to play with his toys.
One day, Tim's mom came to the park. Tim saw a big, red ball and wanted to play with it.
Tim wanted to play with the ball. Tim was very excited. He wanted to play with the ball.
But the ball was too fast. Tim wanted to play with the ball. But the ball was too fast.
Tim tried to catch it, but it was too fast. Tim was sad. He tried to run away,
but he did not want to play. Tim was sad. He did not want to play with the ball.