kan gpt
1.2.0
The PyTorch implementation of Generative Pre-trained Transformers (GPTs) using Kolmogorov-Arnold Networks (KANs) for language modeling
pip install kan_gpt
If you find our work useful, please cite us!
@misc{GANESH2024KANGPT,
author = {Aditya Nalgunda Ganesh},
title = {KAN-GPT: The PyTorch implementation of Generative Pre-trained Transformers (GPTs) using Kolmogorov-Arnold Networks (KANs) for language modeling},
year = {2024},
month = {May},
note = {Release 1.0.0, 9th May 2024},
url = {https://github.com/AdityaNG/kan-gpt/}
}
Refer to KAN_GPT.ipynb and kan_gpt/prompt.py for usage examples. The following is an outline of how to use the model:
import torch
from kan_gpt.model import GPT
from transformers import GPT2Tokenizer

# Configure a GPT-2 sized model
model_config = GPT.get_default_config()
model_config.model_type = "gpt2"
model_config.vocab_size = 50257
model_config.block_size = 1024
model = GPT(model_config)

tokenizer = GPT2Tokenizer.from_pretrained('gpt2')

prompt = "Bangalore is often described as the "

# Encode the prompt into token ids
prompt_encoded = tokenizer.encode(
    text=prompt, add_special_tokens=False
)

x = torch.tensor(prompt_encoded).unsqueeze(0)  # add a batch dimension

model.eval()
y = model.generate(x, 50)  # sample 50 tokens

result = tokenizer.decode(y[0])

print(result)

# Bangalore is often described as the Silicon Valley of India.
# The city has witnessed rapid growth in the past two decades.....
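To illustrate what a call such as generate(x, 50) does conceptually, here is a minimal pure-Python sketch of autoregressive decoding. It is not taken from kan_gpt: generate_greedy and toy_scores are hypothetical names, the toy scoring function stands in for a real model, and the sketch always picks the highest-scoring token, whereas the library's generate method may sample instead.

```python
from typing import Callable, List


def generate_greedy(
    next_token_scores: Callable[[List[int]], List[float]],
    prompt_ids: List[int],
    max_new_tokens: int,
    block_size: int,
) -> List[int]:
    """Append max_new_tokens token ids, one at a time, to prompt_ids."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        # The model only sees the most recent block_size tokens.
        context = ids[-block_size:]
        scores = next_token_scores(context)
        # Greedy decoding: pick the id with the highest score.
        next_id = max(range(len(scores)), key=scores.__getitem__)
        ids.append(next_id)
    return ids


def toy_scores(context: List[int]) -> List[float]:
    """Hypothetical stand-in for a model: favours (last token + 1) mod 5."""
    vocab_size = 5
    target = (context[-1] + 1) % vocab_size
    return [1.0 if t == target else 0.0 for t in range(vocab_size)]


out = generate_greedy(toy_scores, prompt_ids=[0, 1], max_new_tokens=4, block_size=3)
print(out)  # [0, 1, 2, 3, 4, 0]
```

The block_size argument plays the same role as model_config.block_size above: it caps how much of the running sequence is fed back in at each step.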
# Download Repo
git clone https://github.com/AdityaNG/kan-gpt
cd kan-gpt
git pull
# Download Dataset
python3 -m kan_gpt.download_dataset --dataset tinyshakespeare
python3 -m kan_gpt.download_dataset --dataset mnist
python3 -m kan_gpt.download_dataset --dataset webtext
# Install dependencies for development
pip install -r requirements.txt
pip install -e .
Use the following dummy script to make sure everything is working as expected
WANDB_MODE=offline CUDA_VISIBLE_DEVICES="" python3 -m kan_gpt.train --architecture MLP --batch_size 1 --dummy_dataset --device cpu --max_iters 200
WANDB_MODE=offline CUDA_VISIBLE_DEVICES="" python3 -m kan_gpt.train --architecture KAN --batch_size 1 --dummy_dataset --device cpu --max_iters 200
Then make use of the training script
python -m kan_gpt.train
You can prompt the model to produce text as follows
python -m kan_gpt.prompt --prompt "Bangalore is often described as the " --model_path (checkpoint)
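We train KAN-GPT and an equivalent MLP-GPT model on the Tiny Shakespeare dataset and compare the two. We observe that KAN-GPT performs slightly better than MLP-GPT. We are looking into further experiments to dive deeper. The results are shown below: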
[Table: Metrics (plots for KAN-GPT and MLP-GPT not recovered)]
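For readers unfamiliar with the two architectures being compared, the sketch below contrasts a KAN-style layer with an MLP layer in plain Python. It is an illustration under stated assumptions, not the code used in kan_gpt: ToyKANLayer and ToyMLPLayer are hypothetical names, and the learnable edge functions are built from Gaussian bumps as a simple stand-in for the B-splines used in the KAN paper. The key difference it shows is that a KAN puts a learnable one-dimensional function on every input-output edge, while an MLP uses scalar weights on edges and a fixed activation on each node.

```python
import math
import random
from typing import List


def gaussian_basis(x: float, centers: List[float], width: float) -> List[float]:
    """Evaluate one Gaussian bump per grid centre at the scalar x."""
    return [math.exp(-((x - c) / width) ** 2) for c in centers]


class ToyKANLayer:
    """y_j = sum_i phi_ij(x_i), where each phi_ij is a learnable 1-D function.

    Assumption: phi_ij is a weighted sum of Gaussian bumps (not B-splines).
    """

    def __init__(self, n_in: int, n_out: int, n_basis: int = 5, seed: int = 0):
        rng = random.Random(seed)
        # Evenly spaced centres on [-1, 1]; requires n_basis >= 2.
        self.centers = [-1.0 + 2.0 * k / (n_basis - 1) for k in range(n_basis)]
        self.width = 2.0 / (n_basis - 1)
        # coef[j][i][k]: weight of basis k on the edge from input i to output j.
        self.coef = [[[rng.gauss(0.0, 0.1) for _ in range(n_basis)]
                      for _ in range(n_in)] for _ in range(n_out)]

    def forward(self, x: List[float]) -> List[float]:
        basis = [gaussian_basis(xi, self.centers, self.width) for xi in x]
        return [
            sum(c * b for edge, bi in zip(row, basis) for c, b in zip(edge, bi))
            for row in self.coef
        ]

    def num_params(self) -> int:
        return sum(len(edge) for row in self.coef for edge in row)


class ToyMLPLayer:
    """y_j = tanh(sum_i w_ji * x_i + b_j): fixed activation on each node."""

    def __init__(self, n_in: int, n_out: int, seed: int = 0):
        rng = random.Random(seed)
        self.weight = [[rng.gauss(0.0, 0.1) for _ in range(n_in)]
                       for _ in range(n_out)]
        self.bias = [0.0] * n_out

    def forward(self, x: List[float]) -> List[float]:
        return [
            math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(self.weight, self.bias)
        ]

    def num_params(self) -> int:
        return sum(len(row) for row in self.weight) + len(self.bias)


kan = ToyKANLayer(n_in=3, n_out=2)
mlp = ToyMLPLayer(n_in=3, n_out=2)
x = [0.5, -0.2, 0.9]
print(len(kan.forward(x)), kan.num_params())  # 2 30
print(len(mlp.forward(x)), mlp.num_params())  # 2 8
```

At the same layer width the KAN-style layer carries one coefficient per basis function on every edge, so a comparison against an "equivalent" MLP depends on how equivalence is defined (for example, matching width or matching parameter count); the source does not state which definition was used.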
KAN training logic from KAN.train_kan
mkdocs gh-deploy
Read the CONTRIBUTING.md file.