nGPT pytorch Download - nGPT pytorch -Quellcode-Download

nGPT pytorch

AI-Quellcode

0.2.7

Herunterladen

nGPT (normalisiertes GPT) – Pytorch

Schnelle Implementierung von nGPT, vollständiges Lernen auf der Hypersphäre, von NvidiaAI. Die Frage ist, ob es einen Verlust an Ausdruckskraft gibt, den sie unter den Teppich gekehrt haben, aber ich nehme es mit gutem Glauben an.

Diese Art von Netzwerk sollte auch im Kontext des kontinuierlichen Lernens und des Verlusts der Plastizität untersucht werden

Die Anpassung an Vision-Transformatoren ist da

Installieren

$ pip install nGPT-pytorch

Verwendung

 import torch
from nGPT_pytorch import nGPT

model = nGPT (
    num_tokens = 256 ,
    dim = 512 ,
    depth = 4 ,
    attn_norm_qk = True
)

x = torch . randint ( 0 , 256 , ( 2 , 2048 ))

loss = model ( x , return_loss = True )
loss . backward ()

logits = model ( x ) # (2, 2048, 256)

Prüfen

Enwik8

$ python train.py

Zitate

 @inproceedings { Loshchilov2024nGPTNT ,
    title   = { nGPT: Normalized Transformer with Representation Learning on the Hypersphere } ,
    author  = { Ilya Loshchilov and Cheng-Ping Hsieh and Simeng Sun and Boris Ginsburg } ,
    year    = { 2024 } ,
    url     = { https://api.semanticscholar.org/CorpusID:273026160 }
}

 @article { Luo2017CosineNU ,
    title     = { Cosine Normalization: Using Cosine Similarity Instead of Dot Product in Neural Networks } ,
    author    = { Chunjie Luo and Jianfeng Zhan and Lei Wang and Qiang Yang } ,
    journal   = { ArXiv } ,
    year      = { 2017 } ,
    volume    = { abs/1702.05870 } ,
    url       = { https://api.semanticscholar.org/CorpusID:1505432 }
}

 @inproceedings { Zhou2024ValueRL ,
    title   = { Value Residual Learning For Alleviating Attention Concentration In Transformers } ,
    author  = { Zhanchao Zhou and Tianyi Wu and Zhiyun Jiang and Zhenzhong Lan } ,
    year    = { 2024 } ,
    url     = { https://api.semanticscholar.org/CorpusID:273532030 }
}