MEGABYTE pytorch
Implementation of MEGABYTE, Predicting Million-byte Sequences with Multiscale Transformers, in Pytorch. Took the liberty to generalize it further so that one can have multiple local models (a sketch of a three-stage configuration follows the usage example below).
Similar independent research that is a further generalization
$ pip install MEGABYTE-pytorch
import torch
from MEGABYTE_pytorch import MEGABYTE
model = MEGABYTE(
    num_tokens = 16000,      # number of tokens
    dim = (512, 256),        # transformer model dimension (512 for coarsest, 256 for fine in this example)
    max_seq_len = (1024, 4), # sequence length for global and then local. this can be more than 2
    depth = (6, 4),          # number of layers for global and then local. this can be more than 2, but length must match the max_seq_len's
    dim_head = 64,           # dimension per head
    heads = 8,               # number of attention heads
    flash_attn = True        # use flash attention
)

x = torch.randint(0, 16000, (1, 1024, 4))

loss = model(x, return_loss = True)
loss.backward()

# then after much training

logits = model(x)

# and sample from the logits accordingly
# or you can use the generate function

sampled = model.generate(temperature = 0.9, filter_thres = 0.9) # (1, 1024, 4)
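As the comments above note, max_seq_len and depth can hold more than two entries. Below is a minimal sketch of what a three-stage configuration (one global and two local transformers) might look like; the specific dimensions, depths and patch sizes are illustrative assumptions rather than values from the paper or this repo, and the input is assumed to be shaped to the per-stage lengths.

import torch
from MEGABYTE_pytorch import MEGABYTE

# illustrative three-stage hierarchy; all hyperparameters below are assumptions
model = MEGABYTE(
    num_tokens = 256,           # e.g. raw bytes
    dim = (768, 512, 256),      # one dimension per stage, coarsest first
    max_seq_len = (1024, 4, 4), # global length, then the patch size of each local stage
    depth = (6, 4, 2),          # one depth per stage, must match the length of max_seq_len
    dim_head = 64,
    heads = 8,
    flash_attn = True
)

x = torch.randint(0, 256, (1, 1024, 4, 4)) # assumed to mirror max_seq_len per stage

loss = model(x, return_loss = True)
loss.backward()

The loss and generation interfaces are the same as in the two-stage example above.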
Train on character-level enwik8 with patches of size 4 - length 8192
$ python train.py
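For orientation, here is a minimal sketch of the kind of loop train.py runs. The actual script also handles enwik8 loading, validation and periodic sampling; the optimizer, batch size and learning rate below are assumptions, not values taken from the repo, and the model config simply mirrors the stated setup of patches of size 4 at total length 8192 (2048 x 4).

import torch
from torch.optim import Adam
from MEGABYTE_pytorch import MEGABYTE

# assumed hyperparameters for illustration only
BATCH_SIZE = 4
LEARNING_RATE = 2e-4
NUM_STEPS = 100000

model = MEGABYTE(
    num_tokens = 256,          # byte-level vocabulary
    dim = (768, 256),
    max_seq_len = (2048, 4),   # patches of size 4, total length 8192
    depth = (6, 4),
    flash_attn = True
).cuda()

optim = Adam(model.parameters(), lr = LEARNING_RATE)

for step in range(NUM_STEPS):
    # `next_batch` is a hypothetical placeholder yielding byte ids of shape (BATCH_SIZE, 2048, 4)
    x = next_batch(BATCH_SIZE).cuda()

    loss = model(x, return_loss = True)
    loss.backward()

    optim.step()
    optim.zero_grad()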
Citations

@misc{yu2023megabyte,
    title         = {MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers},
    author        = {Lili Yu and Dániel Simig and Colin Flaherty and Armen Aghajanyan and Luke Zettlemoyer and Mike Lewis},
    year          = {2023},
    eprint        = {2305.07185},
    archivePrefix = {arXiv},
    primaryClass  = {cs.LG}
}

@misc{https://doi.org/10.48550/arxiv.2302.01327,
    doi       = {10.48550/ARXIV.2302.01327},
    url       = {https://arxiv.org/abs/2302.01327},
    author    = {Kumar, Manoj and Dehghani, Mostafa and Houlsby, Neil},
    title     = {Dual PatchNorm},
    publisher = {arXiv},
    year      = {2023},
    copyright = {Creative Commons Attribution 4.0 International}
}

@inproceedings{dao2022flashattention,
    title     = {Flash{A}ttention: Fast and Memory-Efficient Exact Attention with {IO}-Awareness},
    author    = {Dao, Tri and Fu, Daniel Y. and Ermon, Stefano and Rudra, Atri and R{\'e}, Christopher},
    booktitle = {Advances in Neural Information Processing Systems},
    year      = {2022}
}

@software{peng_bo_2021_5196578,
    author    = {PENG Bo},
    title     = {BlinkDL/RWKV-LM: 0.01},
    month     = {aug},
    year      = {2021},
    publisher = {Zenodo},
    version   = {0.01},
    doi       = {10.5281/zenodo.5196578},
    url       = {https://doi.org/10.5281/zenodo.5196578}
}

@article{Kazemnejad2023TheIO,
    title   = {The Impact of Positional Encoding on Length Generalization in Transformers},
    author  = {Amirhossein Kazemnejad and Inkit Padhi and Karthikeyan Natesan Ramamurthy and Payel Das and Siva Reddy},
    journal = {ArXiv},
    year    = {2023},
    volume  = {abs/2305.19466}
}

@misc{su2021roformer,
    title         = {RoFormer: Enhanced Transformer with Rotary Position Embedding},
    author        = {Jianlin Su and Yu Lu and Shengfeng Pan and Bo Wen and Yunfeng Liu},
    year          = {2021},
    eprint        = {2104.09864},
    archivePrefix = {arXiv},
    primaryClass  = {cs.CL}
}