Unduh magvit2 pytorch - unduh kode sumber magvit2 pytorch

magvit2 pytorch

Kode Sumber AI

0.4.9

Unduh

MagViT2 - Pytorch

Implementasi MagViT2 dari Model Bahasa Beats Diffusion - Tokenizer adalah Kunci Generasi Visual di Pytorch. Saat ini memegang SOTA untuk pembuatan/pemahaman video.

Lookup Free Quantizer yang diusulkan dalam makalah ini dapat ditemukan di repositori terpisah. Ini mungkin harus dieksplorasi untuk semua modalitas lainnya, dimulai dengan audio

Silakan bergabung jika Anda tertarik untuk mereplikasi tokenizer yang diusulkan dalam makalah ini secara terbuka

Pembaruan: Tencent telah menggunakan kode dalam repositori ini dan membuat model kerja menjadi sumber terbuka

Apresiasi

StabilitasAI dan ? Huggingface atas sponsor yang murah hati, serta sponsor saya yang lain, yang telah memberi saya kebebasan terhadap kecerdasan buatan sumber terbuka.
Louis Serrano yang telah membagikan beberapa proses awal, memvalidasi bahwa keseluruhan arsitektur menyatu dengan kuantisasi skalar hingga.
Anda? Jika Anda seorang insinyur/ilmuwan riset berbakat, jangan ragu untuk berkontribusi pada ilmu pengetahuan sumber terbuka yang mutakhir!

Memasang

$ pip install magvit2-pytorch

Penggunaan

 from magvit2_pytorch import (
    VideoTokenizer ,
    VideoTokenizerTrainer
)

tokenizer = VideoTokenizer (
    image_size = 128 ,
    init_dim = 64 ,
    max_dim = 512 ,
    codebook_size = 1024 ,
    layers = (
        'residual' ,
        'compress_space' ,
        ( 'consecutive_residual' , 2 ),
        'compress_space' ,
        ( 'consecutive_residual' , 2 ),
        'linear_attend_space' ,
        'compress_space' ,
        ( 'consecutive_residual' , 2 ),
        'attend_space' ,
        'compress_time' ,
        ( 'consecutive_residual' , 2 ),
        'compress_time' ,
        ( 'consecutive_residual' , 2 ),
        'attend_time' ,
    )
)

trainer = VideoTokenizerTrainer (
    tokenizer ,
    dataset_folder = '/path/to/a/lot/of/media' ,     # folder of either videos or images, depending on setting below
    dataset_type = 'videos' ,                        # 'videos' or 'images', prior papers have shown pretraining on images to be effective for video synthesis
    batch_size = 4 ,
    grad_accum_every = 8 ,
    learning_rate = 2e-5 ,
    num_train_steps = 1_000_000
)

trainer . train ()

# after a lot of training ...
# can use the EMA of the tokenizer

ema_tokenizer = trainer . ema_tokenizer

# mock video

video = torch . randn ( 1 , 3 , 17 , 128 , 128 )

# tokenizing video to discrete codes

codes = ema_tokenizer . tokenize ( video ) # (1, 9, 16, 16) <- in this example, time downsampled by 4x and space downsampled by 8x. flatten token ids for (non)-autoregressive training

# sanity check

decoded_video = ema_tokenizer . decode_from_code_indices ( codes )

assert torch . allclose (
    decoded_video ,
    ema_tokenizer ( video , return_recon = True )
)

Untuk melacak eksperimen Anda pada Bobot & Bias, atur use_wandb_tracking = True pada VideoTokenizerTrainer , lalu gunakan pengelola konteks .trackers

 trainer = VideoTokenizerTrainer (
    use_wandb_tracking = True ,
    ...
)

with trainer . trackers ( project_name = 'magvit2' , run_name = 'baseline' ):
    trainer . train ()

Semua yang harus dilakukan

Kutipan

 @misc { yu2023language ,
    title   = { Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation } , 
    author  = { Lijun Yu and José Lezama and Nitesh B. Gundavarapu and Luca Versari and Kihyuk Sohn and David Minnen and Yong Cheng and Agrim Gupta and Xiuye Gu and Alexander G. Hauptmann and Boqing Gong and Ming-Hsuan Yang and Irfan Essa and David A. Ross and Lu Jiang } ,
    year    = { 2023 } ,
    eprint  = { 2310.05737 } ,
    archivePrefix = { arXiv } ,
    primaryClass = { cs.CV }
}

 @inproceedings { dao2022flashattention ,
    title   = { Flash{A}ttention: Fast and Memory-Efficient Exact Attention with {IO}-Awareness } ,
    author  = { Dao, Tri and Fu, Daniel Y. and Ermon, Stefano and Rudra, Atri and R{'e}, Christopher } ,
    booktitle = { Advances in Neural Information Processing Systems } ,
    year    = { 2022 }
}

 @article { Zhang2021TokenST ,
    title   = { Token Shift Transformer for Video Classification } ,
    author  = { Hao Zhang and Y. Hao and Chong-Wah Ngo } ,
    journal = { Proceedings of the 29th ACM International Conference on Multimedia } ,
    year    = { 2021 }
}

 @inproceedings { Arora2023ZoologyMA ,
    title   = { Zoology: Measuring and Improving Recall in Efficient Language Models } ,
    author  = { Simran Arora and Sabri Eyuboglu and Aman Timalsina and Isys Johnson and Michael Poli and James Zou and Atri Rudra and Christopher R'e } ,
    year    = { 2023 } ,
    url     = { https://api.semanticscholar.org/CorpusID:266149332 }
}

Memperluas

Informasi Tambahan

Versi 0.4.9
Tipe Kode Sumber AI
Waktu Pembaruan 2025-01-17
ukuran 1.73MB
Berasal dari Github

Aplikasi Terkait

GitHub sgrebnov/cordova plugin background download

2024-11-05
Wa ch ull navra maza navsacha 2 2024 ull ovie Fr e Online On Strea ings

2024-11-03
Wa ch navra maza navsacha 2 2024 ull ovie Online For Fr e Strea ings At Home

2024-11-03
pytorch image models

2024-11-03
Wa ch the greatest of all time 2024 ull ovie Online For Fr e Strea ings At Home

2024-11-02
wolfs 2024 f llmo ie f lmyz lla dow load ree 7 0p 4 0p a d 10 0p

2024-11-01

Direkomendasikan untuk Anda

chat.petals.dev

Kode sumber lainnya

1.0.0
GPT Prompt Templates

Kode sumber lainnya

1.0.0
GPTyped

Kode sumber lainnya

GPTyped 1.0.5
node telegram bot api

Kode Sumber AI

v0.50.0
typebot.io

Kode Sumber AI

v3.1.2
python wechaty getting started

Kode Sumber AI

1.0.0
waymo open dataset

Kode sumber lainnya

December 2023 Update
termwind

Kategori lainnya

v2.3.0
wp functions

Kategori lainnya

1.0.0

Informasi Terkait Semua