تنزيل magvit2 pytorch - تنزيل كود المصدر magvit2 pytorch

magvit2 pytorch

كود الذكاء الاصطناعي

0.4.9

تنزيل

MagViT2 - بيتورتش

تنفيذ MagViT2 من نموذج اللغة Beats Diffusion - يعتبر Tokenizer هو مفتاح الجيل المرئي في Pytorch. يحمل هذا حاليًا SOTA لإنشاء/فهم الفيديو.

يمكن العثور على أداة البحث الكمي المجانية المقترحة في الورقة في مستودع منفصل. ربما ينبغي استكشاف جميع الطرائق الأخرى، بدءًا من الصوت

يرجى الانضمام إذا كنت مهتمًا بتكرار أداة الرمز المميزة المقترحة في هذه الورقة بشكل علني

التحديث: استخدمت Tencent الكود الموجود في هذا المستودع وفتحت نموذج عمل مفتوح المصدر

تقدير

الاستقرار AI و؟ أتقدم بالشكر على الرعاية السخية، وكذلك الرعاة الآخرين، لمنحني الاستقلالية في استخدام الذكاء الاصطناعي مفتوح المصدر.
لويس سيرانو لمشاركته بعض العمليات الأولية المبكرة، والتحقق من أن البنية الشاملة تتقارب مع التكميم العددي المحدود.
أنت؟ إذا كنت مهندس/عالم أبحاث موهوب، فلا تتردد في المساهمة في أحدث العلوم مفتوحة المصدر!

ثَبَّتَ

$ pip install magvit2-pytorch

الاستخدام

 from magvit2_pytorch import (
    VideoTokenizer ,
    VideoTokenizerTrainer
)

tokenizer = VideoTokenizer (
    image_size = 128 ,
    init_dim = 64 ,
    max_dim = 512 ,
    codebook_size = 1024 ,
    layers = (
        'residual' ,
        'compress_space' ,
        ( 'consecutive_residual' , 2 ),
        'compress_space' ,
        ( 'consecutive_residual' , 2 ),
        'linear_attend_space' ,
        'compress_space' ,
        ( 'consecutive_residual' , 2 ),
        'attend_space' ,
        'compress_time' ,
        ( 'consecutive_residual' , 2 ),
        'compress_time' ,
        ( 'consecutive_residual' , 2 ),
        'attend_time' ,
    )
)

trainer = VideoTokenizerTrainer (
    tokenizer ,
    dataset_folder = '/path/to/a/lot/of/media' ,     # folder of either videos or images, depending on setting below
    dataset_type = 'videos' ,                        # 'videos' or 'images', prior papers have shown pretraining on images to be effective for video synthesis
    batch_size = 4 ,
    grad_accum_every = 8 ,
    learning_rate = 2e-5 ,
    num_train_steps = 1_000_000
)

trainer . train ()

# after a lot of training ...
# can use the EMA of the tokenizer

ema_tokenizer = trainer . ema_tokenizer

# mock video

video = torch . randn ( 1 , 3 , 17 , 128 , 128 )

# tokenizing video to discrete codes

codes = ema_tokenizer . tokenize ( video ) # (1, 9, 16, 16) <- in this example, time downsampled by 4x and space downsampled by 8x. flatten token ids for (non)-autoregressive training

# sanity check

decoded_video = ema_tokenizer . decode_from_code_indices ( codes )

assert torch . allclose (
    decoded_video ,
    ema_tokenizer ( video , return_recon = True )
)

لتتبع تجاربك على الأوزان والتحيزات، قم بتعيين use_wandb_tracking = True على VideoTokenizerTrainer ، ثم استخدم مدير السياق .trackers

 trainer = VideoTokenizerTrainer (
    use_wandb_tracking = True ,
    ...
)

with trainer . trackers ( project_name = 'magvit2' , run_name = 'baseline' ):
    trainer . train ()

ما يجب القيام به

الاستشهادات

 @misc { yu2023language ,
    title   = { Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation } , 
    author  = { Lijun Yu and José Lezama and Nitesh B. Gundavarapu and Luca Versari and Kihyuk Sohn and David Minnen and Yong Cheng and Agrim Gupta and Xiuye Gu and Alexander G. Hauptmann and Boqing Gong and Ming-Hsuan Yang and Irfan Essa and David A. Ross and Lu Jiang } ,
    year    = { 2023 } ,
    eprint  = { 2310.05737 } ,
    archivePrefix = { arXiv } ,
    primaryClass = { cs.CV }
}

 @inproceedings { dao2022flashattention ,
    title   = { Flash{A}ttention: Fast and Memory-Efficient Exact Attention with {IO}-Awareness } ,
    author  = { Dao, Tri and Fu, Daniel Y. and Ermon, Stefano and Rudra, Atri and R{'e}, Christopher } ,
    booktitle = { Advances in Neural Information Processing Systems } ,
    year    = { 2022 }
}

 @article { Zhang2021TokenST ,
    title   = { Token Shift Transformer for Video Classification } ,
    author  = { Hao Zhang and Y. Hao and Chong-Wah Ngo } ,
    journal = { Proceedings of the 29th ACM International Conference on Multimedia } ,
    year    = { 2021 }
}

 @inproceedings { Arora2023ZoologyMA ,
    title   = { Zoology: Measuring and Improving Recall in Efficient Language Models } ,
    author  = { Simran Arora and Sabri Eyuboglu and Aman Timalsina and Isys Johnson and Michael Poli and James Zou and Atri Rudra and Christopher R'e } ,
    year    = { 2023 } ,
    url     = { https://api.semanticscholar.org/CorpusID:266149332 }
}

يوسع

معلومات إضافية

الإصدار 0.4.9
النوع كود الذكاء الاصطناعي
وقت التحديث 2025-01-17
الحجم 1.73MB
من Github

تطبيقات ذات صلة

GitHub sgrebnov/cordova plugin background download

2024-11-05
Wa ch ull navra maza navsacha 2 2024 ull ovie Fr e Online On Strea ings

2024-11-03
Wa ch navra maza navsacha 2 2024 ull ovie Online For Fr e Strea ings At Home

2024-11-03
pytorch image models

2024-11-03
Wa ch the greatest of all time 2024 ull ovie Online For Fr e Strea ings At Home

2024-11-02
wolfs 2024 f llmo ie f lmyz lla dow load ree 7 0p 4 0p a d 10 0p

2024-11-01

نوصي لك

chat.petals.dev

شفرة المصدر الأخرى

1.0.0
GPT Prompt Templates

شفرة المصدر الأخرى

1.0.0
GPTyped

شفرة المصدر الأخرى

GPTyped 1.0.5
node telegram bot api

كود الذكاء الاصطناعي

v0.50.0
typebot.io

كود الذكاء الاصطناعي

v3.1.2
python wechaty getting started

كود الذكاء الاصطناعي

1.0.0
waymo open dataset

شفرة المصدر الأخرى

December 2023 Update
termwind

فئات أخرى

v2.3.0
wp functions

فئات أخرى

1.0.0

أخبار ذات صلة الكل