MagViT2 - Pytorch

Implementation of MagViT2 from Language Model Beats Diffusion - Tokenizer is Key to Visual Generation, in Pytorch. This currently holds the state of the art (SOTA) for video generation / understanding.

The Lookup Free Quantizer proposed in the paper can be found in a separate repository. It should probably be explored for all other modalities, starting with audio.
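As a rough illustration of the lookup-free idea described in the paper, the sketch below quantizes each latent dimension to -1 or +1 by its sign and reads the sign pattern as a binary number to get the token id. It is plain Python for clarity, not the code from the separate repository; the function names are hypothetical, and training details such as the straight-through gradient and the entropy losses are left out.

```python
import math

def lfq_quantize(z):
    """Quantize each latent dimension to -1 or +1 by its sign (no codebook lookup)."""
    return [1.0 if value > 0 else -1.0 for value in z]

def lfq_index(z):
    """Token id: read the sign pattern as a binary number, bit i set when z[i] > 0."""
    return sum(1 << i for i, value in enumerate(z) if value > 0)

def lfq_decode(index, num_bits):
    """Invert lfq_index: recover the +/-1 code vector from an integer token id."""
    return [1.0 if (index >> i) & 1 else -1.0 for i in range(num_bits)]

# A codebook of size 1024 needs log2(1024) = 10 binary latent dimensions.
codebook_size = 1024
num_bits = int(math.log2(codebook_size))

# Made-up latent vector, one value per bit (illustration only).
z = [0.3, -1.2, 0.8, -0.1, 2.0, -0.5, 0.0, 1.1, -0.7, 0.4]

code = lfq_quantize(z)
index = lfq_index(z)

print(code)
print(index)
```

Because the token id is just the bit pattern of the signs, no nearest-neighbour search over a codebook is needed, and `lfq_decode(index, num_bits)` gives back the same code vector.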

Please join in if you are interested in replicating the tokenizer proposed in this paper out in the open.

Update: Tencent has used the code in this repository and open sourced a working model.

Appreciation

StabilityAI and 🤗 Huggingface for the generous sponsorship, as well as my other sponsors, for affording me the independence to open source artificial intelligence.
Louis Serrano for sharing some early initial runs validating that the overall architecture converges with finite scalar quantization.
You? If you are a talented research engineer / scientist, feel free to contribute to cutting edge open source science!

Install

$ pip install magvit2-pytorch

Usage

import torch

from magvit2_pytorch import (
    VideoTokenizer,
    VideoTokenizerTrainer
)

tokenizer = VideoTokenizer(
    image_size = 128,
    init_dim = 64,
    max_dim = 512,
    codebook_size = 1024,
    layers = (
        'residual',
        'compress_space',
        ('consecutive_residual', 2),
        'compress_space',
        ('consecutive_residual', 2),
        'linear_attend_space',
        'compress_space',
        ('consecutive_residual', 2),
        'attend_space',
        'compress_time',
        ('consecutive_residual', 2),
        'compress_time',
        ('consecutive_residual', 2),
        'attend_time',
    )
)

trainer = VideoTokenizerTrainer(
    tokenizer,
    dataset_folder = '/path/to/a/lot/of/media',     # folder of either videos or images, depending on setting below
    dataset_type = 'videos',                        # 'videos' or 'images', prior papers have shown pretraining on images to be effective for video synthesis
    batch_size = 4,
    grad_accum_every = 8,
    learning_rate = 2e-5,
    num_train_steps = 1_000_000
)

trainer.train()

# after a lot of training ...
# can use the EMA of the tokenizer

ema_tokenizer = trainer.ema_tokenizer

# mock video

video = torch.randn(1, 3, 17, 128, 128)

# tokenizing video to discrete codes

codes = ema_tokenizer.tokenize(video) # (1, 9, 16, 16) <- in this example, time downsampled by 4x and space downsampled by 8x. flatten token ids for (non)-autoregressive training

# sanity check

decoded_video = ema_tokenizer.decode_from_code_indices(codes)

assert torch.allclose(
    decoded_video,
    ema_tokenizer(video, return_recon = True)
)
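
The downsampling factors quoted in the comment above (8x in space, 4x in time) can be read off the `layers` tuple. The sketch below counts the compression layers, assuming each `'compress_space'` or `'compress_time'` entry halves the corresponding axis; that assumption matches the factors stated in the comment but is not taken from the library's source. It then shows one possible way to flatten a grid of token ids into a sequence and restore it. The helper names are hypothetical and the code is plain Python, independent of the library.

```python
layers = (
    'residual',
    'compress_space',
    ('consecutive_residual', 2),
    'compress_space',
    ('consecutive_residual', 2),
    'linear_attend_space',
    'compress_space',
    ('consecutive_residual', 2),
    'attend_space',
    'compress_time',
    ('consecutive_residual', 2),
    'compress_time',
    ('consecutive_residual', 2),
    'attend_time',
)

def downsample_factors(layers):
    """Return (space_factor, time_factor), assuming each compress layer halves its axis."""
    space, time = 1, 1
    for layer in layers:
        # entries are either a plain name or a (name, argument) tuple
        name = layer[0] if isinstance(layer, tuple) else layer
        if name == 'compress_space':
            space *= 2
        elif name == 'compress_time':
            time *= 2
    return space, time

def flatten_codes(codes):
    """Flatten a nested [frames][height][width] grid of token ids, frame by frame, row by row."""
    return [token for frame in codes for row in frame for token in row]

def unflatten_codes(flat, frames, height, width):
    """Inverse of flatten_codes for a known grid shape."""
    assert len(flat) == frames * height * width
    it = iter(flat)
    return [[[next(it) for _ in range(width)] for _ in range(height)] for _ in range(frames)]

space_factor, time_factor = downsample_factors(layers)
image_size = 128
grid_size = image_size // space_factor  # side length of the spatial token grid

# Tiny made-up grid of token ids: 2 frames, 2 rows, 3 columns.
toy_codes = [[[0, 1, 2], [3, 4, 5]], [[6, 7, 8], [9, 10, 11]]]
flat = flatten_codes(toy_codes)
restored = unflatten_codes(flat, frames = 2, height = 2, width = 3)

print(space_factor, time_factor, grid_size)
print(flat)
```

With real tensors the same flattening would normally be done with a reshape; the nested-list version is only meant to make the ordering explicit.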

To track your experiments on Weights & Biases, set use_wandb_tracking = True on the VideoTokenizerTrainer, and then use the .trackers context manager.

trainer = VideoTokenizerTrainer(
    use_wandb_tracking = True,
    ...
)

with trainer.trackers(project_name = 'magvit2', run_name = 'baseline'):
    trainer.train()

Todo

Citations

@misc{yu2023language,
    title   = {Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation},
    author  = {Lijun Yu and José Lezama and Nitesh B. Gundavarapu and Luca Versari and Kihyuk Sohn and David Minnen and Yong Cheng and Agrim Gupta and Xiuye Gu and Alexander G. Hauptmann and Boqing Gong and Ming-Hsuan Yang and Irfan Essa and David A. Ross and Lu Jiang},
    year    = {2023},
    eprint  = {2310.05737},
    archivePrefix = {arXiv},
    primaryClass = {cs.CV}
}

@inproceedings{dao2022flashattention,
    title   = {Flash{A}ttention: Fast and Memory-Efficient Exact Attention with {IO}-Awareness},
    author  = {Dao, Tri and Fu, Daniel Y. and Ermon, Stefano and Rudra, Atri and R{\'e}, Christopher},
    booktitle = {Advances in Neural Information Processing Systems},
    year    = {2022}
}

@article{Zhang2021TokenST,
    title   = {Token Shift Transformer for Video Classification},
    author  = {Hao Zhang and Y. Hao and Chong-Wah Ngo},
    journal = {Proceedings of the 29th ACM International Conference on Multimedia},
    year    = {2021}
}

@inproceedings{Arora2023ZoologyMA,
    title   = {Zoology: Measuring and Improving Recall in Efficient Language Models},
    author  = {Simran Arora and Sabri Eyuboglu and Aman Timalsina and Isys Johnson and Michael Poli and James Zou and Atri Rudra and Christopher R{\'e}},
    year    = {2023},
    url     = {https://api.semanticscholar.org/CorpusID:266149332}
}

More information

Version 0.4.9
Type AI source code
Updated 2025-01-17
Size 1.73MB
Source GitHub
