Descarga de magvit2 pytorch - Descarga del código fuente magvit2 pytorch

magvit2 pytorch

Código Fuente de IA

0.4.9

Descargar

MagViT2-Pytorch

La implementación de MagViT2 desde el modelo de lenguaje supera la difusión: Tokenizer es clave para la generación visual en Pytorch. Actualmente contiene SOTA para la generación/comprensión de vídeo.

El Lookup Free Quantizer propuesto en el artículo se puede encontrar en un repositorio separado. Probablemente debería explorarse para todas las demás modalidades, empezando por el audio.

Únase si está interesado en replicar abiertamente el tokenizador propuesto en este documento.

Actualización: Tencent ha utilizado el código de este repositorio y ha abierto un modelo funcional.

Apreciación

EstabilidadAI y ? Huggingface por el generoso patrocinio, así como a mis otros patrocinadores, por brindarme la independencia para abrir la inteligencia artificial de código abierto.
Louis Serrano por compartir algunas ejecuciones iniciales, validando que la arquitectura general converge con la cuantificación escalar finita.
¿Tú? Si es un ingeniero/científico de investigación talentoso, ¡siéntase libre de contribuir a la ciencia de código abierto de vanguardia!

Instalar

$ pip install magvit2-pytorch

Uso

 from magvit2_pytorch import (
    VideoTokenizer ,
    VideoTokenizerTrainer
)

tokenizer = VideoTokenizer (
    image_size = 128 ,
    init_dim = 64 ,
    max_dim = 512 ,
    codebook_size = 1024 ,
    layers = (
        'residual' ,
        'compress_space' ,
        ( 'consecutive_residual' , 2 ),
        'compress_space' ,
        ( 'consecutive_residual' , 2 ),
        'linear_attend_space' ,
        'compress_space' ,
        ( 'consecutive_residual' , 2 ),
        'attend_space' ,
        'compress_time' ,
        ( 'consecutive_residual' , 2 ),
        'compress_time' ,
        ( 'consecutive_residual' , 2 ),
        'attend_time' ,
    )
)

trainer = VideoTokenizerTrainer (
    tokenizer ,
    dataset_folder = '/path/to/a/lot/of/media' ,     # folder of either videos or images, depending on setting below
    dataset_type = 'videos' ,                        # 'videos' or 'images', prior papers have shown pretraining on images to be effective for video synthesis
    batch_size = 4 ,
    grad_accum_every = 8 ,
    learning_rate = 2e-5 ,
    num_train_steps = 1_000_000
)

trainer . train ()

# after a lot of training ...
# can use the EMA of the tokenizer

ema_tokenizer = trainer . ema_tokenizer

# mock video

video = torch . randn ( 1 , 3 , 17 , 128 , 128 )

# tokenizing video to discrete codes

codes = ema_tokenizer . tokenize ( video ) # (1, 9, 16, 16) <- in this example, time downsampled by 4x and space downsampled by 8x. flatten token ids for (non)-autoregressive training

# sanity check

decoded_video = ema_tokenizer . decode_from_code_indices ( codes )

assert torch . allclose (
    decoded_video ,
    ema_tokenizer ( video , return_recon = True )
)

Para realizar un seguimiento de sus experimentos en Pesos y sesgos, establezca use_wandb_tracking = True en VideoTokenizerTrainer y luego use el administrador de contexto .trackers

 trainer = VideoTokenizerTrainer (
    use_wandb_tracking = True ,
    ...
)

with trainer . trackers ( project_name = 'magvit2' , run_name = 'baseline' ):
    trainer . train ()

Hacer

Citas

 @misc { yu2023language ,
    title   = { Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation } , 
    author  = { Lijun Yu and José Lezama and Nitesh B. Gundavarapu and Luca Versari and Kihyuk Sohn and David Minnen and Yong Cheng and Agrim Gupta and Xiuye Gu and Alexander G. Hauptmann and Boqing Gong and Ming-Hsuan Yang and Irfan Essa and David A. Ross and Lu Jiang } ,
    year    = { 2023 } ,
    eprint  = { 2310.05737 } ,
    archivePrefix = { arXiv } ,
    primaryClass = { cs.CV }
}

 @inproceedings { dao2022flashattention ,
    title   = { Flash{A}ttention: Fast and Memory-Efficient Exact Attention with {IO}-Awareness } ,
    author  = { Dao, Tri and Fu, Daniel Y. and Ermon, Stefano and Rudra, Atri and R{'e}, Christopher } ,
    booktitle = { Advances in Neural Information Processing Systems } ,
    year    = { 2022 }
}

 @article { Zhang2021TokenST ,
    title   = { Token Shift Transformer for Video Classification } ,
    author  = { Hao Zhang and Y. Hao and Chong-Wah Ngo } ,
    journal = { Proceedings of the 29th ACM International Conference on Multimedia } ,
    year    = { 2021 }
}

 @inproceedings { Arora2023ZoologyMA ,
    title   = { Zoology: Measuring and Improving Recall in Efficient Language Models } ,
    author  = { Simran Arora and Sabri Eyuboglu and Aman Timalsina and Isys Johnson and Michael Poli and James Zou and Atri Rudra and Christopher R'e } ,
    year    = { 2023 } ,
    url     = { https://api.semanticscholar.org/CorpusID:266149332 }
}

Expandir

Información adicional

Versión 0.4.9
Tipo Código Fuente de IA
Fecha de actualización 2025-01-17
tamaño 1.73MB
Proviene de Github

Aplicaciones relacionadas

GitHub sgrebnov/cordova plugin background download

2024-11-05
Wa ch ull navra maza navsacha 2 2024 ull ovie Fr e Online On Strea ings

2024-11-03
Wa ch navra maza navsacha 2 2024 ull ovie Online For Fr e Strea ings At Home

2024-11-03
pytorch image models

2024-11-03
Wa ch the greatest of all time 2024 ull ovie Online For Fr e Strea ings At Home

2024-11-02
wolfs 2024 f llmo ie f lmyz lla dow load ree 7 0p 4 0p a d 10 0p

2024-11-01

Recomendado para ti

chat.petals.dev

Otro código fuente

1.0.0
GPT Prompt Templates

Otro código fuente

1.0.0
GPTyped

Otro código fuente

GPTyped 1.0.5
node telegram bot api

Código Fuente de IA

v0.50.0
typebot.io

Código Fuente de IA

v3.1.2
python wechaty getting started

Código Fuente de IA

1.0.0
waymo open dataset

Otro código fuente

December 2023 Update
termwind

Otras categorias

v2.3.0
wp functions

Otras categorias

1.0.0

Información relacionada Todo