Youku-mPLUG: A 10M Large-scale Chinese Video-Text Dataset

Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Pre-training Dataset and Benchmarks. The download link is here.

Paper

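What is Youku-mPLUG?

We release the largest publicly available high-quality Chinese video-language dataset (10 million videos), named Youku-mPLUG. It is collected from Youku, a well-known Chinese video-sharing website, under strict criteria for safety, diversity, and quality.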

Figure: Examples of video clips and titles in the proposed Youku-mPLUG dataset.

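We provide 3 different downstream multimodal video benchmark datasets to measure the capabilities of pre-trained models. The 3 tasks are:

Video category prediction: given a video and its corresponding title, predict the category of the video.
Video-text retrieval: given a set of videos and a set of texts, retrieve texts with a video query and retrieve videos with a text query.
Video captioning: given a video, describe its content.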
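Retrieval tasks of this kind are commonly scored with Recall@K. The following is a minimal sketch of that metric, not the repository's evaluation code. It assumes a square similarity matrix in which the correct match for query i is candidate i; the function names and the toy scores are made up for illustration.

```python
from typing import List, Sequence


def recall_at_k(similarity: Sequence[Sequence[float]], k: int) -> float:
    """Fraction of queries whose matching item (same index) ranks in the top k.

    similarity[i][j] is the score between query i and candidate j; the
    ground-truth match for query i is assumed to be candidate i.
    """
    hits = 0
    for i, row in enumerate(similarity):
        # Rank candidate indices by descending score and keep the k best.
        top_k = sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:k]
        hits += i in top_k
    return hits / len(similarity)


def transpose(matrix: Sequence[Sequence[float]]) -> List[List[float]]:
    """Swap queries and candidates, e.g. video-to-text becomes text-to-video."""
    return [list(column) for column in zip(*matrix)]


# Toy video-to-text similarity matrix (3 videos x 3 titles), invented for illustration.
sim = [
    [0.9, 0.1, 0.3],
    [0.2, 0.4, 0.8],
    [0.1, 0.7, 0.6],
]
video_to_text_r1 = recall_at_k(sim, k=1)
text_to_video_r1 = recall_at_k(transpose(sim), k=1)
```

Transposing the matrix reuses the same function for both retrieval directions, which keeps the two scores comparable.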

Figure: Examples from the Youku-mPLUG downstream datasets.

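Data statistics

The dataset contains 10 million high-quality videos in total, distributed across 20 super categories and 45 categories.

Figure: Distribution of categories in the Youku-mPLUG dataset.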

Zero-shot capability

Figure: Case 1 and Case 2.

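Download

You can download all the videos and annotation files through this link.

Setup

Note: Due to a bug in megatron_util, after installing megatron_util you need to replace conda/envs/youku/lib/python3.10/site-packages/megatron_util/initialize.py with the initialize.py in the current directory.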

conda env create -f environment.yml
conda activate youku
pip install megatron_util==1.3.0 -f https://modelscope.oss-cn-beijing.aliyuncs.com/releases/repo.html

# For caption evaluation
apt-get install default-jre
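The manual replacement of initialize.py described in the note can also be scripted. The sketch below is one possible way to do it, assuming it runs inside the activated youku environment from the repository root. The helper names find_package_dir and patch_initialize are hypothetical and are not part of megatron_util.

```python
import importlib.util
import shutil
from pathlib import Path
from typing import Optional


def find_package_dir(package: str) -> Optional[Path]:
    """Return the install directory of `package`, or None if it is not an installed package."""
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        return None
    return Path(next(iter(spec.submodule_search_locations)))


def patch_initialize(replacement: Path, package_dir: Path) -> Path:
    """Overwrite package_dir/initialize.py with `replacement`, keeping a one-time backup."""
    target = package_dir / "initialize.py"
    backup = target.with_name("initialize.py.bak")
    # Preserve the originally installed file only once, so repeated runs
    # do not overwrite the backup with an already patched copy.
    if target.exists() and not backup.exists():
        shutil.copy2(target, backup)
    shutil.copy2(replacement, target)
    return target
```

Calling `patch_initialize(Path("initialize.py"), find_package_dir("megatron_util"))` after checking that the lookup did not return None would then perform the replacement without hard-coding the conda path.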

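mPLUG-Video (1.3B / 2.7B)

Pre-training

First, download the GPT-3 1.3B and 2.7B checkpoints from ModelScope. The pre-trained models can be downloaded here (1.3B) and here (2.7B).

Run the pre-training of mPLUG-Video as follows: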

exp_name='pretrain/gpt3_1.3B/pretrain_gpt3_freezeGPT_youku_v0'
PYTHONPATH=$PYTHONPATH:./ \
python -m torch.distributed.launch --nproc_per_node=8 --master_addr=$MASTER_ADDR \
  --master_port=$MASTER_PORT \
  --nnodes=$WORLD_SIZE \
  --node_rank=$RANK \
  --use_env run_pretrain_distributed_gpt3.py \
  --config ./configs/${exp_name}.yaml \
  --output_dir ./output/${exp_name} \
  --enable_deepspeed \
  --bf16 \
  2>&1 | tee ./output/${exp_name}/train.log

Benchmarking

To perform downstream fine-tuning, we take video category prediction as an example:

exp_name='cls/cls_gpt3_1.3B_youku_v0_sharp_2'
PYTHONPATH=$PYTHONPATH:./ \
python -m torch.distributed.launch --nproc_per_node=8 --master_addr=$MASTER_ADDR \
  --master_port=$MASTER_PORT \
  --nnodes=$WORLD_SIZE \
  --node_rank=$RANK \
  --use_env downstream/run_cls_distributed_gpt3.py \
  --config ./configs/${exp_name}.yaml \
  --output_dir ./output/${exp_name} \
  --enable_deepspeed \
  --resume path/to/1_3B_mp_rank_00_model_states.pt \
  --bf16 \
  2>&1 | tee ./output/${exp_name}/train.log
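The pre-training and fine-tuning launch commands differ only in the experiment name, the entry script, and the optional resume checkpoint. As a rough sketch, the shared argument list could be assembled in Python as follows. The function build_launch_command and the values in example_env are illustrative assumptions; only the flags themselves come from the commands above.

```python
import shlex
from typing import List, Mapping, Optional


def build_launch_command(
    exp_name: str,
    script: str,
    env: Mapping[str, str],
    resume: Optional[str] = None,
    nproc_per_node: int = 8,
) -> List[str]:
    """Assemble the torch.distributed.launch argument list for one experiment."""
    required = ("MASTER_ADDR", "MASTER_PORT", "WORLD_SIZE", "RANK")
    missing = [name for name in required if not env.get(name)]
    if missing:
        # Fail early instead of launching with empty rendezvous settings.
        raise ValueError(f"missing environment variables: {', '.join(missing)}")

    cmd = [
        "python", "-m", "torch.distributed.launch",
        f"--nproc_per_node={nproc_per_node}",
        f"--master_addr={env['MASTER_ADDR']}",
        f"--master_port={env['MASTER_PORT']}",
        f"--nnodes={env['WORLD_SIZE']}",
        f"--node_rank={env['RANK']}",
        "--use_env", script,
        "--config", f"./configs/{exp_name}.yaml",
        "--output_dir", f"./output/{exp_name}",
        "--enable_deepspeed",
    ]
    if resume is not None:
        cmd += ["--resume", resume]
    cmd.append("--bf16")
    return cmd


# Placeholder single-node settings, chosen only for this example.
example_env = {"MASTER_ADDR": "127.0.0.1", "MASTER_PORT": "29500", "WORLD_SIZE": "1", "RANK": "0"}
command = build_launch_command(
    "cls/cls_gpt3_1.3B_youku_v0_sharp_2",
    "downstream/run_cls_distributed_gpt3.py",
    example_env,
    resume="path/to/1_3B_mp_rank_00_model_states.pt",
)
print(shlex.join(command))
```

In practice one would pass os.environ as env and hand the resulting list to subprocess.run; the sketch only prints the command so it can be inspected first.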

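Experimental results

Below we show the results on the validation set for reference.

Figure: Video category prediction results on the validation set.
Figure: Video retrieval results on the validation set.

mPLUG-Video (BloomZ-7B)

We build the mPLUG-Video model on top of mPLUG-Owl. To use the model, first clone the mPLUG-Owl repository: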

git clone https://github.com/X-PLUG/mPLUG-Owl.git
cd mPLUG-Owl/mPLUG-Owl

Instruction-tuned checkpoints are available on HuggingFace. For fine-tuning the model, refer to the mPLUG-Owl repo. To run video inference, you can use the following code:

import torch
from mplug_owl_video.modeling_mplug_owl import MplugOwlForConditionalGeneration
from transformers import AutoTokenizer
from mplug_owl_video.processing_mplug_owl import MplugOwlImageProcessor, MplugOwlProcessor

pretrained_ckpt = 'MAGAer13/mplug-youku-bloomz-7b'
model = MplugOwlForConditionalGeneration.from_pretrained(
    pretrained_ckpt,
    torch_dtype=torch.bfloat16,
    device_map={'': 0},
)
image_processor = MplugOwlImageProcessor.from_pretrained(pretrained_ckpt)
tokenizer = AutoTokenizer.from_pretrained(pretrained_ckpt)
processor = MplugOwlProcessor(image_processor, tokenizer)

# We use a human/AI template to organize the context as a multi-turn conversation.
# <|video|> denotes a video placeholder.
prompts = [
'''The following is a conversation between a curious human and AI assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.
Human: <|video|>
Human: 视频中的女人在干什么？
AI: ''']

video_list = ['yoga.mp4']

# Generation kwargs (the same as in transformers) can be passed to generate().
generate_kwargs = {
    'do_sample': True,
    'top_k': 5,
    'max_length': 512
}
inputs = processor(text=prompts, videos=video_list, num_frames=4, return_tensors='pt')
# Cast float tensors to bfloat16 to match the model weights, then move everything to the model device.
inputs = {k: v.bfloat16() if v.dtype == torch.float else v for k, v in inputs.items()}
inputs = {k: v.to(model.device) for k, v in inputs.items()}
with torch.no_grad():
    res = model.generate(**inputs, **generate_kwargs)
sentence = tokenizer.decode(res.tolist()[0], skip_special_tokens=True)
print(sentence)
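The prompt in the inference code follows a fixed human/AI template, so it can be generated rather than written by hand. The helper below is a small sketch of that idea. The name build_prompt and the way earlier turns are interleaved as alternating "Human:" and "AI:" lines are assumptions based on the template comment, not an API of mPLUG-Owl.

```python
from typing import Sequence, Tuple

SYSTEM_PROMPT = (
    "The following is a conversation between a curious human and AI assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions."
)
VIDEO_TOKEN = "<|video|>"


def build_prompt(
    question: str,
    history: Sequence[Tuple[str, str]] = (),
    num_videos: int = 1,
) -> str:
    """Build a human/AI prompt with one video placeholder line per video."""
    lines = [SYSTEM_PROMPT]
    lines += [f"Human: {VIDEO_TOKEN}"] * num_videos
    # Earlier turns are replayed in order (assumed format for multi-turn use).
    for past_question, past_answer in history:
        lines.append(f"Human: {past_question}")
        lines.append(f"AI: {past_answer}")
    lines.append(f"Human: {question}")
    # The trailing "AI: " leaves the answer for the model to complete.
    lines.append("AI: ")
    return "\n".join(lines)


prompts = [build_prompt("视频中的女人在干什么？")]
```

With the default arguments this reproduces the single-turn prompt string used in the inference code, so the resulting list can be passed as `text=prompts` to the processor.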

Citing Youku-mPLUG

If you find this dataset useful for your research, please consider citing our paper.

@misc{xu2023youku_mplug,
    title={Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Dataset for Pre-training and Benchmarks},
    author={Haiyang Xu and Qinghao Ye and Xuan Wu and Ming Yan and Yuan Miao and Jiabo Ye and Guohai Xu and Anwen Hu and Yaya Shi and Chenliang Li and Qi Qian and Que Maofei and Ji Zhang and Xiao Zeng and Fei Huang},
    year={2023},
    eprint={2306.04362},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}


Additional information

Version: 1.0.0
Type: Other source code
Updated: 2024-12-13
Size: 15.45MB
Source: GitHub
