Youku mPLUG download - Youku mPLUG source code download

Youku-mPLUG 10M: A Large-scale Chinese Video-Text Dataset

Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Pre-training Dataset and Benchmarks. Download link here

Paper

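What is Youku-mPLUG?

We release Youku-mPLUG, the largest public high-quality Chinese video-language dataset (10 million videos). It is collected from Youku, a well-known Chinese video-sharing website, under strict criteria for safety, diversity, and quality.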

Figure: Examples of video clips and titles in the proposed Youku-mPLUG dataset.

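To measure the capabilities of pre-trained models, we provide three downstream multimodal video benchmark datasets, one per task:

Video Category Prediction: given a video and its corresponding title, predict the category of the video.
Video-Text Retrieval: given a set of videos and a set of texts, use each video to retrieve its text and each text to retrieve its video.
Video Captioning: given a video, describe its content.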
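The two retrieval directions described above can be illustrated with a small scoring sketch. The source does not say which metric the benchmark reports; Recall@K is used here only as a common choice for retrieval tasks, and the similarity scores, the `recall_at_k` helper, and the assumption that video i is paired with text i are all illustrative.

```python
from typing import List, Sequence


def recall_at_k(similarity: Sequence[Sequence[float]], k: int) -> float:
    """Fraction of queries whose paired item (same index) ranks in the top k.

    similarity[i][j] is the score between query i and candidate j.
    The ground-truth candidate for query i is assumed to be candidate i.
    """
    hits = 0
    for i, row in enumerate(similarity):
        # Rank candidate indices by descending score.
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(similarity)


def transpose(matrix: Sequence[Sequence[float]]) -> List[List[float]]:
    """Swap the roles of queries and candidates."""
    return [list(column) for column in zip(*matrix)]


# Made-up video-by-text similarity scores for three video/text pairs.
video_text_sim = [
    [0.9, 0.1, 0.3],
    [0.2, 0.4, 0.8],
    [0.1, 0.7, 0.6],
]

v2t_r1 = recall_at_k(video_text_sim, k=1)             # video retrieves text
t2v_r1 = recall_at_k(transpose(video_text_sim), k=1)  # text retrieves video
print(f"video->text R@1 = {v2t_r1:.2f}, text->video R@1 = {t2v_r1:.2f}")
```

Transposing the matrix is what turns video-to-text retrieval into text-to-video retrieval, so one scoring function covers both directions.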

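Figure: Examples from the Youku-mPLUG downstream datasets.

Data statistics

The dataset contains 10 million videos in total. The videos are of high quality and are distributed across 45 categories grouped into 20 super-categories.

Figure: The distribution of categories in the Youku-mPLUG dataset.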

Zero-shot Capability

Figures: Case 1, Case 2.

Download

You can download all the videos and annotation files through this link

Setup

Note: Due to a bug in megatron_util, after installing megatron_util you need to replace conda/envs/youku/lib/python3.10/site-packages/megatron_util/initialize.py with the initialize.py in the current directory.

conda env create -f environment.yml
conda activate youku
pip install megatron_util==1.3.0 -f https://modelscope.oss-cn-beijing.aliyuncs.com/releases/repo.html

# For caption evaluation
apt-get install default-jre
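The file replacement described in the note above can be scripted once megatron_util is installed. The following is a minimal sketch, not part of the repository: the helper names `find_installed_initialize` and `replace_with_backup` are hypothetical, and keeping a `.bak` copy of the original file is an added precaution rather than something the source asks for. It locates the package through the active Python environment instead of hard-coding the conda path.

```python
import importlib.util
import shutil
from pathlib import Path
from typing import Optional


def find_installed_initialize() -> Optional[Path]:
    """Return the path of megatron_util/initialize.py in the active environment, or None."""
    spec = importlib.util.find_spec("megatron_util")
    if spec is None or not spec.submodule_search_locations:
        return None
    return Path(list(spec.submodule_search_locations)[0]) / "initialize.py"


def replace_with_backup(patched: Path, target: Path) -> Path:
    """Copy `patched` over `target`, keeping the original as `<target>.bak`."""
    backup = target.with_name(target.name + ".bak")
    # Only back up once, so a second run does not overwrite the true original.
    if target.exists() and not backup.exists():
        shutil.copy2(target, backup)
    shutil.copy2(patched, target)
    return backup


if __name__ == "__main__":
    target = find_installed_initialize()
    patched = Path("initialize.py")  # the copy in the current directory
    if target is None:
        print("megatron_util is not installed in this environment")
    elif not patched.exists():
        print("no initialize.py found in the current directory")
    else:
        backup = replace_with_backup(patched, target)
        print(f"replaced {target}; original kept at {backup}")
```

Run it from the repository root with the youku environment activated, so that both the patched file and the installed package resolve correctly.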

mPLUG-Video (1.3B / 2.7B)

Pre-training

First, download the GPT-3 1.3B and 2.7B checkpoints from ModelScope. The pre-trained models can be downloaded here (1.3B) and here (2.7B).

Run the pre-training of mPLUG-Video as follows:

exp_name='pretrain/gpt3_1.3B/pretrain_gpt3_freezeGPT_youku_v0'
PYTHONPATH=$PYTHONPATH:./ \
python -m torch.distributed.launch --nproc_per_node=8 --master_addr=$MASTER_ADDR \
  --master_port=$MASTER_PORT \
  --nnodes=$WORLD_SIZE \
  --node_rank=$RANK \
  --use_env run_pretrain_distributed_gpt3.py \
  --config ./configs/${exp_name}.yaml \
  --output_dir ./output/${exp_name} \
  --enable_deepspeed \
  --bf16 \
  2>&1 | tee ./output/${exp_name}/train.log

Benchmarking

Run the downstream fine-tuning. We take Video Category Prediction as an example:

exp_name='cls/cls_gpt3_1.3B_youku_v0_sharp_2'
PYTHONPATH=$PYTHONPATH:./ \
python -m torch.distributed.launch --nproc_per_node=8 --master_addr=$MASTER_ADDR \
  --master_port=$MASTER_PORT \
  --nnodes=$WORLD_SIZE \
  --node_rank=$RANK \
  --use_env downstream/run_cls_distributed_gpt3.py \
  --config ./configs/${exp_name}.yaml \
  --output_dir ./output/${exp_name} \
  --enable_deepspeed \
  --resume path/to/1_3B_mp_rank_00_model_states.pt \
  --bf16 \
  2>&1 | tee ./output/${exp_name}/train.log

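Experimental results

Below we show the results on the validation sets for reference.

Figure: Video category prediction results on the validation set.
Figure: Video retrieval results on the validation set.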

mPLUG-Video (BloomZ-7B)

We build the mPLUG-Video model on top of mPLUG-Owl. To use the model, first clone the mPLUG-Owl repo:

git clone https://github.com/X-PLUG/mPLUG-Owl.git
cd mPLUG-Owl/mPLUG-Owl

The instruction-tuned checkpoint is available on HuggingFace. To fine-tune the model, refer to the mPLUG-Owl repo. To run video inference, you can use the following code:

import torch
from mplug_owl_video.modeling_mplug_owl import MplugOwlForConditionalGeneration
from transformers import AutoTokenizer
from mplug_owl_video.processing_mplug_owl import MplugOwlImageProcessor, MplugOwlProcessor

pretrained_ckpt = 'MAGAer13/mplug-youku-bloomz-7b'
model = MplugOwlForConditionalGeneration.from_pretrained(
    pretrained_ckpt,
    torch_dtype=torch.bfloat16,
    device_map={'': 0},
)
image_processor = MplugOwlImageProcessor.from_pretrained(pretrained_ckpt)
tokenizer = AutoTokenizer.from_pretrained(pretrained_ckpt)
processor = MplugOwlProcessor(image_processor, tokenizer)

# We use a human/AI template to organize the context as a multi-turn conversation.
# <|video|> denotes a video placeholder.
# The Chinese question below means "What is the woman in the video doing?"
prompts = [
'''The following is a conversation between a curious human and AI assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.
Human: <|video|>
Human: 视频中的女人在干什么？
AI: ''']

video_list = ['yoga.mp4']

# Generation kwargs (the same as in transformers) are passed to model.generate().
generate_kwargs = {
    'do_sample': True,
    'top_k': 5,
    'max_length': 512
}
inputs = processor(text=prompts, videos=video_list, num_frames=4, return_tensors='pt')
# Cast float32 tensors to bfloat16 to match the model weights, then move everything to the model's device.
inputs = {k: v.bfloat16() if v.dtype == torch.float else v for k, v in inputs.items()}
inputs = {k: v.to(model.device) for k, v in inputs.items()}
with torch.no_grad():
    res = model.generate(**inputs, **generate_kwargs)
sentence = tokenizer.decode(res.tolist()[0], skip_special_tokens=True)
print(sentence)
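The human/AI template used for the prompt above can be generated rather than typed by hand. The sketch below rebuilds the single-turn prompt from the inference example; `build_prompt`, `SYSTEM_PROMPT`, and `VIDEO_TOKEN` are hypothetical names, and the way earlier turns are laid out (alternating `Human:` and `AI:` lines) is an assumption, since the source only shows a single turn.

```python
from typing import List, Optional, Tuple

SYSTEM_PROMPT = (
    "The following is a conversation between a curious human and AI assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions."
)
VIDEO_TOKEN = "<|video|>"  # placeholder that the processor replaces with video features


def build_prompt(question: str, history: Optional[List[Tuple[str, str]]] = None) -> str:
    """Format a question (plus optional earlier turns) in the human/AI template."""
    lines = [SYSTEM_PROMPT, f"Human: {VIDEO_TOKEN}"]
    # Assumed layout for earlier turns: one Human line followed by one AI line.
    for past_question, past_answer in history or []:
        lines.append(f"Human: {past_question}")
        lines.append(f"AI: {past_answer}")
    lines.append(f"Human: {question}")
    lines.append("AI: ")  # left open for the model to complete
    return "\n".join(lines)


# "What is the woman in the video doing?"
prompts = [build_prompt("视频中的女人在干什么？")]
print(prompts[0])
```

The resulting list can be passed as `text=prompts` to the processor in the inference code, with one video file per `<|video|>` placeholder.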

Citing Youku-mPLUG

If you find this dataset useful for your research, please consider citing our paper.

@misc{xu2023youku_mplug,
    title={Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Dataset for Pre-training and Benchmarks},
    author={Haiyang Xu, Qinghao Ye, Xuan Wu, Ming Yan, Yuan Miao, Jiabo Ye, Guohai Xu, Anwen Hu, Yaya Shi, Chenliang Li, Qi Qian, Que Maofei, Ji Zhang, Xiao Zeng, Fei Huang},
    year={2023},
    eprint={2306.04362},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Additional information

Version: 1.0.0
Type: Other source code
Updated: 2024-12-13
Size: 15.45MB
Source: GitHub
