3D ResNets PyTorch

3D ResNets for Action Recognition

Update (2020/4/13)

We published a paper on arXiv.

Hirokatsu Kataoka, Tenga Wakamiya, Kensho Hara, and Yutaka Satoh,
"Would Mega-scale Datasets Further Enhance Spatiotemporal 3D CNNs",
arXiv preprint, arXiv:2004.04968, 2020.

We uploaded the pretrained models described in this paper, including ResNet-50 pretrained on the combined Kinetics-700 and Moments in Time dataset.

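Update (2020/4/10)

We significantly updated our scripts. If you want to use the older version to reproduce our CVPR2018 paper, use the scripts in the CVPR2018 branch.

This update includes the following:

Refactoring the entire project
Supporting newer PyTorch versions
Supporting distributed training
Supporting training and testing on the Moments in Time dataset
Adding R(2+1)D models
Uploading 3D ResNet models trained on the Kinetics-700, Moments in Time, and STAIR-Actions datasets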

Summary

This is the PyTorch code for the following papers:

Hirokatsu Kataoka, Tenga Wakamiya, Kensho Hara, and Yutaka Satoh,
"Would Mega-scale Datasets Further Enhance Spatiotemporal 3D CNNs",
arXiv preprint, arXiv:2004.04968, 2020.

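Kensho Hara, Hirokatsu Kataoka, and Yutaka Satoh,
"Towards Good Practice for Action Recognition with Spatiotemporal 3D Convolutions",
Proceedings of the International Conference on Pattern Recognition, pp. 2516-2521, 2018.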

Kensho Hara, Hirokatsu Kataoka, and Yutaka Satoh,
"Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet?",
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6546-6555, 2018.

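Kensho Hara, Hirokatsu Kataoka, and Yutaka Satoh,
"Learning Spatio-Temporal Features with 3D Residual Networks for Action Recognition",
Proceedings of the ICCV Workshop on Action, Gesture, and Emotion Recognition, 2017.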

This code includes training, fine-tuning, and testing on Kinetics, Moments in Time, ActivityNet, UCF-101, and HMDB-51.

Citation

If you use this code or the pre-trained models, please cite the following:

@inproceedings{hara3dcnns,
  author={Kensho Hara and Hirokatsu Kataoka and Yutaka Satoh},
  title={Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet?},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  pages={6546--6555},
  year={2018},
}

Pre-trained models

Pre-trained models are available here.
All models are trained on Kinetics-700 (K), Moments in Time (M), STAIR-Actions (S), or merged datasets of them (KM, KS, MS, KMS).
If you want to fine-tune the models on your dataset, specify the following options.

r3d18_K_200ep.pth: --model resnet --model_depth 18 --n_pretrain_classes 700
r3d18_KM_200ep.pth: --model resnet --model_depth 18 --n_pretrain_classes 1039
r3d34_K_200ep.pth: --model resnet --model_depth 34 --n_pretrain_classes 700
r3d34_KM_200ep.pth: --model resnet --model_depth 34 --n_pretrain_classes 1039
r3d50_K_200ep.pth: --model resnet --model_depth 50 --n_pretrain_classes 700
r3d50_KM_200ep.pth: --model resnet --model_depth 50 --n_pretrain_classes 1039
r3d50_KMS_200ep.pth: --model resnet --model_depth 50 --n_pretrain_classes 1139
r3d50_KS_200ep.pth: --model resnet --model_depth 50 --n_pretrain_classes 800
r3d50_M_200ep.pth: --model resnet --model_depth 50 --n_pretrain_classes 339
r3d50_MS_200ep.pth: --model resnet --model_depth 50 --n_pretrain_classes 439
r3d50_S_200ep.pth: --model resnet --model_depth 50 --n_pretrain_classes 100
r3d101_K_200ep.pth: --model resnet --model_depth 101 --n_pretrain_classes 700
r3d101_KM_200ep.pth: --model resnet --model_depth 101 --n_pretrain_classes 1039
r3d152_K_200ep.pth: --model resnet --model_depth 152 --n_pretrain_classes 700
r3d152_KM_200ep.pth: --model resnet --model_depth 152 --n_pretrain_classes 1039
r3d200_K_200ep.pth: --model resnet --model_depth 200 --n_pretrain_classes 700
r3d200_KM_200ep.pth: --model resnet --model_depth 200 --n_pretrain_classes 1039
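
The class counts in the list above are consistent with simple sums: 700 for K, 339 for M, and 100 for S, with merged datasets adding their parts (for example KM = 700 + 339 = 1039). The sketch below derives both options from a checkpoint file name. The helper names are hypothetical and not part of the repository, and the file-name pattern is only inferred from the list above.

```python
# Class counts per source dataset, as implied by the option list above.
BASE_CLASSES = {"K": 700, "M": 339, "S": 100}


def n_pretrain_classes(checkpoint_name):
    """Return the class count for a name like 'r3d50_KMS_200ep.pth'."""
    # The token between the first two underscores encodes the datasets.
    dataset_code = checkpoint_name.split("_")[1]
    return sum(BASE_CLASSES[letter] for letter in dataset_code)


def model_depth(checkpoint_name):
    """Return the ResNet depth for a name like 'r3d50_KMS_200ep.pth'."""
    prefix = checkpoint_name.split("_")[0]  # e.g. 'r3d50'
    return int(prefix[len("r3d"):])


if __name__ == "__main__":
    name = "r3d50_KMS_200ep.pth"
    print(f"--model resnet --model_depth {model_depth(name)} "
          f"--n_pretrain_classes {n_pretrain_classes(name)}")
```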

The old pretrained models are still available here.
However, some modifications are required to use the old pretrained models with the current scripts.

Requirements

PyTorch (ver. 0.4+ required)

conda install pytorch torchvision cudatoolkit=10.1 -c soumith

FFmpeg, FFprobe
Python 3

Preparation

ActivityNet

Download videos using the official crawler.
Convert the video files to jpg files using util_scripts/generate_video_jpgs.py.

python -m util_scripts.generate_video_jpgs mp4_video_dir_path jpg_video_dir_path activitynet

Add fps information to the json file using util_scripts/add_fps_into_activitynet_json.py.

python -m util_scripts.add_fps_into_activitynet_json mp4_video_dir_path json_file_path
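
For readers who want to see what reading fps information can look like, here is a minimal sketch that queries FFprobe and converts its rate string to a number. It is not taken from util_scripts/add_fps_into_activitynet_json.py; the function names are hypothetical, and the repository script may obtain the value differently.

```python
import subprocess
from fractions import Fraction


def parse_frame_rate(text):
    """Convert an ffprobe rate string such as '30000/1001' to frames per second."""
    return float(Fraction(text.strip()))


def probe_fps(video_path):
    """Ask ffprobe for the average frame rate of the first video stream."""
    cmd = [
        "ffprobe", "-v", "error",
        "-select_streams", "v:0",
        "-show_entries", "stream=avg_frame_rate",
        "-of", "default=noprint_wrappers=1:nokey=1",
        video_path,
    ]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return parse_frame_rate(out)
```

ffprobe reports the rate as a fraction, so parsing it with Fraction avoids rounding until the final conversion to float.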

Kinetics

Download videos using the official crawler.
- Locate the test set in video_directory/test.
Convert the video files to jpg files using util_scripts/generate_video_jpgs.py.

python -m util_scripts.generate_video_jpgs mp4_video_dir_path jpg_video_dir_path kinetics

Generate an annotation file in a json format similar to ActivityNet using util_scripts/kinetics_json.py.
- The CSV files (kinetics_{train, val, test}.csv) are included in the crawler.

python -m util_scripts.kinetics_json csv_dir_path 700 jpg_video_dir_path jpg dst_json_path

UCF-101

Download the videos and train/test splits here.
Convert from avi to jpg files using util_scripts/generate_video_jpgs.py.

python -m util_scripts.generate_video_jpgs avi_video_dir_path jpg_video_dir_path ucf101

Generate an annotation file in a json format similar to ActivityNet using util_scripts/ucf101_json.py.
- annotation_dir_path includes classInd.txt, trainlist0{1, 2, 3}.txt, and testlist0{1, 2, 3}.txt.

python -m util_scripts.ucf101_json annotation_dir_path jpg_video_dir_path dst_json_path

HMDB-51

Download the videos and train/test splits here.
Convert from avi to jpg files using util_scripts/generate_video_jpgs.py.

python -m util_scripts.generate_video_jpgs avi_video_dir_path jpg_video_dir_path hmdb51

Generate an annotation file in a json format similar to ActivityNet using util_scripts/hmdb51_json.py.
- annotation_dir_path includes brush_hair_test_split1.txt, ...

python -m util_scripts.hmdb51_json annotation_dir_path jpg_video_dir_path dst_json_path
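
Every dataset above goes through the same video-to-jpg conversion. As a rough illustration of what such a step involves, the sketch below builds an FFmpeg command that writes one jpg per frame. The target height of 240 pixels and the image_%05d.jpg naming are assumptions made for this example, not values confirmed from util_scripts/generate_video_jpgs.py.

```python
from pathlib import Path


def build_ffmpeg_command(video_path, dst_dir, height=240):
    """Build an ffmpeg argument list that writes one jpg per frame."""
    return [
        "ffmpeg",
        "-i", str(video_path),
        # Resize to the given height; -1 keeps the aspect ratio.
        "-vf", f"scale=-1:{height}",
        # %05d makes ffmpeg number the frames: image_00001.jpg, ...
        str(Path(dst_dir) / "image_%05d.jpg"),
    ]


if __name__ == "__main__":
    print(" ".join(build_ffmpeg_command("video.avi", "jpg/video")))
```

The list can be passed to subprocess.run(cmd, check=True) once the destination directory exists.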

Running the code

Assume the structure of the data directories is the following:

~/
  data/
    kinetics_videos/
      jpg/
        .../ (directories of class names)
          .../ (directories of video names)
            ... (jpg files)
    results/
      save_100.pth
    kinetics.json
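
Before training, it can help to check that the jpg directory really follows the class/video/frame layout shown above. The helper below is a hypothetical sketch, not part of the repository. It counts the jpg files in each video directory so that empty or misplaced directories stand out.

```python
from pathlib import Path


def count_frames(jpg_root):
    """Map 'class_name/video_name' to the number of jpg files it contains."""
    counts = {}
    for class_dir in sorted(p for p in Path(jpg_root).iterdir() if p.is_dir()):
        for video_dir in sorted(p for p in class_dir.iterdir() if p.is_dir()):
            key = f"{class_dir.name}/{video_dir.name}"
            counts[key] = len(list(video_dir.glob("*.jpg")))
    return counts


if __name__ == "__main__":
    root = Path.home() / "data" / "kinetics_videos" / "jpg"
    if root.is_dir():
        empty = [name for name, n in count_frames(root).items() if n == 0]
        print(f"{len(empty)} video directories contain no jpg files")
```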

Confirm all options.

python main.py -h

Train ResNet-50 on the Kinetics-700 dataset (700 classes) with 4 CPU threads (for data loading).
The batch size is 128.
Save the model every 5 epochs. All GPUs are used for training. To use only a subset of the GPUs, set CUDA_VISIBLE_DEVICES=...

python main.py --root_path ~/data --video_path kinetics_videos/jpg --annotation_path kinetics.json \
--result_path results --dataset kinetics --model resnet \
--model_depth 50 --n_classes 700 --batch_size 128 --n_threads 4 --checkpoint 5
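
When the training command is launched from another Python script, the GPU restriction can be applied through the child process environment. The following sketch only prepares the command and environment. The function name and the GPU ids are illustrative assumptions.

```python
import os


def training_launch(gpu_ids, extra_args):
    """Return (command, environment) for running main.py on selected GPUs."""
    env = dict(os.environ)
    # Only the devices listed here are visible to CUDA in the child process.
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    command = ["python", "main.py", *extra_args]
    return command, env


if __name__ == "__main__":
    command, env = training_launch([0, 1], ["--model", "resnet", "--model_depth", "50"])
    print(env["CUDA_VISIBLE_DEVICES"], command)
```

The pair can then be passed to subprocess.run(command, env=env, check=True).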

Continue training from epoch 101 (~/data/results/save_100.pth is loaded).

python main.py --root_path ~/data --video_path kinetics_videos/jpg --annotation_path kinetics.json \
--result_path results --dataset kinetics --resume_path results/save_100.pth \
--model_depth 50 --n_classes 700 --batch_size 128 --n_threads 4 --checkpoint 5

Calculate the top-5 class probabilities of each video using a trained model (~/data/results/save_200.pth).
Note that inference_batch_size should be small, because the actual batch size is calculated as inference_batch_size * (n_video_frames / inference_stride).

python main.py --root_path ~/data --video_path kinetics_videos/jpg --annotation_path kinetics.json \
--result_path results --dataset kinetics --resume_path results/save_200.pth \
--model_depth 50 --n_classes 700 --n_threads 4 --no_train --no_val --inference --output_topk 5 --inference_batch_size 1
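
To get a feel for why inference_batch_size should stay small, the formula inference_batch_size * (n_video_frames / inference_stride) can be evaluated directly. The frame count and stride below are assumed example values, not defaults taken from the scripts.

```python
def effective_batch_size(inference_batch_size, n_video_frames, inference_stride):
    """Estimate the number of clips per batch during inference."""
    return inference_batch_size * (n_video_frames / inference_stride)


if __name__ == "__main__":
    # Assumed example: 320-frame videos sampled with a stride of 16 frames.
    for batch in (1, 4, 16):
        print(batch, effective_batch_size(batch, 320, 16))
```

Under these assumptions, even a nominal batch size of 4 corresponds to 80 clips, which is why the command above uses 1.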

Evaluate the top-1 video accuracy of a recognition result (~/data/results/val.json).

python -m util_scripts.eval_accuracy ~/data/kinetics.json ~/data/results/val.json --subset val -k 1 --ignore
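
Top-k video accuracy is the fraction of videos whose true label is among the k highest-scored predictions. The sketch below shows the computation on plain dictionaries. These data structures are hypothetical and do not reflect the actual layout of kinetics.json or val.json, and the handling of videos without predictions is a choice made for this example.

```python
def top_k_accuracy(ground_truth, predictions, k=1):
    """Fraction of videos whose true label is among the k best-scored labels.

    ground_truth: {video_id: label}
    predictions:  {video_id: [(label, score), ...]}
    In this sketch, videos without a prediction count as errors.
    """
    if not ground_truth:
        return 0.0
    hits = 0
    for video_id, label in ground_truth.items():
        ranked = sorted(predictions.get(video_id, []),
                        key=lambda pair: pair[1], reverse=True)
        if label in [name for name, _ in ranked[:k]]:
            hits += 1
    return hits / len(ground_truth)


if __name__ == "__main__":
    truth = {"v1": "archery", "v2": "bowling"}
    preds = {"v1": [("archery", 0.9), ("bowling", 0.1)],
             "v2": [("archery", 0.6), ("bowling", 0.4)]}
    print(top_k_accuracy(truth, preds, k=1), top_k_accuracy(truth, preds, k=2))
```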

Fine-tune the fc layers of a pretrained model (~/data/models/resnet-50-kinetics.pth) on UCF-101.

python main.py --root_path ~/data --video_path ucf101_videos/jpg --annotation_path ucf101_01.json \
--result_path results --dataset ucf101 --n_classes 101 --n_pretrain_classes 700 \
--pretrain_path models/resnet-50-kinetics.pth --ft_begin_module fc \
--model resnet --model_depth 50 --batch_size 128 --n_threads 4 --checkpoint 5
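
One plausible reading of --ft_begin_module fc is that only the parameters from the named top-level module onward are updated, while earlier layers keep their pretrained weights. The sketch below illustrates that selection on a list of parameter names. It is written under that assumption, and the function and the parameter names are hypothetical rather than taken from the repository.

```python
def select_trainable(param_names, ft_begin_module):
    """Return the parameter names from `ft_begin_module` onward.

    param_names must be in forward order, e.g. 'layer4.0.conv1.weight'.
    An empty ft_begin_module selects every parameter.
    """
    if not ft_begin_module:
        return list(param_names)
    selected = []
    started = False
    for name in param_names:
        # The top-level module is the part before the first dot.
        if name.split(".")[0] == ft_begin_module:
            started = True
        if started:
            selected.append(name)
    return selected


if __name__ == "__main__":
    names = ["conv1.weight", "layer4.0.conv1.weight", "fc.weight", "fc.bias"]
    print(select_trainable(names, "fc"))
```

With a PyTorch model, the names would typically come from model.named_parameters(), and only the selected parameters would be handed to the optimizer.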
