3D ResNets PyTorch

3D ResNets for Action Recognition

Update (2020/4/13)

We published a paper on arXiv.

Hirokatsu Kataoka, Tenga Wakamiya, Kensho Hara, and Yutaka Satoh,
"Would Mega-scale Datasets Further Enhance Spatiotemporal 3D CNNs?",
arXiv preprint, arXiv:2004.04968, 2020.

We uploaded the pretrained models described in this paper, including a ResNet-50 pretrained on the combined Kinetics-700 and Moments in Time dataset.

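Update (2020/4/10)

We significantly updated our scripts. If you want to use the older version to reproduce our CVPR2018 paper, use the scripts in the CVPR2018 branch.

This update includes the following:

Refactoring of the whole project
Support for newer PyTorch versions
Support for distributed training
Support for training and testing on the Moments in Time dataset
Addition of R(2+1)D models
Upload of 3D ResNet models trained on the Kinetics-700, Moments in Time, and STAIR-Actions datasets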

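Summary

This is the PyTorch code for the following papers:

Hirokatsu Kataoka, Tenga Wakamiya, Kensho Hara, and Yutaka Satoh,
"Would Mega-scale Datasets Further Enhance Spatiotemporal 3D CNNs?",
arXiv preprint, arXiv:2004.04968, 2020.

Kensho Hara, Hirokatsu Kataoka, and Yutaka Satoh,
"Towards Good Practice for Action Recognition with Spatiotemporal 3D Convolutions",
Proceedings of the International Conference on Pattern Recognition, pp. 2516-2521, 2018.

Kensho Hara, Hirokatsu Kataoka, and Yutaka Satoh,
"Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet?",
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6546-6555, 2018.

Kensho Hara, Hirokatsu Kataoka, and Yutaka Satoh,
"Learning Spatio-Temporal Features with 3D Residual Networks for Action Recognition",
Proceedings of the ICCV Workshop on Action, Gesture, and Emotion Recognition, 2017.

This code includes training, fine-tuning, and testing on Kinetics, Moments in Time, ActivityNet, UCF-101, and HMDB-51.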

Citation

If you use this code or the pretrained models, please cite the following:

@inproceedings{hara3dcnns,
  author={Kensho Hara and Hirokatsu Kataoka and Yutaka Satoh},
  title={Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet?},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  pages={6546--6555},
  year={2018},
}

Pretrained models

Pretrained models are available here.
All models are trained on Kinetics-700 (K), Moments in Time (M), STAIR-Actions (S), or merged datasets of them (KM, KS, MS, KMS).
To fine-tune a model on your own dataset, specify the options listed for its checkpoint file:

r3d18_K_200ep.pth: --model resnet --model_depth 18 --n_pretrain_classes 700
r3d18_KM_200ep.pth: --model resnet --model_depth 18 --n_pretrain_classes 1039
r3d34_K_200ep.pth: --model resnet --model_depth 34 --n_pretrain_classes 700
r3d34_KM_200ep.pth: --model resnet --model_depth 34 --n_pretrain_classes 1039
r3d50_K_200ep.pth: --model resnet --model_depth 50 --n_pretrain_classes 700
r3d50_KM_200ep.pth: --model resnet --model_depth 50 --n_pretrain_classes 1039
r3d50_KMS_200ep.pth: --model resnet --model_depth 50 --n_pretrain_classes 1139
r3d50_KS_200ep.pth: --model resnet --model_depth 50 --n_pretrain_classes 800
r3d50_M_200ep.pth: --model resnet --model_depth 50 --n_pretrain_classes 339
r3d50_MS_200ep.pth: --model resnet --model_depth 50 --n_pretrain_classes 439
r3d50_S_200ep.pth: --model resnet --model_depth 50 --n_pretrain_classes 100
r3d101_K_200ep.pth: --model resnet --model_depth 101 --n_pretrain_classes 700
r3d101_KM_200ep.pth: --model resnet --model_depth 101 --n_pretrain_classes 1039
r3d152_K_200ep.pth: --model resnet --model_depth 152 --n_pretrain_classes 700
r3d152_KM_200ep.pth: --model resnet --model_depth 152 --n_pretrain_classes 1039
r3d200_K_200ep.pth: --model resnet --model_depth 200 --n_pretrain_classes 700
r3d200_KM_200ep.pth: --model resnet --model_depth 200 --n_pretrain_classes 1039
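
The class counts in the list above follow from the three base datasets: 700 for K, 339 for M, and 100 for S, with each merged dataset summing its parts. The following is a minimal sketch of that mapping, assuming checkpoint names keep the r3d<depth>_<datasets>_<epochs>ep.pth pattern shown above. The helper name `finetune_options` is hypothetical and is not part of the repository.

```python
import re

# Class counts per base dataset, read off the list above
# (K = Kinetics-700, M = Moments in Time, S = STAIR-Actions).
BASE_CLASSES = {"K": 700, "M": 339, "S": 100}

# Assumed file name pattern: r3d<depth>_<dataset codes>_<epochs>ep.pth
_NAME = re.compile(r"^r3d(?P<depth>\d+)_(?P<codes>[KMS]+)_\d+ep\.pth$")


def finetune_options(checkpoint_name):
    """Return the main.py options implied by a pretrained checkpoint name."""
    match = _NAME.match(checkpoint_name)
    if match is None:
        raise ValueError(f"unrecognized checkpoint name: {checkpoint_name}")
    codes = match.group("codes")
    if len(set(codes)) != len(codes):
        raise ValueError(f"repeated dataset code in: {checkpoint_name}")
    # A merged dataset has the sum of the class counts of its parts.
    n_classes = sum(BASE_CLASSES[code] for code in codes)
    return [
        "--model", "resnet",
        "--model_depth", match.group("depth"),
        "--n_pretrain_classes", str(n_classes),
    ]


if __name__ == "__main__":
    print(" ".join(finetune_options("r3d50_KMS_200ep.pth")))
```

The returned list can be appended to the other main.py arguments when assembling a fine-tuning command.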

The old pretrained models are still available here.
However, some modifications are required to use the old pretrained models with the current scripts.

Requirements

PyTorch (ver. 0.4+ required)

conda install pytorch torchvision cudatoolkit=10.1 -c soumith

FFmpeg, FFprobe
Python 3

Preparation

ActivityNet

Download videos using the official crawler.
Convert the videos from mp4 to jpg files using util_scripts/generate_video_jpgs.py

python -m util_scripts.generate_video_jpgs mp4_video_dir_path jpg_video_dir_path activitynet

Add fps information to the json file using util_scripts/add_fps_into_activitynet_json.py

python -m util_scripts.add_fps_into_activitynet_json mp4_video_dir_path json_file_path

Kinetics

Download videos using the official crawler.
- Locate the test set in video_directory/test.
Convert the videos from mp4 to jpg files using util_scripts/generate_video_jpgs.py

python -m util_scripts.generate_video_jpgs mp4_video_dir_path jpg_video_dir_path kinetics

Generate an annotation file in a json format similar to ActivityNet using util_scripts/kinetics_json.py
- The CSV files (kinetics_{train, val, test}.csv) are included in the crawler.

python -m util_scripts.kinetics_json csv_dir_path 700 jpg_video_dir_path jpg dst_json_path

UCF-101

Download the videos and train/test splits here.
Convert the videos from avi to jpg files using util_scripts/generate_video_jpgs.py

python -m util_scripts.generate_video_jpgs avi_video_dir_path jpg_video_dir_path ucf101

Generate an annotation file in a json format similar to ActivityNet using util_scripts/ucf101_json.py
- annotation_dir_path includes classInd.txt, trainlist0{1, 2, 3}.txt, testlist0{1, 2, 3}.txt

python -m util_scripts.ucf101_json annotation_dir_path jpg_video_dir_path dst_json_path

HMDB-51

Download the videos and train/test splits here.
Convert the videos from avi to jpg files using util_scripts/generate_video_jpgs.py

python -m util_scripts.generate_video_jpgs avi_video_dir_path jpg_video_dir_path hmdb51

Generate an annotation file in a json format similar to ActivityNet using util_scripts/hmdb51_json.py
- annotation_dir_path includes brush_hair_test_split1.txt, ...

python -m util_scripts.hmdb51_json annotation_dir_path jpg_video_dir_path dst_json_path
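
The jpg conversion step has the same shape for all four datasets: input video directory, output jpg directory, then a dataset keyword. As a rough sketch, the command can be assembled in Python before running it from the repository root. The helper name `jpg_conversion_command` and the example paths are hypothetical; the module name and argument order are the ones used in the commands above.

```python
import shlex

# Dataset keywords used as the last argument in the commands above.
DATASETS = ("activitynet", "kinetics", "ucf101", "hmdb51")


def jpg_conversion_command(video_dir, jpg_dir, dataset):
    """Build the generate_video_jpgs command line as an argument list."""
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset: {dataset}")
    return [
        "python", "-m", "util_scripts.generate_video_jpgs",
        str(video_dir), str(jpg_dir), dataset,
    ]


if __name__ == "__main__":
    # Hypothetical directory names, for illustration only.
    cmd = jpg_conversion_command("ucf101_videos/avi", "ucf101_videos/jpg", "ucf101")
    print(shlex.join(cmd))
    # To actually run it from the repository root:
    # import subprocess; subprocess.run(cmd, check=True)
```

Building the command as a list avoids shell quoting problems when a directory name contains spaces.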

Running the code

Assume the data directories are structured as follows:

~/
  data/
    kinetics_videos/
      jpg/
        .../ (directories of class names)
          .../ (directories of video names)
            ... (jpg files)
    results/
      save_100.pth
    kinetics.json
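
Before launching a long training run, it can help to confirm that this layout is in place. The following is a small sketch, assuming the tree above; the function name `missing_entries` is hypothetical. It does not check results/save_100.pth, since that file only exists after training.

```python
from pathlib import Path


def missing_entries(root):
    """Return the expected paths that do not exist under `root`."""
    data = Path(root).expanduser() / "data"
    expected = [
        data / "kinetics_videos" / "jpg",  # class dirs -> video dirs -> jpg files
        data / "results",
        data / "kinetics.json",
    ]
    return [path for path in expected if not path.exists()]


if __name__ == "__main__":
    missing = missing_entries("~")
    for path in missing:
        print(f"missing: {path}")
    if not missing:
        print("data layout looks complete")
```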

List all options.

python main.py -h

Train ResNet-50 on the Kinetics-700 dataset (700 classes) with 4 CPU threads (for data loading).
The batch size is 128.
Save the model every 5 epochs. All GPUs are used for training. To use only a subset of the GPUs, set CUDA_VISIBLE_DEVICES=...

python main.py --root_path ~/data --video_path kinetics_videos/jpg --annotation_path kinetics.json \
--result_path results --dataset kinetics --model resnet \
--model_depth 50 --n_classes 700 --batch_size 128 --n_threads 4 --checkpoint 5
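
As one way to restrict training to a subset of GPUs, the command can be launched from Python with CUDA_VISIBLE_DEVICES set in the child environment. This is a sketch under assumptions: the GPU indices 0 and 1 are placeholders, and the helper name `env_with_gpus` is hypothetical. The main.py arguments are the ones from the training command above.

```python
import os
import shlex


def env_with_gpus(gpu_ids, base_env=None):
    """Copy an environment and restrict the visible CUDA devices."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    return env


# Same options as the training command above, as an argument list.
TRAIN_CMD = [
    "python", "main.py",
    "--root_path", os.path.expanduser("~/data"),
    "--video_path", "kinetics_videos/jpg",
    "--annotation_path", "kinetics.json",
    "--result_path", "results",
    "--dataset", "kinetics",
    "--model", "resnet", "--model_depth", "50",
    "--n_classes", "700",
    "--batch_size", "128", "--n_threads", "4", "--checkpoint", "5",
]


if __name__ == "__main__":
    env = env_with_gpus([0, 1])  # placeholder GPU indices
    print("CUDA_VISIBLE_DEVICES=" + env["CUDA_VISIBLE_DEVICES"], shlex.join(TRAIN_CMD))
    # To launch from the repository root:
    # import subprocess; subprocess.run(TRAIN_CMD, env=env, check=True)
```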

Continue training from epoch 101. (~/data/results/save_100.pth is loaded.)

python main.py --root_path ~/data --video_path kinetics_videos/jpg --annotation_path kinetics.json \
--result_path results --dataset kinetics --resume_path results/save_100.pth \
--model_depth 50 --n_classes 700 --batch_size 128 --n_threads 4 --checkpoint 5
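
To pick the value for --resume_path without checking the results directory by hand, a short helper can find the newest checkpoint. This sketch assumes checkpoints are named save_<epoch>.pth, as in the directory tree shown earlier. It only illustrates the file numbering; it makes no claim about how main.py itself restores the epoch counter. Both function names are hypothetical.

```python
import re
from pathlib import Path

# Assumed naming scheme, taken from the directory tree: save_<epoch>.pth
_CHECKPOINT = re.compile(r"save_(\d+)\.pth")


def checkpoint_epoch(path):
    """Return the epoch number encoded in a checkpoint file name."""
    match = _CHECKPOINT.fullmatch(Path(path).name)
    if match is None:
        raise ValueError(f"not a save_<epoch>.pth file: {path}")
    return int(match.group(1))


def latest_checkpoint(result_dir):
    """Return the checkpoint with the highest epoch in result_dir, or None."""
    candidates = [
        p for p in Path(result_dir).glob("save_*.pth")
        if _CHECKPOINT.fullmatch(p.name)
    ]
    return max(candidates, key=checkpoint_epoch, default=None)


if __name__ == "__main__":
    path = "results/save_100.pth"
    print(f"resume from {path}, next epoch is {checkpoint_epoch(path) + 1}")
```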

Calculate the top-5 class probabilities of each video using a trained model (~/data/results/save_200.pth).
Note that inference_batch_size should be small, because the actual batch size is calculated as inference_batch_size * (n_video_frames / inference_stride).

python main.py --root_path ~/data --video_path kinetics_videos/jpg --annotation_path kinetics.json \
--result_path results --dataset kinetics --resume_path results/save_200.pth \
--model_depth 50 --n_classes 700 --n_threads 4 --no_train --no_val --inference --output_topk 5 --inference_batch_size 1
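
To see why inference_batch_size should stay small, the formula quoted above can be evaluated for a sample video. The frame count (320) and stride (16) below are hypothetical values chosen for illustration, and the function name is not part of the repository. How the scripts round a non-integer clip count is not stated here, so the sketch returns the raw value.

```python
def effective_batch_size(inference_batch_size, n_video_frames, inference_stride):
    """Apply inference_batch_size * (n_video_frames / inference_stride)."""
    if inference_stride <= 0:
        raise ValueError("inference_stride must be positive")
    return inference_batch_size * (n_video_frames / inference_stride)


if __name__ == "__main__":
    # Hypothetical video: 320 frames, one clip every 16 frames.
    for batch in (1, 8):
        print(batch, effective_batch_size(batch, 320, 16))
```

With these numbers, raising inference_batch_size from 1 to 8 takes the number of clips processed at once from 20 to 160.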

Evaluate the top-1 video accuracy of a recognition result (~/data/results/val.json).

python -m util_scripts.eval_accuracy ~/data/kinetics.json ~/data/results/val.json --subset val -k 1 --ignore
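
For reference, top-k video accuracy is the fraction of videos whose true label appears among the k highest-ranked predictions. The sketch below shows the metric on plain dictionaries. The layout of val.json is not described in this document, so the input format here is an assumption, as is the choice to count videos without predictions as misses. It is not the implementation in util_scripts/eval_accuracy.py.

```python
def top_k_accuracy(ground_truth, predictions, k=1):
    """Fraction of videos whose label is among the top-k predicted labels.

    ground_truth: dict mapping video id -> true label
    predictions:  dict mapping video id -> labels ranked best first
    """
    if not ground_truth:
        raise ValueError("ground_truth is empty")
    hits = sum(
        1
        for video_id, label in ground_truth.items()
        # Assumption: a video with no prediction counts as a miss.
        if label in predictions.get(video_id, [])[:k]
    )
    return hits / len(ground_truth)


if __name__ == "__main__":
    # Made-up labels and video ids, for illustration only.
    truth = {"v1": "archery", "v2": "bowling"}
    ranked = {"v1": ["archery", "golf"], "v2": ["golf", "bowling"]}
    print(top_k_accuracy(truth, ranked, k=1))
    print(top_k_accuracy(truth, ranked, k=2))
```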

Fine-tune the fc layer of a pretrained model (~/data/models/resnet-50-kinetics.pth) on UCF-101.

python main.py --root_path ~/data --video_path ucf101_videos/jpg --annotation_path ucf101_01.json \
--result_path results --dataset ucf101 --n_classes 101 --n_pretrain_classes 700 \
--pretrain_path models/resnet-50-kinetics.pth --ft_begin_module fc \
--model resnet --model_depth 50 --batch_size 128 --n_threads 4 --checkpoint 5
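
The --ft_begin_module option names the module where fine-tuning starts. One plausible reading, and it is only an assumption here rather than a quote of the repository code, is that parameters from that module onward are updated while earlier ones stay fixed. The sketch below illustrates that selection on a list of parameter names. The names are hypothetical examples in the style of PyTorch's `named_parameters()`.

```python
def trainable_parameter_names(parameter_names, ft_begin_module):
    """Select parameters from the first one under `ft_begin_module` onward.

    Assumed semantics: everything before the named module is frozen,
    the named module and everything after it is trained.
    """
    selected = []
    started = False
    for name in parameter_names:
        # The top-level module is the part before the first dot.
        if not started and name.split(".")[0] == ft_begin_module:
            started = True
        if started:
            selected.append(name)
    return selected


if __name__ == "__main__":
    # Hypothetical parameter names in forward order.
    names = [
        "conv1.weight",
        "layer1.0.conv1.weight",
        "layer4.2.conv3.weight",
        "fc.weight",
        "fc.bias",
    ]
    print(trainable_parameter_names(names, "fc"))
    print(trainable_parameter_names(names, "layer4"))
```

Under this reading, passing fc trains only the final classifier, which matches the description of the command above.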
