This repo holds the code for the paper: Improving Continuous Sign Language Recognition with Adapted Image Models (preprint). [paper]
This repo is based on VAC (ICCV 2021). Many thanks for their great work!
This project is implemented in PyTorch (version >=1.13 is recommended for compatibility with ctcdecode; otherwise errors may occur). Please install PyTorch first.
ctcdecode==0.4 [parlance/ctcdecode], for beam-search decoding (see the usage sketch after this list).
[Optional] sclite [kaldi-asr/kaldi]: install the Kaldi toolkit to get sclite for evaluation. After installation, create a soft link to sclite:
mkdir ./software
ln -s PATH_TO_KALDI/tools/sctk-2.4.10/bin/sclite ./software/sclite
You may use the Python evaluation tool for convenience (by setting 'evaluate_tool' to 'python' in line 16 of ./configs/baseline.yaml), but sclite provides more detailed statistics.
You can install the remaining required packages by running
pip install -r requirements.txt
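For reference, here is a minimal sketch of how ctcdecode's beam-search decoder is typically driven; the gloss vocabulary and blank index below are illustrative placeholders, not values taken from this repo:

```python
import torch
from ctcdecode import CTCBeamDecoder

# Hypothetical gloss vocabulary; index 0 serves as the CTC blank here.
labels = ['_', 'GLOSS-A', 'GLOSS-B', 'GLOSS-C']
decoder = CTCBeamDecoder(labels, beam_width=10, blank_id=0, num_processes=4)

# probs: (batch, time, vocab) softmax outputs from a recognition model.
probs = torch.softmax(torch.randn(1, 50, len(labels)), dim=-1)
beam_results, beam_scores, timesteps, out_lens = decoder.decode(probs)

# Best hypothesis of the first sample, truncated to its decoded length.
best = beam_results[0][0][:out_lens[0][0]]
print([labels[i] for i in best])
```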
The implementations of CLIP and the other proposed components are given in ./modules/openai/model.py.
You can choose any one of the following datasets to verify the effectiveness of AdaptSign.
Download the RWTH-PHOENIX-Weather 2014 Dataset [download link]. Our experiments are based on phoenix-2014.v3.tar.gz.
After downloading the dataset, extract it. It is suggested to create a soft link to the downloaded dataset:
ln -s PATH_TO_DATASET/phoenix2014-release ./dataset/phoenix2014
The original image sequences are 210x260; we resize them to 256x256 for augmentation. Run the following commands to generate the gloss dictionary and resize the image sequences (a resize sketch follows the commands):
cd ./preprocess
python dataset_preprocess.py --process-image --multiprocessing
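For intuition, the resize step roughly amounts to the simplified sketch below (using OpenCV, though the actual script may use a different library and interpolation; src_dir/dst_dir are placeholder paths, and dataset_preprocess.py additionally builds the gloss dictionary and runs multiple processes):

```python
import cv2
import glob
import os

def resize_sequence(src_dir, dst_dir, size=(256, 256)):
    """Resize every frame of one 210x260 image sequence to 256x256."""
    os.makedirs(dst_dir, exist_ok=True)
    for path in sorted(glob.glob(os.path.join(src_dir, '*.png'))):
        img = cv2.imread(path)
        img = cv2.resize(img, size, interpolation=cv2.INTER_CUBIC)
        cv2.imwrite(os.path.join(dst_dir, os.path.basename(path)), img)
```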
Download the RWTH-PHOENIX-Weather 2014 T Dataset [download link].
After downloading the dataset, extract it. It is suggested to create a soft link to the downloaded dataset:
ln -s PATH_TO_DATASET/PHOENIX-2014-T-release-v3/PHOENIX-2014-T ./dataset/phoenix2014-T
The original image sequences are 210x260; we resize them to 256x256 for augmentation. Run the following commands to generate the gloss dictionary and resize the image sequences:
cd ./preprocess
python dataset_preprocess-T.py --process-image --multiprocessing
Request the CSL Dataset from this website [download link].
After downloading the dataset, extract it. It is suggested to create a soft link to the downloaded dataset:
ln -s PATH_TO_DATASET ./dataset/CSL
The original image sequences are 1280x720; we resize them to 256x256 for augmentation. Run the following commands to generate the gloss dictionary and resize the image sequences:
cd ./preprocess
python dataset_preprocess-CSL.py --process-image --multiprocessing
Request the CSL-Daily Dataset from this website [download link].
After downloading the dataset, extract it. It is suggested to create a soft link to the downloaded dataset:
ln -s PATH_TO_DATASET ./dataset/CSL-Daily
The original image sequences are 1280x720; we resize them to 256x256 for augmentation. Run the following commands to generate the gloss dictionary and resize the image sequences:
cd ./preprocess
python dataset_preprocess-CSL-Daily.py --process-image --multiprocessing
Performance on PHOENIX2014:

Backbone | Dev WER | Test WER | Pretrained model
---|---|---|---
ResNet18 | 18.5% | 18.8% | [Baidu] (passwd: enyp) [Google Drive]

Performance on PHOENIX2014-T:

Backbone | Dev WER | Test WER | Pretrained model
---|---|---|---
ResNet18 | 18.6% | 18.9% | [Baidu] (passwd: pfk1) [Google Drive]

Performance on CSL-Daily:

Backbone | Dev WER | Test WER | Pretrained model
---|---|---|---
ResNet18 | 26.7% | 26.3% | [Baidu] (passwd: kbu4) [Google Drive]
To evaluate a pretrained model, first choose the dataset (phoenix2014, phoenix2014-T, CSL, or CSL-Daily) in line 3 of ./configs/baseline.yaml, then run the command below:
python main.py --device your_device --load-weights path_to_weight.pt --phase test
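The word error rate (WER) reported above is the token-level edit distance between the decoded gloss sequence and the reference, normalized by the reference length. A minimal sketch of the metric (not this repo's evaluation code, which uses sclite or the Python tool mentioned earlier):

```python
def wer(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / len(reference)."""
    r, h = reference, hypothesis
    # Dynamic-programming edit distance over gloss tokens.
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(r)][len(h)] / max(len(r), 1)

print(wer('A B C D'.split(), 'A X C'.split()))  # 0.5
```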
Configuration priority is: command-line arguments > config file > argparse defaults (see the sketch below). To train the SLR model, run the command below:
python main.py --device your_device
Note that you can choose the target dataset (phoenix2014, phoenix2014-T, CSL, or CSL-Daily) in line 3 of ./configs/baseline.yaml.
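As a schematic illustration of that priority order (not the repo's actual argument handling; the flag names are only examples):

```python
import argparse
import yaml

parser = argparse.ArgumentParser()
# argparse defaults: lowest priority.
parser.add_argument('--device', default='0')
parser.add_argument('--config', default='./configs/baseline.yaml')
args = parser.parse_args()

# Config-file values override argparse defaults...
with open(args.config) as f:
    config = yaml.safe_load(f)
for key, value in config.items():
    if getattr(args, key, None) == parser.get_default(key):
        setattr(args, key, value)
# ...while flags passed explicitly on the command line keep their
# values, since they no longer match the parser defaults.
```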