This repository is the official implementation of the paper titled above. Please refer to the project page or the paper for more details.
We have verified reproducibility under this environment.
Install Python dependencies, preferably inside a virtual environment (venv).
pip install -r requirements.txt
Note that TensorFlow has version-specific system requirements for GPU environments. Check that a compatible CUDA/cuDNN runtime is installed.
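As a quick sanity check (not part of the repository's scripts), you can ask TensorFlow whether it detects a GPU; an empty list suggests a CUDA/cuDNN mismatch:

```shell
# Lists detected GPU devices; prints [] if none are visible to TensorFlow.
python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
```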
To try the demo with pre-trained models:
./data
../results
You can test some tasks using the pre-trained models in the notebook.
You can train your own model.
The trainer script takes a few arguments to control hyperparameters.
See src/mfp/mfp/args.py for the list of available options.
If the script throws an out-of-memory error, make sure that no other processes occupy GPU memory, and adjust --batch_size.
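For example, a smaller batch can be requested on the command line (the value 16 here is only illustrative; pick what fits your GPU memory):

```shell
# Reduce the batch size if training runs out of GPU memory.
bin/train_mfp.sh crello --masking_method random --batch_size 16
```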
bin/train_mfp.sh crello --masking_method random # Ours-IMP
bin/train_mfp.sh crello --masking_method elem_pos_attr_img_txt # Ours-EXP
bin/train_mfp.sh crello --masking_method elem_pos_attr_img_txt --weights <WEIGHTS> # Ours-EXP-FT
The trainer outputs logs, evaluation results, and checkpoints to tmp/mfp/jobs/<job_id>.
Training progress can be monitored via tensorboard.
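For example, you can point TensorBoard at the job directory above (the port is arbitrary):

```shell
# Serve training curves for a given job; open http://localhost:6006 in a browser.
tensorboard --logdir tmp/mfp/jobs/<job_id> --port 6006
```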
You can perform quantitative evaluation as follows.
bin/eval_mfp.sh --job_dir <JOB_DIR> (<ADDITIONAL_ARGS>)
See eval.py for <ADDITIONAL_ARGS>.
The process is almost the same as above.
bin/train_mfp.sh rico --masking_method random # Ours-IMP
bin/train_mfp.sh rico --masking_method elem_pos_attr # Ours-EXP
bin/train_mfp.sh rico --masking_method elem_pos_attr --weights <WEIGHTS> # Ours-EXP-FT
The process is similar to the above.
If you find this code useful for your research, please cite our paper:
@inproceedings{inoue2023document,
title={{Towards Flexible Multi-modal Document Models}},
author={Naoto Inoue and Kotaro Kikuchi and Edgar Simo-Serra and Mayu Otani and Kota Yamaguchi},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year={2023},
pages={14287-14296},
}