MindOCR is an open-source toolbox for OCR development and application based on MindSpore. It integrates a series of mainstream text detection and recognition algorithms/models and provides easy-to-use training and inference tools. It can accelerate the development and deployment of SoTA text detection and recognition models, such as DBNet/DBNet++ and CRNN/SVTR, in real-world applications, and helps meet the need for image-text understanding.
The following table lists the mindocr versions and the mindspore versions they support.
mindocr | mindspore |
---|---|
master | master |
0.4 | 2.3.0 |
0.3 | 2.2.10 |
0.1 | 1.8 |
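The table above can be turned into a small lookup, shown here as a minimal sketch. The helper name `required_mindspore` is hypothetical and not part of MindOCR, and the rule that a patch release such as `0.3.1` maps to the `0.3` row is an assumption made for illustration.

```python
from typing import Optional

# Mapping copied from the compatibility table above.
SUPPORTED_MINDSPORE = {
    "master": "master",
    "0.4": "2.3.0",
    "0.3": "2.2.10",
    "0.1": "1.8",
}


def required_mindspore(mindocr_version: str) -> Optional[str]:
    """Return the mindspore version listed for a mindocr version, or None.

    Assumption: only the major.minor part of the mindocr version is used,
    so "0.3.1" is looked up as "0.3".
    """
    if mindocr_version == "master":
        return SUPPORTED_MINDSPORE["master"]
    major_minor = ".".join(mindocr_version.split(".")[:2])
    return SUPPORTED_MINDSPORE.get(major_minor)


if __name__ == "__main__":
    for version in ("0.4", "0.3.1", "0.2"):
        print(version, "->", required_mindspore(version))
```

A return value of `None` means the version is not listed in the table, so no compatibility claim is made for it.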
MindOCR is built on the MindSpore AI framework and is compatible with the framework versions listed above. For training, install the MindSpore version that corresponds to your mindocr version.
For MindSpore Lite offline inference, please refer to Lite offline Environment Installation.
pip install -r requirements.txt
git clone https://github.com/mindspore-lab/mindocr.git
cd mindocr
pip install -e .
Using -e for "editable" mode can help resolve potential module import issues.
The provided Docker images are built with MindSpore 2.2.10, CANN 7.0, and Python 3.9, as indicated by the image tags.
Please follow these steps to set up the Docker environment:
Download the Docker image
# 910
docker pull swr.cn-central-221.ovaijisuan.com/mindocr/mindocr_dev_910_ms_2_2_10_cann7_0_py39:v1
# 910*
docker pull swr.cn-central-221.ovaijisuan.com/mindocr/mindocr_dev_ms_2_2_10_cann7_0_py39:v1
Create container
docker_name="temp_mindocr"
# Keep only the image_name assignment that matches your device; if both are kept, the second overrides the first.
# 910
image_name="swr.cn-central-221.ovaijisuan.com/mindocr/mindocr_dev_910_ms_2_2_10_cann7_0_py39:v1"
# 910*
image_name="swr.cn-central-221.ovaijisuan.com/mindocr/mindocr_dev_ms_2_2_10_cann7_0_py39:v1"
docker run --privileged --name ${docker_name} \
    --tmpfs /tmp \
    --tmpfs /run \
    -v /sys/fs/cgroup:/sys/fs/cgroup:ro \
    --device=/dev/davinci1 \
    --device=/dev/davinci2 \
    --device=/dev/davinci3 \
    --device=/dev/davinci4 \
    --device=/dev/davinci5 \
    --device=/dev/davinci6 \
    --device=/dev/davinci7 \
    --device=/dev/davinci_manager \
    --device=/dev/hisi_hdc \
    --device=/dev/devmm_svm \
    -v /etc/localtime:/etc/localtime \
    -v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
    -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
    --shm-size 800g \
    --cpus 96 \
    --security-opt seccomp=unconfined \
    --network=bridge -itd ${image_name} bash
Enter container
# set docker id
container_id="your docker id"
docker exec -it --user root $container_id bash
Set environment variables
After entering the container, set the environment variables with the following command:
source env_setup.sh
pip install mindocr
As this project is under active development, the version available on PyPI is currently out of date and will be updated soon.
After installing MindOCR, we can run text detection and recognition on an arbitrary image easily as follows.
python tools/infer/text/predict_system.py --image_dir {path_to_img or dir_to_imgs} \
    --det_algorithm DB++ \
    --rec_algorithm CRNN \
    --visualize_output True
After running, the results will be saved in ./inference_results by default. Here is an example result.
Visualization of text detection and recognition result
We can see that all texts on the image are detected and recognized accurately. For more usage, please refer to the inference section in tutorials.
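When the inference script needs to be called from another Python program, the command line above can be assembled as an argument list. The following is a minimal sketch: the helper `build_predict_command` is hypothetical and not part of MindOCR, and it only uses the script path and flags shown in the command above.

```python
import shlex
from typing import List


def build_predict_command(
    image_dir: str,
    det_algorithm: str = "DB++",
    rec_algorithm: str = "CRNN",
    visualize_output: bool = True,
) -> List[str]:
    """Assemble the predict_system.py command shown above as an argv list."""
    return [
        "python",
        "tools/infer/text/predict_system.py",
        "--image_dir", image_dir,
        "--det_algorithm", det_algorithm,
        "--rec_algorithm", rec_algorithm,
        "--visualize_output", str(visualize_output),
    ]


if __name__ == "__main__":
    # Print the command; pass the list to subprocess.run(...) from the
    # repository root to actually execute it.
    print(shlex.join(build_predict_command("path/to/img.jpg")))
```

Passing the list to `subprocess.run` rather than a single string avoids shell quoting problems with image paths that contain spaces.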
It is easy to train your OCR model with the tools/train.py script, which supports both text detection and recognition model training.
python tools/train.py --config {path/to/model_config.yaml}
The --config arg specifies the path to a yaml file that defines the model to be trained and the training strategy, including the data processing pipeline, optimizer, lr scheduler, etc.
MindOCR provides SoTA OCR models with their training strategies in the configs folder.
You may adapt them to your task/dataset, for example, by running
# train text detection model DBNet++ on icdar15 dataset
python tools/train.py --config configs/det/dbnet/dbpp_r50_icdar15.yaml
# train text recognition model CRNN on icdar15 dataset
python tools/train.py --config configs/rec/crnn/crnn_icdar15.yaml
Similarly, it is easy to evaluate the trained model with the tools/eval.py script.
python tools/eval.py \
    --config {path/to/model_config.yaml} \
    --opt eval.dataset_root={path/to/your_dataset} eval.ckpt_load_path={path/to/ckpt_file}
For more illustration and usage, please refer to the model training section in Tutorials.
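The `--opt` arguments in the evaluation command use dotted `key.subkey=value` pairs to override entries of the yaml config. The sketch below illustrates that idea on a plain nested dictionary. It is not MindOCR's actual parser: the function `apply_overrides` is hypothetical, and keeping every value as a string is a simplifying assumption.

```python
from typing import Any, Dict, List


def apply_overrides(config: Dict[str, Any], overrides: List[str]) -> Dict[str, Any]:
    """Apply dotted "a.b=value" overrides to a nested config dict in place."""
    for item in overrides:
        # Split on the first "=" only, so values may contain "=" themselves.
        dotted_key, _, value = item.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = config
        for name in parents:
            # Create intermediate sections that do not exist yet.
            node = node.setdefault(name, {})
        node[leaf] = value
    return config


if __name__ == "__main__":
    cfg = {"eval": {"dataset_root": "default_root"}}
    apply_overrides(
        cfg,
        ["eval.dataset_root=/data/ic15", "eval.ckpt_load_path=/ckpt/best.ckpt"],
    )
    print(cfg)
```

Overriding on the command line keeps the yaml file unchanged, which is convenient when the same config is evaluated against several datasets or checkpoints.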
You can run MindSpore Lite inference in MindOCR using MindOCR models or third-party models (PaddleOCR, MMOCR, etc.). Please refer to the Model Offline Inference Tutorial.
For the detailed performance of the trained models, please refer to https://github.com/mindspore-lab/mindocr/blob/main/configs.
For details of MindSpore Lite inference models support, please refer to MindOCR Models Support List and Third-party Models Support List (PaddleOCR etc.).
MindOCR provides a dataset conversion tool for OCR datasets in different formats and supports user-customized datasets. We have validated the following public OCR datasets in model training/evaluation.
We will include more datasets for training and evaluation. This list will be continuously updated.
For frequently asked questions about configuring the environment and mindocr, please refer to the FAQ.
Add the resume parameter under the model field in the yaml config, e.g., resume: True loads and resumes training from {ckpt_save_dir}/train_resume.ckpt, while resume: /path/to/train_resume.ckpt loads and resumes training from the given path.
eval.dataset.output_columns list.
pred_cast_fp32 for ctcloss in AMP training; fix error when invalid polygons exist.
Set model-pretrained with a checkpoint url or local path in yaml.
Set train-ema (default: False) and train-ema_decay in the yaml config.
num_columns_to_net -> net_input_column_index: change the column number feeding into the network to the column index.
num_columns_of_labels -> label_column_index: change the column number corresponding to the label to the column index.
Add the grouping_strategy argument in yaml config to select a predefined grouping strategy, or use the no_weight_decay_params argument to pick layers to exclude from weight decay (e.g., bias, norm). See configs/rec/crnn/crnn_icdar15.yaml for an example.
Add gradient_accumulation_steps in yaml config; the global batch size = batch_size * devices * gradient_accumulation_steps. See configs/rec/crnn/crnn_icdar15.yaml for an example.
Set grad_clip as True in yaml config.
Set type of loss_scale as dynamic. A YAML example can be viewed in configs/rec/crnn/crnn_icdar15.yaml
output_keys -> output_columns, num_keys_to_net -> num_columns_to_net
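Two of the notes above, the resume rule and the global batch size formula, can be written down as a minimal sketch. Both helper names are hypothetical and not part of MindOCR. Treating any other value of resume (for example False) as "do not resume" is an assumption, as is the default of 1 for gradient_accumulation_steps.

```python
import os
from typing import Optional, Union


def resolve_resume_path(resume: Union[bool, str], ckpt_save_dir: str) -> Optional[str]:
    """Return the checkpoint path to resume from, following the note above.

    resume: True       -> {ckpt_save_dir}/train_resume.ckpt
    resume: "<a path>" -> the given path
    anything else      -> None (assumption: no resume)
    """
    if resume is True:
        return os.path.join(ckpt_save_dir, "train_resume.ckpt")
    if isinstance(resume, str) and resume:
        return resume
    return None


def global_batch_size(batch_size: int, devices: int, gradient_accumulation_steps: int = 1) -> int:
    """global batch size = batch_size * devices * gradient_accumulation_steps."""
    return batch_size * devices * gradient_accumulation_steps


if __name__ == "__main__":
    print(resolve_resume_path(True, "ckpt_dir"))
    # 16 samples per device, 8 devices, 2 accumulation steps
    print(global_batch_size(16, 8, 2))
```

For example, with a per-device batch size of 16 on 8 devices and 2 accumulation steps, each optimizer update covers 16 * 8 * 2 = 256 samples.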
i) Create a new training task on the openi cloud platform.
ii) Link the dataset (e.g., ic15_mindocr) on the webpage.
iii) Add run parameter `config` and write the yaml file path on the website UI interface, e.g., '/home/work/user-job-dir/V0001/configs/rec/test.yaml'
iv) Add run parameter `enable_modelarts` and set True on the website UI interface.
v) Fill in other blanks and launch.
We appreciate all kinds of contributions including issues and PRs to make MindOCR better.
Please refer to CONTRIBUTING.md for the contributing guideline. Please follow the Model Template and Guideline for contributing a model that fits the overall interface :)
This project follows the Apache License 2.0 open-source license.
If you find this project useful in your research, please consider citing:
@misc{MindSpore OCR 2023,
title={{MindSpore OCR }:MindSpore OCR Toolbox},
author={MindSpore Team},
howpublished = {\url{https://github.com/mindspore-lab/mindocr/}},
year={2023}
}