Free, open source, batch-capable offline OCR software
Runs on Windows 7 x64 and Linux x64
Free: All code in this project is open source and completely free.
Convenient: Unzip and use; runs offline with no network required.
Efficient: Ships with a highly efficient offline OCR engine and built-in recognition libraries for multiple languages.
Flexible: Can be called externally through the command line or the HTTP interface.
Functions: Screenshot OCR / Batch OCR / PDF recognition / QR code / formula recognition
Screenshot recognition
Layout analysis - identify different layout types and output text in the correct order
Batch recognition
Ignore area - exclude watermark text in screenshots
QR code - scan or generate QR code images
Document recognition - extract text from scanned PDFs or convert them into two-layer searchable PDFs
Global settings
Command line call
HTTP interface
Build the project (Windows, Linux)
Developers, please be sure to read Build the project.
The following release links are maintained long term and provide downloads of stable versions.
Lanzou Cloud https://hiroi-sora.lanzoul.com/s/umi-ocr (recommended for users in China; no registration, no speed limit)
GitHub https://github.com/hiroi-sora/Umi-OCR/releases/latest
Source Forge https://sourceforge.net/projects/umi-ocr
Scoop is a command-line installer for Windows that makes it easy to manage multiple applications. Install Scoop first, then use the following commands to install Umi-OCR:
Add the extras bucket:
scoop bucket add extras
(Option 1) Install Umi-OCR (ships with the Rapid-OCR engine, better compatibility):
scoop install extras/umi-ocr
(Option 2) Install Umi-OCR (ships with the Paddle-OCR engine, slightly faster):
scoop install extras/umi-ocr-paddle
Do not install both at the same time, because the shortcuts may overwrite each other. You can, however, import additional plug-ins and switch between OCR engines at any time.
The release package is distributed as a .7z archive or a .7z.exe self-extracting archive. The self-extracting archive can be unpacked on computers that have no compression software installed.
This software does not require installation. After unzipping, click Umi-OCR.exe to start the program.
If you encounter any problems, please submit an Issue and I will do my best to help.
The Umi-OCR interface is available in multiple languages. When you open the software for the first time, the language is selected automatically according to your computer's system settings.
If you need to switch the language manually, go to Global settings → Language, as shown in the figure below.
Umi-OCR v2 consists of a series of flexible, easy-to-use tabs. You can open whichever tabs you need.
The button at the upper left of the tab bar pins the window on top. The button at the upper right locks the tabs to prevent closing them by accident during daily use.
Screenshot OCR: After opening this page, you can use shortcut keys to start a screenshot and recognize the text in the image.
In the image preview panel on the left, you can select and copy text directly with the mouse.
In the recognition record panel on the right, you can edit text and select and copy multiple records.
You can also copy images from elsewhere and paste them into Umi-OCR for recognition.
About the formula recognition function
About OCR text post-processing - layout analysis schemes: These organize the layout and order of OCR results to make the text easier to read and use. Preset schemes:
Multi-column, break by natural paragraph: suitable for most scenarios; automatically recognizes multi-column layouts and breaks lines according to natural paragraphs.
Multi-column, always break: each text line is placed on its own line.
Multi-column, no breaks: all text lines are merged into a single line.
Single-column, break by natural paragraph / always break / no breaks: same as above, but multi-column layouts are not distinguished.
Single-column, keep indentation: suitable for screenshots of code; preserves the indentation at the start of each line and the spaces within lines.
No processing: the raw output of the OCR engine, which by default puts each text line on its own line.
The schemes above automatically handle both horizontal and vertical (right-to-left) layouts. (Vertical text also requires support from the OCR engine itself.)
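The two simplest behaviours described above, putting every recognized line on its own line versus merging everything into one line, can be sketched as follows. This is only an illustration, not Umi-OCR's actual implementation: the function name `merge_lines`, the mode names `always_wrap` and `no_wrap`, and the `joiner` parameter are all hypothetical, and the paragraph-aware and multi-column schemes are not covered.

```python
from typing import Iterable


def merge_lines(lines: Iterable[str], mode: str, joiner: str = " ") -> str:
    """Join already-ordered OCR text lines (hypothetical helper, not Umi-OCR code)."""
    # Drop empty lines and surrounding whitespace. A scheme that keeps
    # indentation would have to skip this stripping step.
    cleaned = [line.strip() for line in lines if line.strip()]
    if mode == "always_wrap":
        # Every recognized line stays on its own line.
        return "\n".join(cleaned)
    if mode == "no_wrap":
        # Everything is merged into a single line. An empty joiner may suit
        # languages written without spaces between words.
        return joiner.join(cleaned)
    raise ValueError(f"unknown mode: {mode!r}")


ocr_lines = ["Umi-OCR is free", "and runs offline."]
wrapped = merge_lines(ocr_lines, "always_wrap")
single = merge_lines(ocr_lines, "no_wrap")
```

Here `wrapped` keeps the two lines separate, while `single` joins them with a space.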
Batch OCR: This page is used to import local images in batches for recognition.
Supported image formats: jpg, jpe, jpeg, jfif, png, webp, bmp, tif, tiff.
Supported formats for saving recognition results: txt, jsonl, md, csv (Excel).
Like screenshot OCR, it supports text post-processing to organize the layout and order of OCR text.
There is no upper limit on the number of images; hundreds can be imported for a single task.
Supports automatic shutdown/standby after the task completes.
To recognize very long or very large images, adjust: page settings → text recognition → limit image side length → [increase the value].
Provides the special ignore area feature.
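To see why raising the image side length limit can matter for long or large images, here is a rough sketch. It assumes that the limit applies to the longer side and that oversized images are shrunk proportionally before recognition; the source does not state how Umi-OCR applies the limit, so treat this as an illustration only. The function name `downscale_size` and the value 960 are hypothetical.

```python
from typing import Tuple


def downscale_size(width: int, height: int, limit_side_len: int) -> Tuple[int, int]:
    """Return the size an image would have after fitting its longer side to the limit.

    Assumption (not confirmed by the source): images larger than the limit
    are shrunk proportionally, and smaller images are left unchanged.
    """
    longest = max(width, height)
    if longest <= limit_side_len:
        return width, height
    # Multiply before dividing so exact ratios stay exact.
    new_width = max(1, round(width * limit_side_len / longest))
    new_height = max(1, round(height * limit_side_len / longest))
    return new_width, new_height


small = downscale_size(800, 600, 960)
long_image = downscale_size(1000, 8000, 960)
```

Under this assumption, a 1000 x 8000 long screenshot would be reduced to 120 x 960, which makes small text much harder to recognize. A larger limit keeps more of the original resolution.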
About OCR text post-processing - ignore areas: A special function in batch OCR, suitable for excluding unwanted text from images.
The ignore area editor can be opened from the settings in the right column of the batch OCR page.
As in the example above, there are several watermarks/logos at the top and in the lower right corner of the image. If such images are recognized in batches, the watermarks interfere with the recognition results.
Hold down the right mouse button and draw one or more rectangles. Text within these areas is ignored during the task.
Make each rectangle as large as possible so that it completely covers every position where the watermark may appear.
Note that a text block is ignored only if the entire block (not just individual characters) lies within an ignore area. As shown in the figure below, the dark rectangle with a yellow border is an ignore area. In that case only key_mouse is ignored; the two text blocks pubsub_connector.py and pubsub_service.py are retained.
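The whole-block rule above can be sketched as a simple containment test. This is a minimal illustration under stated assumptions, not the project's actual code: the `Box` type, the function names, and the coordinates are hypothetical, and boxes are assumed to be axis-aligned rectangles.

```python
from typing import List, NamedTuple, Tuple


class Box(NamedTuple):
    """Axis-aligned rectangle in pixel coordinates (hypothetical type)."""
    left: float
    top: float
    right: float
    bottom: float


def is_inside(inner: Box, outer: Box) -> bool:
    """True only if `inner` lies entirely within `outer`."""
    return (
        inner.left >= outer.left
        and inner.top >= outer.top
        and inner.right <= outer.right
        and inner.bottom <= outer.bottom
    )


def filter_blocks(
    blocks: List[Tuple[str, Box]], ignore_areas: List[Box]
) -> List[Tuple[str, Box]]:
    """Drop text blocks fully covered by any ignore area; keep partial overlaps."""
    return [
        (text, box)
        for text, box in blocks
        if not any(is_inside(box, area) for area in ignore_areas)
    ]


# Made-up coordinates that mimic the situation described above.
blocks = [
    ("key_mouse", Box(10, 10, 90, 30)),
    ("pubsub_connector.py", Box(10, 40, 200, 60)),
    ("pubsub_service.py", Box(10, 70, 190, 90)),
]
ignore_areas = [Box(0, 0, 100, 100)]
kept = [text for text, _ in filter_blocks(blocks, ignore_areas)]
```

Only `key_mouse` lies completely inside the ignore area, so it is the only block removed. The other two blocks extend past the right edge of the area and are kept, which is why the ignore rectangle should be drawn generously.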
Document recognition:
Supported formats: pdf, xps, epub, mobi, fb2, cbz.
Performs OCR on scanned documents or extracts the original text. Results can be exported as a two-layer searchable PDF.
Supports ignore areas, which can be used to exclude header and footer text.
Can automatically shut down/hibernate after the task completes.
Scan code:
Take a screenshot, paste, or drag in a local image to read the QR codes and barcodes in it.
Supports multiple codes in one image.
Supports 19 protocols, as follows:
Aztec, Codabar, Code128, Code39, Code93, DataBar, DataBarExpanded, DataMatrix, EAN13, EAN8, ITF, LinearCodes, MatrixCodes, MaxiCode, MicroQRCode, PDF417, QRCode, UPCA, UPCE
Generate code:
Enter text to generate a QR code image.
Supports 19 protocols, as well as parameters such as the error correction level.
Global settings: Here you can adjust the software's global parameters. Commonly used functions:
Add shortcuts or enable auto-start on boot with one click.
Change the interface language. Umi supports Traditional Chinese, English, Japanese and other languages.
Switch the interface theme. Umi has multiple light/dark themes.
Adjust the size and font of interface text.
Switch the OCR plug-in.
Renderer: The software interface uses GPU-accelerated rendering by default. If screenshots flicker or the UI is misaligned on your machine, go to Interface and appearance → Renderer and try a different rendering scheme, or turn off hardware acceleration.
Command line manual
HTTP interface manual
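As a rough idea of what calling the HTTP interface from a script might look like, the sketch below builds (but does not send) a POST request carrying a Base64-encoded image. The address `http://127.0.0.1:1224`, the path `/api/ocr`, and the `base64` JSON field are assumptions that do not appear in this document; check the HTTP interface manual for the real endpoint and parameters. The helper name `build_ocr_request` is hypothetical.

```python
import base64
import json
import urllib.request


def build_ocr_request(
    image_bytes: bytes, base_url: str = "http://127.0.0.1:1224"
) -> urllib.request.Request:
    """Build a JSON POST request for a local OCR service.

    The default address, the /api/ocr path and the "base64" field name are
    assumptions for illustration; verify them against the HTTP manual.
    """
    payload = {"base64": base64.b64encode(image_bytes).decode("ascii")}
    return urllib.request.Request(
        url=base_url + "/api/ocr",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_ocr_request(b"fake image bytes")

# Sending requires Umi-OCR to be running with its HTTP service enabled:
# with urllib.request.urlopen(request, timeout=30) as response:
#     result = json.load(response)
```

Building the request separately from sending it keeps the example runnable without a live service and makes the payload easy to inspect.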
Thanks to the following translators who contributed localization translation work to Umi-OCR: (listed in no particular order)
Translator | Contributed languages |
---|---|
Bob | English, Traditional Chinese, Japanese |
Qingzheng Gao | English, Traditional Chinese |
Weng, Chia-Ling | English, Traditional Chinese |
linzow | English, Traditional Chinese |
Eric Guo | English |
steven0081 | English |
Marcos i | English |
plum7x | Traditional Chinese |
hugoalh | Traditional Chinese |
ドコモ光 | Japanese |
Yang Peng | Português |
If any information is incorrect or anyone is missing, please reply in this discussion.
This project uses the online platform Weblate: Umi-OCR for collaborative localization. Everyone is welcome to take part in the translation work: you can proofread or extend existing languages, or add new ones.
Main repository
Plug-in repository
Windows runtime
Linux runtime
Items marked with the ** suffix are included in this repository (the main repository).
Umi-OCR
├─ Umi-OCR.exe
├─ umi-ocr.sh
└─ UmiOCR-data
   ├─ main.py **
   ├─ version.py **
   ├─ qt_res **
   │  └─ Project Qt resources, including icons and QML source code
   ├─ py_src **
   │  └─ Project Python source code
   ├─ plugins
   │  └─ Plug-ins
   └─ i18n **
      └─ Translation files
Supported offline OCR engines:
PaddleOCR-json
RapidOCR-json
Runtime framework:
PyStand customized version
Please refer to the instructions at the beginning of the changelog.
Please go to the following repositories to set up the development/runtime environment for your platform.
Windows
Linux
The Umi-OCR project is mainly developed and maintained by the author, hiroi-sora, in his spare time. If you like this software, please consider sponsoring it.
Users in China can sponsor the author through iPower.
Tab frame.
OCR API controller.
OCR task controller.
Theme manager supports switching light/dark themes.
Implement batch OCR.
Implement screenshot OCR.
Shortcut key mechanism.
System tray menu.
Text block post-processing (typesetting optimization).
Engine memory cleanup.
The software interface is available in multiple languages.
Command line mode.
Win7 compatible.
Excel (csv) output format.
Esc interrupts the screenshot operation.
External theme files.
Font switching.
Loading animation.
Ignore the area.
QR code recognition.
The picture preview window of the batch recognition page.
PDF recognition.
Call the local image browser to open the image. #335
Repeat the last screenshot. #357
Bug fix: document recognition compatibility issue in Windows 7 system.
HTTP/command line interface adds QR code recognition/generation function. (#423)
Documentation of QR code interface.
Linux platform porting.
HTTP document recognition interface.
These are planned features. Interfaces were reserved for them early in development, and they will be implemented gradually over the long term.
However, depending on how development goes, the functional design may change, and features may be added or dropped.
Refactor the underlying plug-in mechanism.
Online OCR API plug-in.
Independent mathematical formula recognition plug-in.
The "Mathematical Formula" tab provides independent mathematical formula recognition/LaTeX rendering.
Update-checking mechanism.
Text post-processing modules other than layout analysis (such as preserving numbers, half-width character conversion, text error correction).
Add event-trigger methods for key interface functions.
GPU-based offline OCR.
Picture translation.
Offline translation.
Fixed area recognition.
Recognize table images and output them to Excel.
History recording system.
Compatible with macOS / Ubuntu and other platforms.