Umi OCR Download - Umi OCR Source code download

v2.1.4

Umi-OCR text recognition tool

Free, open-source, batch-capable offline OCR software
For Windows 7 x64 and Linux x64

Free: All code in this project is open source and completely free.
Convenient: Unzip and use. Runs offline, with no network connection required.
Efficient: Ships with a highly efficient offline OCR engine and built-in recognition libraries for multiple languages.
Flexible: Supports external calls through the command line and an HTTP interface.
Functions: Screenshot OCR / Batch OCR / PDF recognition / QR code / formula recognition

Using the source code

Developers, please be sure to read the Build project section.

Download the release

The following release links are maintained long-term and provide downloads of stable versions.

Lanzou Cloud https://hiroi-sora.lanzoul.com/s/umi-ocr (recommended for users in mainland China; no registration, no speed limit)
GitHub https://github.com/hiroi-sora/Umi-OCR/releases/latest
SourceForge https://sourceforge.net/projects/umi-ocr

Scoop Installer

Scoop is a command-line installer for Windows that makes it easy to manage multiple applications. Install Scoop first, then use the following commands to install Umi-OCR:

Add extras bucket:

scoop bucket add extras

(Option 1) Install Umi-OCR with the Rapid-OCR engine (good compatibility):

scoop install extras/umi-ocr

(Option 2) Install Umi-OCR with the Paddle-OCR engine (slightly faster):

scoop install extras/umi-ocr-paddle

Do not install both at the same time, because the shortcuts may overwrite each other. You can, however, import additional plug-ins and switch between OCR engines at any time.

Get started

The release package is distributed as either a .7z archive or a .7z.exe self-extracting archive. The self-extracting archive can be unpacked on computers that have no compression software installed.

This software does not require installation. After extracting, run Umi-OCR.exe to start the program.

If you encounter any problems, please submit an Issue and I will do my best to help.

Interface language

The Umi-OCR interface is available in multiple languages. The first time you open the software, it automatically selects a language based on your system settings.

To switch the language manually, go to 全局设置 → 语言/Language (Global settings → Language).

Tabs

Umi-OCR v2 consists of a series of flexible, easy-to-use tabs. You can open whichever tabs you need.

The control in the upper-left corner of the tab bar keeps the window on top. The control in the upper-right corner locks the tabs so that they are not closed accidentally during daily use.

Screenshot OCR

Screenshot OCR: After opening this page, you can use a shortcut key to take a screenshot and recognize the text in it.

In the image preview panel on the left, you can select and copy text directly with the mouse.
In the recognition record panel on the right, you can edit text and select and copy multiple records.
You can also copy an image from elsewhere and paste it into Umi-OCR for recognition.
About the formula recognition function

Text post-processing

OCR text post-processing, layout parsing schemes: The layout and order of OCR results can be reorganized to make the text easier to read and use. Preset schemes:

Multi-column, break by natural paragraph (多栏-按自然段换行): Suitable for most scenarios. Automatically detects multi-column layouts and breaks lines according to natural-paragraph rules.
Multi-column, always break (多栏-总是换行): Every text line is placed on its own line.
Multi-column, no breaks (多栏-无换行): Forces all text lines to be merged into a single line.
Single-column, break by natural paragraph / always break / no breaks (单栏-按自然段换行/总是换行/无换行): Same as above, but without multi-column detection.
Single-column, keep indentation (单栏-保留缩进): Suitable for parsing screenshots of code. Preserves the indentation at the start of each line and the spaces within it.
No processing (不做处理): The raw output of the OCR engine, which by default puts a line break after every text line.

All of the schemes above automatically handle both horizontal and vertical (right-to-left) layouts. (Vertical text also requires support from the OCR engine itself.)
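
To make the difference between the three line-break policies concrete, here is a minimal Python sketch. It is not Umi-OCR's actual algorithm: the `Line` type, the `join_lines` function, the mode names, and the "line ends short of the column edge" paragraph heuristic are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Line:
    """One recognized text line (hypothetical structure for this sketch)."""
    text: str
    left: float   # x coordinate of the line's left edge, in pixels
    right: float  # x coordinate of the line's right edge, in pixels


def join_lines(lines, mode="paragraph", tolerance=20.0):
    """Join OCR lines of a single column under one of three break policies.

    mode="always":    every line stays on its own line.
    mode="none":      all lines are merged into one line.
    mode="paragraph": assumed heuristic; a line that ends well short of the
                      column's right edge is treated as the end of a paragraph.
    Lines are joined with a space, which suits English; CJK text would
    normally be joined without one.
    """
    if not lines:
        return ""
    if mode == "always":
        return "\n".join(line.text for line in lines)
    if mode == "none":
        return " ".join(line.text for line in lines)
    if mode != "paragraph":
        raise ValueError(f"unknown mode: {mode!r}")

    column_right = max(line.right for line in lines)
    result = lines[0].text
    for previous, current in zip(lines, lines[1:]):
        ends_short = column_right - previous.right > tolerance
        result += ("\n" if ends_short else " ") + current.text
    return result


sample = [
    Line("The quick brown", 0, 200),
    Line("fox jumps.", 0, 120),
    Line("Next paragraph", 0, 200),
    Line("ends here.", 0, 110),
]
print(join_lines(sample, "paragraph"))
```

With the sample data, only the second line ends short of the column edge, so the paragraph mode yields two paragraphs while the other modes yield four lines and one line respectively.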

Batch OCR

Batch OCR: This page is used to import local images in batches for recognition.

Supported image formats: jpg, jpe, jpeg, jfif, png, webp, bmp, tif, tiff.
Supported formats for saving recognition results: txt, jsonl, md, csv (Excel).
Like screenshot OCR, it supports text post-processing (文本后处理) to organize the layout and order of the OCR text.
There is no upper limit on the number of images; hundreds can be imported in a single task.
Supports automatic shutdown/standby after the task completes.
To recognize very long or very high-resolution images, adjust: Page settings → Text recognition → Limit image side length → [increase the value].
Provides a special ignore area (忽略区域) feature.
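
The following Python sketch illustrates why raising the side-length limit helps with long images. It assumes that the limit works by shrinking an image until its longest side fits within the limit, which is a common approach in OCR engines but is not confirmed here for Umi-OCR; the function name `scaled_size` and the example values are hypothetical.

```python
def scaled_size(width, height, limit_side_len):
    """Return the size an image would be shrunk to under a side-length limit.

    Assumption for this sketch: the image is downscaled proportionally so
    that its longest side does not exceed limit_side_len. Smaller images
    are left unchanged.
    """
    longest = max(width, height)
    if longest <= limit_side_len:
        return width, height
    ratio = limit_side_len / longest
    return max(1, round(width * ratio)), max(1, round(height * ratio))


# A 1000 x 8000 long screenshot under a low and a higher limit.
print(scaled_size(1000, 8000, 960))
print(scaled_size(1000, 8000, 4320))
```

Under this assumption, a low limit shrinks a long screenshot so much that its text becomes too small to read reliably, while a higher limit keeps more of the original resolution at the cost of speed and memory.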

Ignore area

OCR text post-processing, ignore areas: A special function in batch OCR, suitable for excluding unwanted text in images.

The ignore area editor can be opened from the settings in the right column of the batch recognition page.
Suppose an image has multiple watermarks/logos at the top and in the lower-right corner. If such images are recognized in batches, the watermarks will interfere with the recognition results.
Hold down the right mouse button and draw one or more rectangles. Text within these areas will be ignored in the task.
Make the rectangles as large as practical so that they completely cover every possible position of the watermark.
Note that a text block is ignored only when the entire block (not individual characters) lies within an ignore area. In the example figure, the dark rectangle with a yellow border is an ignore area. Only key_mouse is ignored; the two text blocks pubsub_connector.py and pubsub_service.py are retained.
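
The whole-block rule can be sketched in Python as a simple containment test. This is an illustration only, not Umi-OCR's implementation: the `Rect` and `TextBlock` types, the function names, and all coordinates are hypothetical, and boxes are assumed to be axis-aligned rectangles.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    """Axis-aligned rectangle in pixel coordinates (hypothetical type)."""
    left: float
    top: float
    right: float
    bottom: float


class TextBlock(NamedTuple):
    """One recognized text block and its bounding box (hypothetical type)."""
    text: str
    box: Rect


def is_inside(inner: Rect, outer: Rect) -> bool:
    """True only if inner lies completely within outer."""
    return (outer.left <= inner.left and outer.top <= inner.top
            and inner.right <= outer.right and inner.bottom <= outer.bottom)


def apply_ignore_areas(blocks, ignore_areas):
    """Drop blocks that are fully covered by at least one ignore area.

    Blocks that only partly overlap an ignore area are kept whole.
    """
    return [
        block for block in blocks
        if not any(is_inside(block.box, area) for area in ignore_areas)
    ]


# Made-up coordinates: the ignore area fully covers only "key_mouse".
blocks = [
    TextBlock("key_mouse", Rect(20, 10, 110, 30)),
    TextBlock("pubsub_connector.py", Rect(20, 40, 220, 60)),
    TextBlock("pubsub_service.py", Rect(20, 70, 200, 90)),
]
ignore_areas = [Rect(0, 0, 150, 100)]
kept = [block.text for block in apply_ignore_areas(blocks, ignore_areas)]
print(kept)
```

Because partly covered blocks survive, drawing a generous rectangle is safer than a tight one when the goal is to remove a watermark.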

Document recognition

Document recognition:

Supported formats: pdf, xps, epub, mobi, fb2, cbz.
Performs OCR on scanned documents or extracts the original text. Results can be exported as a double-layer searchable PDF.
Supports ignore areas, which can be used to exclude header and footer text.
Can be set to shut down/hibernate automatically after the task completes.

QR code

Scan codes:

Take a screenshot, paste an image, or drag in a local image to read the QR codes and barcodes in it.
Supports multiple codes in one image.
Supports 19 code formats, as follows:

Aztec, Codabar, Code128, Code39, Code93, DataBar, DataBarExpanded, DataMatrix, EAN13, EAN8, ITF, LinearCodes, MatrixCodes, MaxiCode, MicroQRCode, PDF417, QRCode, UPCA, UPCE

Generate codes:

Enter text to generate a QR code image.
Supports 19 code formats, with parameters such as the error correction level.

Global settings

Global settings: Here you can adjust the global parameters of the software. Commonly used functions are as follows:

Add shortcuts or enable auto-start on boot with one click.
Change the interface language. Umi supports Traditional Chinese, English, Japanese, and other languages.
Switch the interface theme. Umi has multiple light/dark themes.
Adjust the size and font of interface text.
Switch the OCR plug-in.
Renderer: The software interface uses GPU-accelerated rendering by default. If screenshots flicker or the UI is misaligned on your machine, go to 界面和外观 → 渲染器 (Interface and appearance → Renderer) and try switching to a different rendering scheme or turning off hardware acceleration.

External call interfaces:

Command line manual
HTTP interface manual
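
As a rough illustration of calling the HTTP interface from Python, the sketch below posts a base64-encoded image to a locally running Umi-OCR instance. The address `http://127.0.0.1:1224/api/ocr`, the `base64` field name, and the JSON response are assumptions that should be checked against the HTTP interface manual; the helper names `build_ocr_request` and `recognize` are hypothetical.

```python
import base64
import json
import urllib.request

# Assumed local address and endpoint; verify against the HTTP interface manual.
UMI_OCR_URL = "http://127.0.0.1:1224/api/ocr"


def build_ocr_request(image_bytes: bytes, url: str = UMI_OCR_URL) -> urllib.request.Request:
    """Build a POST request carrying the image as base64 inside a JSON body."""
    payload = {"base64": base64.b64encode(image_bytes).decode("ascii")}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def recognize(image_path: str) -> dict:
    """Send one image file to the service and return the decoded JSON reply.

    Requires Umi-OCR to be running with its HTTP service enabled.
    """
    with open(image_path, "rb") as image_file:
        request = build_ocr_request(image_file.read())
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `recognize("example.png")` (a placeholder path) would return whatever JSON the service replies with; the structure of that reply is documented in the manual and is not assumed here.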

Software localization:

Thanks to the following translators who contributed localization work to Umi-OCR (listed in no particular order):

Translator	Contributed languages
Bob	English, Traditional Chinese, Japanese
Qingzheng Gao	English, Traditional Chinese
Weng, Chia-Ling	English, Traditional Chinese
linzow	English, Traditional Chinese
Eric Guo	English
steven0081	English
Marcos i	English
plum7x	Traditional Chinese
hugoalh	Traditional Chinese
ドコモ光	Japanese
Yang Peng	Portuguese

If any information is incorrect or anyone is missing, please reply in this discussion.

This project uses the online platform Weblate: Umi-OCR for collaborative localization. Any user is welcome to take part in the translation work: you can proofread or supplement existing languages, or add new ones.

About project structure

Repositories:

Main repository (this repository)
Plug-in repository
Windows runtime
Linux runtime

Project structure:

Entries marked with ** are contained in this repository (the main repository).

Umi-OCR
├─ Umi-OCR.exe
├─ umi-ocr.sh
└─ UmiOCR-data
   ├─ main.py **
   ├─ version.py **
   ├─ qt_res **
   │  └─ Qt resources, including icons and QML source code
   ├─ py_src **
   │  └─ Python source code
   ├─ plugins
   │  └─ Plug-ins
   └─ i18n **
      └─ Translation files

Supported offline OCR engines:

PaddleOCR-json
RapidOCR-json

Runtime framework:

PyStand (customized version)

Build project

Step 0: (Optional) Fork this project

Step 1: Download the code

Please refer to the instructions at the beginning of the change log.

Next steps:

Go to one of the following repositories to set up the development/runtime environment for the corresponding platform.

Windows
Linux

Sponsor

The Umi-OCR project is mainly developed and maintained by the author, hiroi-sora, in his spare time. If you like this software, please consider sponsoring it.

Users in mainland China can sponsor the author through Afdian (爱发电).

Star History

Change log

Development plan

Completed work

Tab frame.
OCR API controller.
OCR task controller.
Theme manager supports switching light/dark themes.
Implement batch OCR .
Implement screenshot OCR .
Shortcut key mechanism.
System tray menu.
Text block post-processing (typesetting optimization).
Engine memory cleanup.
The software interface is available in multiple languages.
Command line mode.
Win7 compatible.
Excel (csv) output format.
Esc interrupts the screenshot operation.
External theme files.
Font switching.
Loading animation.
Ignore the area.
QR code recognition.
The picture preview window of the batch recognition page.
PDF recognition.
Call the local image browser to open the image. #335
Repeat the last screenshot. #357
Bug fix: document recognition compatibility issue in Windows 7 system.
HTTP/command line interface adds QR code recognition/generation function. (#423)
Documentation for the QR code interface.
Linux platform porting.
HTTP document recognition interface.

Future plans

These are planned functions. Interfaces for them were reserved in the early stages of development, and they will be implemented gradually over the long term.

However, depending on conditions during development, the functional design may change, and functions may be added or canceled.

Refactor the underlying plug-in mechanism.
Online OCR API plug-in.
Independent mathematical formula recognition plug-in.
The "Mathematical Formula" tab provides independent mathematical formula recognition and LaTeX rendering.
Update-checking mechanism.
Text post-processing modules other than typesetting analysis (such as preserving numbers, half-width character conversion, text error correction).
Key interface functions add event triggering methods.
GPU-based offline OCR.
Image translation.
Offline translation.
Fixed area recognition.
Recognize table images and output them to Excel.
History recording system.
Compatibility with macOS, Ubuntu, and other platforms.