Korean Handwriting Recognition AI
Konkuk Univ. Senior, Multimedia Programming - Term project (Individual project)
1. Introduction
"Can an individual's Korean handwriting be learned through an artificial neural network?"
Even when writing the same letters, each person's handwriting looks subtly different. Someone who is forced to write, or who writes in an unusual physical state, leaves different traces than usual, which is why handwriting is highly reliable as evidence. This uniqueness is also what gives a signature its value: for important documents and exams, verifying the authenticity of handwriting matters a great deal.
Because handwriting is unique, it can be used to prove identity; but when two handwriting samples differ only subtly, it is hard for humans to tell them apart with the naked eye. I therefore wanted to implement an artificial intelligence model for Korean handwriting recognition.
(1-1) Input data
- Korean Handwriting from 10 people
- Each of the samples above was written by a different person. (You can see that they are the same letters, yet subtly different.)
(1-2) Target
- 10 people (3 family members, 7 close friends)
- BSN(Sona Bang), CHW(Howon Choi), KBJ(Beomjun Kim), KJH(Joonhyung Kwon), LJH(Jongho Lee), LSE(Seungeon Lee), PJH(Jonghyuk Park), PSM(Sangmoon Park), SHB(Suk Hyunbin), SWS(Woosub Shin)
(1-3) Expected application field
- Handwriting verification for national examinations and important documents
- OCR (Optical Character Recognition)
2. Creating database
2-1. One-letter data
(1) imread
- Used a template provided by "Ongle-leap", a font design company
- The template includes every combination of Korean letters
(2) Unsharp mask
- Applied an unsharp mask to sharpen the handwriting (a sketch follows below).
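The report does not show the exact parameters, so the following is a minimal unsharp-mask sketch in OpenCV; the 5x5 Gaussian blur, the weights, and the file names are assumptions.

```python
import cv2

# Unsharp mask sketch: sharpened = 1.5*original - 0.5*blurred,
# i.e. original + 0.5*(original - blurred). Parameters are assumed.
img = cv2.imread("template_scan.png")        # hypothetical input file
blurred = cv2.GaussianBlur(img, (5, 5), 0)   # assumed 5x5 Gaussian blur
sharpened = cv2.addWeighted(img, 1.5, blurred, -0.5, 0)
cv2.imwrite("sharpened.png", sharpened)
```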
(3) Grayscale transform
(4) Histogram examine & Apply 1st threshold
- The threshold is chosen by examining the histogram, and the image is binarized at that threshold.
- (In this example, the threshold is set to 150 on the 0-255 scale; a sketch follows below.)
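A minimal sketch of steps (3)-(4), assuming the sharpened page was saved as "sharpened.png" (hypothetical name). The histogram is computed so the threshold can be picked by inspection; 150 is the value from the report's example.

```python
import cv2

img = cv2.imread("sharpened.png")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)           # step (3): grayscale
hist = cv2.calcHist([gray], [0], None, [256], [0, 256])  # inspect to choose threshold
_, binary = cv2.threshold(gray, 150, 255, cv2.THRESH_BINARY)  # step (4): binarize at 150
cv2.imwrite("binary.png", binary)
```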
(5) Apply LPF
- To extract the position of each piece of handwriting, the image is smoothed with an LPF so that the overall contour is exposed.
- Set an appropriate kernel size and apply the LPF to the binarized image with cv2.filter2D.
- The smaller the kernel, the easier it is to detect small units such as individual vowels and consonants; the larger the kernel, the easier it is to detect the contour of the letter as a whole.
- (Example applies 21x21 kernel)
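A minimal sketch of step (5), assuming a simple box (averaging) filter as the LPF; the 21x21 kernel size follows the example above, and "binary.png" is a hypothetical file name.

```python
import cv2
import numpy as np

binary = cv2.imread("binary.png", cv2.IMREAD_GRAYSCALE)
kernel = np.ones((21, 21), np.float32) / (21 * 21)  # mean filter = simple low-pass
smoothed = cv2.filter2D(binary, -1, kernel)          # cv2.filter2D as referenced above
cv2.imwrite("smoothed.png", smoothed)
```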
(6) Histogram examine & Apply 2nd threshold
- The threshold is set from the histogram of the LPF-smoothed image, and binarization is performed once more at that threshold.
- (Example threshold value : 230)
(7) Extract contour & coordinates, image crop
- Small contours that were not letters were excluded.
- The x, y coordinates and w, h values were extracted from each contour, and the box was recomputed as a square so that the handwriting's proportions would be preserved as much as possible when later resized to 64x64.
- Using the computed coordinates, the image was cropped to that square (see the sketch below).
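A minimal sketch of step (7). It assumes OpenCV 4's findContours signature and that letters appear as white blobs in the twice-binarized mask (invert the mask first if binarization left black letters on white); the area threshold and file names are assumptions, not the report's values.

```python
import cv2

mask = cv2.imread("mask.png", cv2.IMREAD_GRAYSCALE)       # hypothetical: 2nd-threshold result
gray = cv2.imread("page_gray.png", cv2.IMREAD_GRAYSCALE)  # hypothetical: grayscale page
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
letters = []
for c in contours:
    x, y, w, h = cv2.boundingRect(c)
    if w * h < 100:                      # skip small outlines that are not letters (assumed area)
        continue
    side = max(w, h)                     # grow the bounding box into a square
    cx, cy = x + w // 2, y + h // 2
    x0, y0 = max(cx - side // 2, 0), max(cy - side // 2, 0)
    crop = gray[y0:y0 + side, x0:x0 + side]      # numpy clips slices at image edges
    letters.append(cv2.resize(crop, (64, 64)))   # final 64x64 letter image
```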
(8) Imwrite
- 81 one-letter images per person
- Total: 810 one-letter images collected (81 × 10 people)
2-2. Two-letter data
- Two-letter data was created by combining pairs of different one-letter images, each scaled appropriately.
- To reduce the loss of feature information during resizing, an img_concat(img1, img2) function was written to join the two images and pad the result into a square (a sketch follows below).
- Acquired 6,480 two-letter images per target (81P2 = 81 × 80)
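The report names img_concat(img1, img2) but does not show its body. Below is a minimal sketch under the assumption that the inputs are grayscale letter crops: scale both to a common height, join them side by side, and center the strip on a white square canvas so the later 64x64 resize does not distort the letters.

```python
import cv2
import numpy as np

def img_concat(img1, img2):
    # Assumed reconstruction, not the author's actual implementation.
    h = max(img1.shape[0], img2.shape[0])
    def to_height(img):
        w = max(1, round(img.shape[1] * h / img.shape[0]))
        return cv2.resize(img, (w, h))
    strip = np.hstack([to_height(img1), to_height(img2)])   # join side by side
    side = max(strip.shape[:2])
    canvas = np.full((side, side), 255, dtype=strip.dtype)  # white square canvas
    y0 = (side - strip.shape[0]) // 2
    x0 = (side - strip.shape[1]) // 2
    canvas[y0:y0 + strip.shape[0], x0:x0 + strip.shape[1]] = strip
    return canvas
```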
2-3. Three-letter data
- Three-letter data was made by combining one-letter and two-letter images, each scaled appropriately.
- Acquired 7,980 three-letter images per target
2-4. Actual handwritten data from note-taking
- Data was acquired from each target's actual note-taking.
- Acquired 30 actual handwriting images per target
3. 1st Result
- Train data : Test data = 9 : 1
- Used data = One-letter (81) + Two-letter (500) + Three-letter (500) + Actual handwriting (30)
- n.Epoch = 20, Batch size = 50, Learning rate = 0.01
- Layer configuration
- Result
  - Loss = 0.7819, Accuracy = 0.72
  - Accuracy on test data = 0.6503
4. 2nd Result
- Train data : Test data = 9 : 1
- Used data = One-letter (81) + Two-letter (1000) + Three-letter (1300) + Actual handwriting (30)
- n.Epoch = 20, Batch size = 150, Learning rate = 0.04
- Layer configuration
- Result
  - Loss = 0.0397, Accuracy = 0.9908
  - Accuracy on test data = 0.9360
5. Result analysis
Compared to the first attempt, all of the second results improved significantly. I believe the reasons for the improvement are as follows.
(5-1) Increasing the data used for learning
- In the first attempt, 1,111 images were used per target; in the second attempt, 2,411 were used, more than double.
- Since training and evaluation were conducted on more data, finer characteristics of each target's handwriting could be captured.
(5-2) Changes in the artificial neural network layer
- In the second attempt, the sizes of the convolution mask and of the pooling were modified.
- The convolution mask was made smaller than before to capture very fine handwriting characteristics.
- Reducing the pooling size reduced the loss of handwriting characteristics during training.
- In addition, to hold more information about the handwriting in the network's parameters, the number of nodes in the fully connected layer was increased from 128 to 256 (a sketch of such a network follows).
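For illustration only, here is a minimal Keras sketch consistent with the settings reported above (64x64 inputs, 10 writers, learning rate 0.04, batch size 150, 20 epochs). The report does not state the framework, filter counts, or exact depth, so those are assumptions; only the small convolution masks, small pooling, and the 256-node fully connected layer follow section 5-2.

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(64, 64, 1)),               # 64x64 grayscale letter crops
    layers.Conv2D(32, (3, 3), activation="relu"),  # assumed: small 3x3 convolution masks
    layers.MaxPooling2D((2, 2)),                   # assumed: small 2x2 pooling
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(256, activation="relu"),          # 128 -> 256 nodes, per section 5-2
    layers.Dense(10, activation="softmax"),        # 10 target writers
])
model.compile(optimizer=keras.optimizers.SGD(learning_rate=0.04),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# Training with the 2nd-attempt settings (x_train/y_train are placeholders):
# model.fit(x_train, y_train, epochs=20, batch_size=150, validation_split=0.1)
```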
(5-3) Increase of batch size and learning rate