Hangeul detection method based on histogram and character structure in natural image

다양한 배경에서 히스토그램과 한글의 구조적 특징을 이용한 문자 검출 방법

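  • 표성국 (Department of Plasma Bio Display, Kwangwoon University) ;
  • 박영수 (Ingenium College, Kwangwoon University) ;
  • 이강성 (Ingenium College, Kwangwoon University) ;
  • 이상훈 (Ingenium College, Kwangwoon University)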
  • Received : 2019.01.11
  • Accepted : 2019.03.20
  • Published : 2019.03.28

Abstract

In this paper, we propose a Hangul detection method that uses histograms and the structural features of consonants and vowels to solve the problem of Hangul consonants and vowels being detected as separate components. The proposed method first removes the background with a DoG (Difference of Gaussian) filter to eliminate unnecessary noise in the Hangul detection process. The background-removed image is then converted to a binary image using a cumulative histogram. Next, a horizontal cumulative histogram is used to locate the character string, and a vertical histogram is applied to the located string image to combine character components. However, characters in which the consonant and vowel are arranged horizontally, such as '가', '라' and '귀', are difficult to combine into a single character this way, so they are combined using the structural characteristics of the characters. The experiments used images composed of alphabetic characters on various backgrounds, images composed of Hangul, and images mixing alphabetic characters and Hangul. Compared with the K-means and MSER character detection methods, the proposed method's detection rate for alphabetic characters is about 2% lower, but for text that includes Hangul it reaches 90.6%, which is about 5% higher.

Fig. 1. Difference between alphabet and Hangul detection (a) Alphabet detection (b) Hangul detection

Fig. 2. The six structures of Korean characters. IC and FC are the initial consonant and final consonant, respectively; VV and HV are the vertical vowel and horizontal vowel, respectively.

Fig. 3. The algorithm of the proposed method

Fig. 4. An example of obtaining a binary image from the DoG-filtered result for a natural image (a) Original image (b) DoG filter applied image (c) Binarization applied image
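The three panels of Fig. 4 correspond to a filter-then-threshold step that can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the sigma values, the `keep_fraction` rule for choosing the threshold from the cumulative histogram, and all function names are hypothetical, and the image is a plain 2-D list of grey levels so that the example needs only the standard library.

```python
import math

def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian kernel with radius ceil(3 * sigma)."""
    radius = max(1, math.ceil(3 * sigma))
    weights = [math.exp(-(i * i) / (2 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur(image, sigma):
    """Separable Gaussian blur of a 2-D list; border pixels are clamped."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    offsets = range(-radius, radius + 1)
    height, width = len(image), len(image[0])

    def clamp(i, n):
        return min(max(i, 0), n - 1)

    # Horizontal pass, then vertical pass.
    rows = [[sum(k * row[clamp(x + d, width)] for d, k in zip(offsets, kernel))
             for x in range(width)] for row in image]
    return [[sum(k * rows[clamp(y + d, height)][x] for d, k in zip(offsets, kernel))
             for x in range(width)] for y in range(height)]

def difference_of_gaussians(image, sigma_small=1.0, sigma_large=2.0):
    """Absolute difference of two blurs: flat background goes to ~0, edges remain."""
    a, b = blur(image, sigma_small), blur(image, sigma_large)
    return [[abs(p - q) for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]

def binarize_by_cumulative_histogram(response, keep_fraction=0.2, levels=256):
    """Mark at most the strongest `keep_fraction` of responses as foreground (1).

    Assumed rule: the threshold is the first grey level at which the
    cumulative histogram covers (1 - keep_fraction) of all pixels.
    """
    peak = max(max(row) for row in response)
    if peak == 0:
        return [[0] * len(row) for row in response]
    quantized = [[int(v / peak * (levels - 1)) for v in row] for row in response]
    hist = [0] * levels
    for row in quantized:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    cumulative, threshold = 0, levels - 1
    for level in range(levels):
        cumulative += hist[level]
        if cumulative >= (1 - keep_fraction) * total:
            threshold = level
            break
    return [[1 if v > threshold else 0 for v in row] for row in quantized]

# Usage: a bright bar on a flat background.
image = [[100] * 32 for _ in range(32)]
for y in range(14, 18):
    for x in range(10, 22):
        image[y][x] = 255
binary = binarize_by_cumulative_histogram(difference_of_gaussians(image))
```

Because both blurs return the same value on a flat region, the background response is essentially zero and only pixels near intensity changes survive the threshold.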

Fig. 5. Horizontal / Vertical Cumulative Histogram (a) Binarization image (b) Horizontal histogram labeling image (c) Vertical histogram labeling image
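The labeling in Fig. 5 can be illustrated with projection histograms over a binary image: row sums locate text lines, and column sums inside each line locate character candidates. The sketch below is an assumption about how such labeling might look, not the paper's code; the function names, the `min_count` parameter, and the `(top, bottom, left, right)` box format are hypothetical.

```python
def projection(binary, axis):
    """Count foreground pixels per row (axis='rows') or per column (axis='cols')."""
    if axis == "rows":
        return [sum(row) for row in binary]
    return [sum(col) for col in zip(*binary)]

def runs(profile, min_count=1):
    """Return (start, end) spans, end exclusive, where profile >= min_count."""
    spans, start = [], None
    for i, value in enumerate(profile):
        if value >= min_count and start is None:
            start = i
        elif value < min_count and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans

def segment(binary):
    """Split into text lines by rows, then into candidates by columns.

    Boxes are (top, bottom, left, right) with bottom and right exclusive.
    """
    boxes = []
    for top, bottom in runs(projection(binary, "rows")):
        strip = binary[top:bottom]
        for left, right in runs(projection(strip, "cols")):
            boxes.append((top, bottom, left, right))
    return boxes

# Usage: one line with two components separated by an empty column,
# followed by a second line with a single component.
binary = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 1, 0],
]
boxes = segment(binary)
```

In the first line, the two components are split at the empty column. The same thing happens to a character such as '가', whose consonant and vertical vowel are separated by a gap, which is why a later combination step is needed.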

Fig. 6. Vertical histogram labeling process (a) Original image (b) Vertical histogram labeling image (c) Vertical histogram labeling result image

Fig. 7. Consonant, vowel combination method

Fig. 8. Consonant, vowel combination process (a) Vertical histogram labeling result (b) Consonant and vowel centering
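One way to picture the combination step in Fig. 8 is a greedy merge of horizontally adjacent boxes on the same text line. The rule below is only an illustrative heuristic and not the structural rule used in the paper: it assumes a Hangul syllable fills a roughly square cell, so two neighbouring boxes are merged when the gap between them is small and the merged box is not much wider than the line height. The function name and both ratio parameters are hypothetical.

```python
def merge_split_syllables(boxes, max_aspect=1.2, max_gap_ratio=0.3):
    """Merge neighbouring boxes that plausibly form one syllable block.

    Boxes are (top, bottom, left, right) with bottom and right exclusive.
    Assumed heuristic: merge when the horizontal gap is at most
    max_gap_ratio * height and the merged width is at most max_aspect * height.
    """
    merged = []
    for box in sorted(boxes, key=lambda b: (b[0], b[2])):
        if merged:
            top, bottom, left, right = merged[-1]
            height = bottom - top
            same_line = (box[0], box[1]) == (top, bottom)
            gap = box[2] - right
            combined_width = box[3] - left
            if (same_line and gap <= max_gap_ratio * height
                    and combined_width <= max_aspect * height):
                # Extend the previous box to cover the current one.
                merged[-1] = (top, bottom, left, box[3])
                continue
        merged.append(box)
    return merged

# Usage: a consonant box and a narrow vowel box one pixel apart,
# followed by a separate character further to the right.
boxes = [(0, 10, 0, 5), (0, 10, 6, 9), (0, 10, 14, 23)]
characters = merge_split_syllables(boxes)
```

A width-based rule like this would also merge narrow neighbouring Latin letters, so it is only a starting point for mixed alphabet and Hangul text.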

Fig. 9. Experimental image

Fig. 10. Image of experiment result ① (a) Original image (b) MSER detection result (c) Cumulative histogram detection result (d) Proposed method

Fig. 11. Image of experiment result ② (a) Original image (b) MSER detection result (c) Cumulative histogram detection result (d) Proposed method

Fig. 12. Image of experiment result ③ (a) Original image (b) MSER detection result (c) Cumulative histogram detection result (d) Proposed method

Fig. 13. Comparison of experiment results ① (a) Original image (b) K-means detection (c) MSER detection (d) Proposed method

Fig. 14. Comparison of experiment results ② (a) Original image (b) K-means detection (c) MSER detection (d) Proposed method

Fig. 15. Comparison of experiment results ③ (a) Original image (b) K-means detection (c) MSER detection (d) Proposed method

Fig. 16. Comparison of experiment results ④ (a) Original image (b) K-means detection (c) MSER detection (d) Proposed method

Fig. 17. Detection failure image ⑤

Table 1. Hangeul detection comparison table
