http://dx.doi.org/10.3745/KIPSTB.2006.13B.2.089

Character Segmentation on Printed Korean Document Images Using a Simplification of Projection Profiles  

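Park Sang-Cheol (Information and Communication Research Institute, Chonnam National University)
Kim Soo-Hyung (School of Electronics, Computer, and Information and Communication Engineering, Chonnam National University)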
Abstract
In this paper, we propose two approaches to character segmentation in Korean document images. The first is an improved version of a projection profile-based algorithm: it estimates the number of characters, obtains the split points, searches for each character's boundary, and selects the best segmentation result. The second is designed for low-quality document images in which adjacent characters are connected. In this case, parts of the projection profile are cut to resolve the connections between characters; this operation is called an ${\alpha}$-cut. The revised version of the first segmentation procedure is then applied. The two approaches were tested on 43,572 low-quality Korean word images printed in various font styles. The segmentation accuracies of the first and second approaches are 91.81% and 99.57%, respectively. This result shows that the proposed algorithm using an ${\alpha}$-cut is effective for low-quality Korean document images.
Keywords
Character Segmentation; Projection Profile; Printed Korean Word Image; Keyword Spotting; OCR;
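
The abstract does not give the exact definition of the ${\alpha}$-cut, so the following is only a minimal sketch of the general idea, under one assumed reading: the vertical projection profile of a binary word image is lowered by a constant alpha and clipped at zero, so that thin bridges between touching characters drop out and leave a gap. The function names, the toy image, and the value of alpha are all hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def vertical_profile(image: List[List[int]]) -> List[int]:
    """Count foreground (1) pixels in each column of a binary image."""
    if not image:
        return []
    width = len(image[0])
    return [sum(row[x] for row in image) for x in range(width)]


def alpha_cut(profile: List[int], alpha: int) -> List[int]:
    """Lower every column count by alpha and clip at zero.

    This is an assumed interpretation of the alpha-cut, not the
    paper's definition.
    """
    return [max(0, value - alpha) for value in profile]


def segments(profile: List[int]) -> List[Tuple[int, int]]:
    """Return (start, end) column ranges, end exclusive, where the profile is non-zero."""
    runs = []
    start = None
    for x, value in enumerate(profile):
        if value > 0 and start is None:
            start = x                      # a new run begins
        elif value == 0 and start is not None:
            runs.append((start, x))        # the run ends at a gap
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


# Toy word image: two 3-column blobs joined by a one-pixel bridge.
word = [
    [1, 1, 1, 0, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1, 1, 1],
]

profile = vertical_profile(word)
print(segments(profile))                # [(0, 7)]: the bridge hides the gap
print(segments(alpha_cut(profile, 1)))  # [(0, 3), (4, 7)]: the bridge is cut
```

In this toy case the plain profile yields a single segment because the bridge column is non-zero, while the cut profile yields two. The other steps listed in the abstract (estimating the number of characters, searching for each boundary, and selecting the best result) are not modeled here.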