http://dx.doi.org/10.3745/KIPSTB.2006.13B.6.591

Word Image Decomposition from Image Regions in Document Images using Statistical Analyses  

Jeong, Chang-Bu (Department of Internet Software, Honam University)
Kim, Soo-Hyung (School of Electronics and Computer Engineering, Chonnam National University)
Abstract
This paper describes the development and implementation of an algorithm that uses statistical analyses to extract word images from image regions containing mixed text and graphics in document images. Extracting word images from such regions first requires separating the character components from the graphic components. For this step, we propose a separation method based on box-plot analysis of the statistics of structural components. Because the separation criterion is derived from the statistics of the components themselves, the accuracy of the method is not sensitive to variations between images. Character regions are then determined by analyzing the local crowdedness of the separated character components. Finally, we divide the character regions into text lines and word images using projection profile analysis, gap clustering, special symbol detection, and related techniques. Since its criteria are based on the statistics of each image region, the proposed system reduces the influence of variations between images. We also tested the proposed method within a document image processing system for keyword spotting, and the results demonstrated the need for the proposed method.
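The box-plot separation step described in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify which structural statistics are used or how the fences are set, so everything here is an assumption: components are represented by hypothetical bounding boxes `(x, y, width, height)`, the statistics are the widths and heights, and the criterion is the conventional Tukey upper fence (Q3 + 1.5 × IQR). The function names `boxplot_fences` and `separate_components` are illustrative and do not come from the paper.

```python
from statistics import quantiles


def boxplot_fences(values, k=1.5):
    """Return the (lower, upper) Tukey box-plot fences for a list of numbers."""
    # quantiles(..., n=4) returns the three quartile cut points Q1, Q2, Q3.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def separate_components(boxes, k=1.5):
    """Split bounding boxes (x, y, w, h) into character-like and graphic-like.

    Assumption (not from the paper): a component whose width or height lies
    above the upper fence of its own image region is treated as graphic.
    """
    _, width_upper = boxplot_fences([b[2] for b in boxes], k)
    _, height_upper = boxplot_fences([b[3] for b in boxes], k)

    characters, graphics = [], []
    for box in boxes:
        is_outlier = box[2] > width_upper or box[3] > height_upper
        (graphics if is_outlier else characters).append(box)
    return characters, graphics
```

Because the fences are recomputed from the components of each image region, no fixed size threshold is needed, which matches the abstract's point that the criterion adapts to the image. Calling `separate_components` on a region with many small boxes and one large box would place only the large box in the graphic group.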
Keywords
Word Image Decomposition; Document Image; Image Region