http://dx.doi.org/10.3745/KIPSTB.2007.14-B.6.453

Extracting curved text lines using the chain composition and the expanded grouping method  

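Bai, Nguyen Noi (Chungbuk National University, School of Electrical, Electronic and Computer Engineering)
Yoon, Jin-Seon (Chungbuk National University, BK21 Chungbuk Information Technology Center)
Song, Young-Jun (Chungbuk National University, Chungbuk BIT Research-Oriented University Consortium)
Kim, Nam (Chungbuk National University, School of Electrical, Electronic and Computer Engineering)
Kim, Yong-Gi (Chungbuk National University, Department of Astronomy and Space Science)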
Abstract
In this paper, we present a method to extract text lines from poorly structured documents. The text lines may have different orientations and considerably curved shapes, and a single text line may contain a few wide inter-word gaps. Such text lines can be found in posters, address blocks, and artistic documents. Our method is based on traditional perceptual grouping, but we develop novel solutions to overcome two problems: insufficient seed points and varied orientations within a single line. We assume that text lines consist of connected components, where each connected component is a set of black pixels belonging to a single letter or to several touching letters. In our scheme, connected components that are closer than an iteratively incremented threshold are joined into a chain. Elongated chains are identified as the seed chains of lines. The seed chains are then extended to the left and to the right according to the local orientations, which are re-evaluated at each end of a chain as it is extended. Through this process, all text lines are eventually constructed. In our experiments, the proposed method extracted considerably curved text lines from logos and slogans well, achieving 98% for straight-line extraction and 94% for curved-line extraction.
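The grouping scheme summarized above can be illustrated with a minimal Python sketch. It assumes each connected component has already been reduced to its centroid; the function names (`compose_chains`, `elongation`, `extend_chain`, `extract_lines`), the degree-two linking rule that keeps every group a path, the PCA-based elongation test, and all thresholds are hypothetical choices for illustration, not the authors' published implementation or parameters.

```python
import math

# Assumption: each connected component is reduced to its centroid (x, y).
# All names and default parameters below are hypothetical, not the paper's.
Point = tuple[float, float]


def _find(parent: list[int], i: int) -> int:
    # Union-find root lookup with path halving.
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def compose_chains(pts: list[Point], t0=10.0, step=5.0, t_max=20.0) -> list[list[int]]:
    """Join components closer than an iteratively incremented threshold.
    Each component keeps at most two neighbours, so every group is a path."""
    n = len(pts)
    parent = list(range(n))
    adj: list[list[int]] = [[] for _ in range(n)]
    pairs = sorted((math.dist(pts[i], pts[j]), i, j)
                   for i in range(n) for j in range(i + 1, n))
    k, t = 0, t0
    while t <= t_max:
        # Consider only the pairs that fall under the current threshold.
        while k < len(pairs) and pairs[k][0] <= t:
            _, i, j = pairs[k]
            k += 1
            ri, rj = _find(parent, i), _find(parent, j)
            if ri != rj and len(adj[i]) < 2 and len(adj[j]) < 2:
                parent[ri] = rj
                adj[i].append(j)
                adj[j].append(i)
        t += step
    # Walk each path from an endpoint to obtain an ordered chain.
    seen: set[int] = set()
    chains: list[list[int]] = []
    for s in range(n):
        if s in seen or len(adj[s]) > 1:
            continue
        chain, prev, cur = [], None, s
        while cur is not None:
            chain.append(cur)
            seen.add(cur)
            nxt = [m for m in adj[cur] if m != prev]
            prev, cur = cur, (nxt[0] if nxt else None)
        chains.append(chain)
    return chains


def elongation(pts: list[Point], chain: list[int]) -> float:
    """Ratio of the spreads along the two principal axes of a chain."""
    xs = [pts[i][0] for i in chain]
    ys = [pts[i][1] for i in chain]
    n = len(chain)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs) / n
    syy = sum((y - my) ** 2 for y in ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    # Closed-form eigenvalues of the 2x2 covariance matrix.
    half_tr = (sxx + syy) / 2
    disc = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    return math.sqrt((half_tr + disc) / max(half_tr - disc, 1e-9))


def extend_chain(pts, chain, assigned, max_gap=40.0, max_angle=30.0, k=3):
    """Grow a seed chain at its right end, then its left end. The local
    orientation is re-estimated from the last k components before each step."""
    for _ in range(2):
        while len(chain) >= 2:
            (x0, y0), (x1, y1) = pts[chain[-k:][0]], pts[chain[-1]]
            theta = math.atan2(y1 - y0, x1 - x0)  # local orientation at this end
            best = None
            for c, (cx, cy) in enumerate(pts):
                d = math.dist((x1, y1), (cx, cy))
                if c in assigned or d == 0 or d > max_gap:
                    continue
                phi = math.atan2(cy - y1, cx - x1)
                # Smallest angle between the candidate direction and theta.
                dev = abs((phi - theta + math.pi) % (2 * math.pi) - math.pi)
                if math.degrees(dev) <= max_angle and (best is None or d < best[0]):
                    best = (d, c)
            if best is None:
                break
            chain.append(best[1])
            assigned.add(best[1])
        chain.reverse()  # switch to the other end of the chain
    return chain


def extract_lines(pts: list[Point], min_len=3, min_ratio=3.0) -> list[list[int]]:
    chains = compose_chains(pts)
    # Elongated chains become seeds; the rest stay available for absorption.
    seeds = [c for c in chains
             if len(c) >= min_len and elongation(pts, c) >= min_ratio]
    assigned = {i for s in seeds for i in s}
    return [extend_chain(pts, list(s), assigned) for s in seeds]
```

For example, `extract_lines([(0, 0), (12, 0), (24, 0), (45, 5), (63, 14), (78, 28)])` returns `[[0, 1, 2, 3, 4, 5]]`: the first three components form the only seed chain, and extension picks up the bending tail one component at a time. If the orientation were held fixed at the seed's 0° instead of being re-estimated, the last component (about 43° from its predecessor) would exceed the 30° tolerance and be left out. The generous `max_gap` in the extension step, compared with `t_max` in chain composition, is what lets a line bridge a wide inter-word gap.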
Keywords
Document Image Analysis; Document Image Segmentation; Text Line Extraction; Curved Text Line Extraction