Minimum-cost Path Algorithm for Separating Touching English Characters

Korean title: Segmentation of Touching English Characters Using a Shortest-Path Algorithm

  • Lee, Duk-Ryong (Division of Computer Science & Engineering, Chonbuk National University) ;
  • Oh, Il-Seok (Division of Computer Science & Engineering, Chonbuk National University)
  • Received : 2012.07.27
  • Published : 2012.10.25

Abstract

This paper proposes an algorithm that finds a nonlinear cut path for a printed grayscale touching-character image. Conventional algorithms were observed to fail in complicated touching situations. We analyzed those situations and, based on the analysis, identified the problematic aspects of the conventional algorithms. We then modified the conventional algorithms in two respects. First, we propose a new penalty term that is likely to guide the cut path correctly in touching situations that are difficult to separate. Second, the proposed algorithm produces both a downward and an upward path and selects the better one. Experimental results on actual touching-character images showed that the proposed algorithm outperformed conventional algorithms by 3-4% in separation success rate.

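This paper proposes a method that uses a shortest-path algorithm to separate printed touching characters along a nonlinear path in grayscale images. Existing shortest-path algorithms have the drawback that they cannot segment certain types of touching characters. We analyzed the situations in which the existing algorithms fail and used the results to identify problems in the rules those algorithms rely on. We then improved the existing algorithm in two ways. First, we added a new penalty term to estimate a more precise path. Second, the path search runs an upward search and a downward search side by side and selects the better solution. Experiments showed that the proposed algorithm is about 3-4% better than existing algorithms in segmentation success rate.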
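
The two ideas in the abstract (a penalty term in the path cost, and running both a downward and an upward search before keeping the better path) can be illustrated with a small dynamic-programming sketch. The abstract does not define the penalty term or say how the two paths are compared, so everything specific here is an assumption made for illustration: the pixel cost `255 - intensity`, the diagonal-move penalty `diag_penalty`, the fixed starting column `start_col`, selection by lower total cost, and the function names `min_cost_path` and `best_cut_path`. It should not be read as the authors' implementation.

```python
def min_cost_path(cost, start_col, diag_penalty=1.0):
    """Cheapest path from (0, start_col) to any column of the last row.

    Each step moves one row forward and at most one column sideways.
    A sideways (diagonal) step pays an extra, hypothetical `diag_penalty`.
    Returns (total_cost, path) where path[r] is the column used in row r.
    """
    rows, cols = len(cost), len(cost[0])
    inf = float("inf")
    best = [[inf] * cols for _ in range(rows)]
    back = [[0] * cols for _ in range(rows)]
    best[0][start_col] = cost[0][start_col]
    for r in range(1, rows):
        for c in range(cols):
            for pc in (c - 1, c, c + 1):  # candidate predecessor columns
                if 0 <= pc < cols and best[r - 1][pc] < inf:
                    step = diag_penalty if pc != c else 0.0
                    cand = best[r - 1][pc] + cost[r][c] + step
                    if cand < best[r][c]:
                        best[r][c] = cand
                        back[r][c] = pc
    end = min(range(cols), key=lambda c: best[rows - 1][c])
    path = [end]
    for r in range(rows - 1, 0, -1):  # follow back-pointers to row 0
        path.append(back[r][path[-1]])
    path.reverse()
    return best[rows - 1][end], path


def best_cut_path(gray, start_col, diag_penalty=1.0):
    """Run a downward and an upward search and keep the cheaper path.

    Assumption: dark pixels are ink, so cutting through them is expensive.
    The downward search is anchored at start_col in the top row, the
    upward search at start_col in the bottom row.
    """
    cost = [[255 - v for v in row] for row in gray]
    down_cost, down_path = min_cost_path(cost, start_col, diag_penalty)
    up_cost, up_path = min_cost_path(cost[::-1], start_col, diag_penalty)
    up_path.reverse()  # report columns in top-to-bottom row order
    if up_cost < down_cost:
        return "up", up_cost, up_path
    return "down", down_cost, down_path


# Tiny 4x3 image: 255 = background, 0 = ink.
gray = [
    [0, 255, 0],
    [0, 255, 0],
    [255, 0, 0],
    [255, 0, 0],
]
direction, total, path = best_cut_path(gray, start_col=1)
```

In this toy image the background gap is under the starting column only at the top, so the search anchored at the top finds a nearly free path while the one anchored at the bottom must start on ink. Flipping the image vertically reverses that, which is the kind of case where trying both directions can help.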
