Classification of Handwritten and Machine-printed Korean Address Image based on Connected Component Analysis

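Korean title: 연결요소 분석에 기반한 인쇄체 한글 주소와 필기체 한글 주소의 구분 (Distinguishing Machine-Printed and Handwritten Korean Addresses Based on Connected Component Analysis)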

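  • 장승익 (Postal Technology Research Center, Electronics and Telecommunications Research Institute)
  • 정선화 (Postal Technology Research Center, Electronics and Telecommunications Research Institute)
  • 임길택 (Postal Technology Research Center, Electronics and Telecommunications Research Institute)
  • 남윤석 (Postal Technology Research Center, Electronics and Telecommunications Research Institute)
  • Published: 2003.10.01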

Abstract

In this paper, we propose an effective method for distinguishing machine-printed from handwritten Korean address images. Knowing whether an input image is handwritten or machine-printed is important because the methods used for handwritten images differ substantially from those used for machine-printed images in applications such as address reading, form processing, and FAX routing. Our method consists of three blocks: grouping of valid connected components, feature extraction, and classification. Features related to the width and position of the groups of valid connected components are fed to a neural network classifier. An experiment with live Korean address images demonstrated the effectiveness of the proposed method: the correct classification rate on 3,147 test images was 98.85%.

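This paper proposes a method that effectively distinguishes machine-printed Korean addresses from handwritten Korean addresses written on mail envelopes. In application systems that include a character recognition module, determining whether the input image is machine-printed or handwritten is very important. In most cases the two kinds of image have different characteristics, so the methods for segmenting characters and text lines and for recognizing characters are developed very differently for each. The proposed method runs in three steps: connected component extraction and merging, feature extraction, and image classification. In the first step, connected components are extracted from the input image and some of them are merged. In the second step, features related to width and position are extracted from the groups of connected components produced by the merging. In the third step, a multilayer perceptron that takes the extracted features as input classifies the image. To demonstrate the effectiveness of the proposed method, we tested it on 3,147 Korean address images extracted from real mail and obtained a classification rate of 98.85%.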
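
The first two steps described above can be illustrated with a minimal sketch. The abstract does not give the merging rule or the exact feature definitions, so everything specific here is an assumption made for illustration: the function names, the rule that merges components whose horizontal extents overlap, and the three features (mean group width, spread of group widths, spread of vertical centers). The intuition is that machine-printed text tends to have regular widths and a steady baseline, while handwriting varies more; the resulting feature vector would then be passed to a classifier such as a multilayer perceptron.

```python
from collections import deque
from statistics import mean, pstdev

def connected_components(img):
    """Return bounding boxes (x0, y0, x1, y1) of 8-connected foreground regions."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if not img[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill from this unvisited foreground pixel.
            seen[y][x] = True
            queue = deque([(x, y)])
            x0 = x1 = x
            y0 = y1 = y
            while queue:
                cx, cy = queue.popleft()
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = cx + dx, cy + dy
                        if (0 <= nx < w and 0 <= ny < h
                                and img[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            boxes.append((x0, y0, x1, y1))
    return boxes

def merge_overlapping(boxes):
    """Merge boxes whose horizontal extents overlap.

    Hypothetical rule, not taken from the paper: vertically stacked strokes
    of one Korean character usually share a horizontal range.
    """
    groups = []
    for x0, y0, x1, y1 in sorted(boxes):  # sorted by left edge
        if groups and x0 <= groups[-1][2]:
            gx0, gy0, gx1, gy1 = groups[-1]
            groups[-1] = (gx0, min(gy0, y0), max(gx1, x1), max(gy1, y1))
        else:
            groups.append((x0, y0, x1, y1))
    return groups

def layout_features(groups):
    """Width and position statistics of the groups (assumed feature set)."""
    widths = [x1 - x0 + 1 for x0, _, x1, _ in groups]
    centers = [(y0 + y1) / 2 for _, y0, _, y1 in groups]
    return [mean(widths), pstdev(widths), pstdev(centers)]

# Toy binary image: two "characters", each made of two stacked strokes.
rows = [
    "##..##..",
    "##..##..",
    "........",
    "##..#...",
]
img = [[c == "#" for c in row] for row in rows]
boxes = connected_components(img)
groups = merge_overlapping(boxes)
features = layout_features(groups)
print(groups)
print(features)
```

In this toy image the four strokes merge into two groups of equal width and equal vertical center, so both spread features are zero, which is the kind of regularity one would expect from machine-printed text.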

References

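  1. '순로구분 자동처리 시스템 개발' (Development of an Automatic Delivery Sequence Sorting System), Final R&D Report, Ministry of Information and Communication, 2001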
  2. U. Mahadevan and S.N. Srihari, 'Parsing and Recognition of City, State, and ZIP Codes in Handwritten Addresses,' Proceedings of 5th International Conference on Document Analysis and Recognition, pp. 325-328, Bangalore, India, 1999 https://doi.org/10.1109/ICDAR.1999.791790
  3. G. Dzuba, A. Filatov and A. Volgunin, 'Handwritten ZIP Code Recognition,' Proceedings of 4th International Conference on Document Analysis and Recognition, pp. 766-770, Ulm, Germany, 1997 https://doi.org/10.1109/ICDAR.1997.620613
  4. A. Brakensiek, J. Rottland and G. Rigoll, 'Handwritten Address Recognition with Open Vocabulary Using Character N-grams,' Proceedings of 8th International Workshop on Frontiers in Handwriting Recognition, pp. 357-362, Niagara-on-the-Lake, Canada, 2002 https://doi.org/10.1109/IWFHR.2002.1030936
  5. N. Kata, K. Todumoto and Y. Nemoto, 'A Large Scale Japanese Handwritten Address Recognition System Using Rough and Fine Classification,' Proceedings of 5th International Conference on Signal Processing, Vol. 3, pp. 1423-1426, 2000 https://doi.org/10.1109/ICOSP.2000.893368
  6. F. Kimura and M. Shridhar, 'Handwritten Address Interpretation Using Extended Lexicon Word Matching,' Proceedings of 5th International Workshop on Frontiers in Handwriting Recognition, pp. 369-372, Essex, England, 1996
  7. K. Fan, L. Wang and Y. Tu, 'Classification of Machine-Printed and Hand-Written Texts Using Character Block Layout Variance,' Pattern Recognition, Vol. 31, No. 9, pp. 1275-1284, 1998 https://doi.org/10.1016/S0031-3203(97)00143-X
  8. U. Pal and B. Chaudhuri, 'Automatic Separation of Machine-Printed and Hand-Written Text Lines,' Proceedings of 5th International Conference on Document Analysis and Recognition, pp. 645-648, Bangalore, India, 1999 https://doi.org/10.1109/ICDAR.1999.791870
  9. K. Kuhnke, L. Simoncini and Z. Kovacs-V, 'A System for Machine-Written and Hand-Written Character Distinction,' Proceedings of 3rd International Conference on Document Analysis and Recognition, pp. 811-814, 1995 https://doi.org/10.1109/ICDAR.1995.602025
  10. S. Imade, S. Tatsuta and T. Wada, 'Segmentation and Classification for Mixed Text/Image Documents Using Neural Network,' Proceedings of 2nd International Conference on Document Analysis and Recognition, pp. 930-934, 1993 https://doi.org/10.1109/ICDAR.1993.395584
  11. J. Franke and M. Oberlander, 'Writing Style Detection by Statistical Combination of Classifiers in Form Reader Applications,' Proceedings of 2nd International Conference on Document Analysis and Recognition, pp. 581-584, 1993 https://doi.org/10.1109/ICDAR.1993.395668
  12. S. Tsujimoto and H. Asada, 'Major Components of a Complete Text Reading System,' Proceedings of the IEEE, Vol. 80, No. 7, pp. 1133-1149, 1992 https://doi.org/10.1109/5.156475
  13. S. Jeong, S. Kim and W. Cho, 'Performance Comparison of Statistical and Neural Network Classifiers in Handwritten Digits Recognition,' Proceedings of 6th International Workshop on Frontiers in Handwriting Recognition, pp. 419-428, Taejon, Korea, 1998
  14. N. Otsu, 'A Threshold Selection Method from Gray-Level Histograms,' IEEE Transactions on Systems, Man, and Cybernetics, Vol. 9, No. 1, pp. 62-66, 1979 https://doi.org/10.1109/TSMC.1979.4310076