텍스트와 그래픽으로 구성된 혼합문서 인식에 관한 연구

A Study on the Recognition of Mixed Documents Consisting of Texts and Graphic Images

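  • 함영국 (Department of Electronic Engineering, Sogang University)
  • 김인권 (Department of Electronic Engineering, Sogang University)
  • 정홍규 (Department of Electronic Engineering, Sogang University)
  • 박래홍 (Department of Electronic Engineering, Sogang University)
  • 이창범 (Telecommunications Management Research Division, Electronics and Telecommunications Research Institute)
  • 김상중 (Telecommunications Management Research Division, Electronics and Telecommunications Research Institute)
  • 윤병남 (Telecommunications Management Research Division, Electronics and Telecommunications Research Institute)
  • Published: 1994.07.01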

Abstract

This paper proposes an efficient algorithm for recognizing mixed documents that consist of printed Korean/alphanumeric text and graphic images. In the preprocessing step, the input document is deskewed when necessary: the rotation angle is estimated with the Hough transform, and the document is rotated so that its text lines are horizontal. Graphic regions are then separated from text regions by examining the chain codes of connected components, and individual characters are separated using vertical and horizontal projections. In the recognition step, Korean and alphanumeric characters are first classified, and each is then recognized hierarchically using several features. Computer simulations demonstrate the performance of the proposed algorithm.
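
Two of the preprocessing steps named in the abstract, Hough-based skew estimation and projection-based character separation, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`estimate_skew_deg`, `runs`, `segment`), the ±15° search range, the 0.5° step, and the sum-of-squares scoring of the Hough accumulator are all assumptions chosen for illustration, and the page is assumed to be a binary image given as rows of 0/1 values.

```python
import math
from collections import Counter


def estimate_skew_deg(points, max_deg=15.0, step_deg=0.5):
    """Estimate text-line skew from (x, y) foreground points.

    Hough-style voting: for each candidate angle, every point votes for
    the offset rho of the line through it at that angle. At the true skew
    the votes pile into a few bins, so the sum of squared bin counts peaks.
    (Search range, step, and scoring are illustrative assumptions.)
    """
    best_angle, best_score = 0.0, -1
    n_steps = int(round(2 * max_deg / step_deg))
    for i in range(n_steps + 1):
        angle = -max_deg + i * step_deg
        a = math.radians(angle)
        votes = Counter(
            round(y * math.cos(a) - x * math.sin(a)) for x, y in points
        )
        score = sum(c * c for c in votes.values())
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle


def runs(profile):
    """Return (start, end) spans, end exclusive, where the profile is nonzero."""
    spans, start = [], None
    for i, v in enumerate(profile):
        if v > 0 and start is None:
            start = i
        elif v <= 0 and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans


def segment(page):
    """Split a deskewed binary page into (top, bottom, left, right) boxes.

    Row sums (horizontal projection) separate text lines; within each
    line, column sums (vertical projection) separate characters.
    """
    boxes = []
    for top, bottom in runs([sum(row) for row in page]):
        line = page[top:bottom]
        for left, right in runs([sum(col) for col in zip(*line)]):
            boxes.append((top, bottom, left, right))
    return boxes
```

For example, calling `segment` on a small page with two text rows yields one box per gap-separated blob, in reading order. Projection profiles of this kind split characters only where a blank row or column exists, so touching characters would need additional handling that this sketch does not attempt.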

Keywords