A Postprocessing Method for Korean Optical Character Recognition Using Eojeol Information

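  • 이영화 (Department of Computer Engineering, Kyungpook National University)
  • 김규성 (Department of Computer Engineering, Kyungpook National University)
  • 김영훈 (Department of Computer Science, Andong Junior College)
  • 이상조 (Department of Computer Engineering, Kyungpook National University)
  • Published: 1998.02.01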

Abstract

In this paper, we detect and correct misrecognized words using Eojeol information, where an Eojeol is a space-delimited word unit in Korean. First, we analyze Korean sentences in Eojeol units and divide the constituents of an Eojeol into 16 classes. From these constituents we construct an Eojeol-constituent state diagram and derive left-right connectivity information. By inferring parts of speech from the connectivity information, we reduce the number of candidate words and restrict the morphological analysis cases for a misrecognized Eojeol. We then improve correction speed by using the adjacency information between neighboring Eojeols as a heuristic. In the correction phase, we construct a reverse-order word dictionary, which lets us search the word dictionary regardless of where the misrecognition occurs within a word. The results show that the recognition rate improves from 97.03% to 98.02%, that the detection rate improves, and that the numbers of candidate words and morphological analysis cases are reduced.
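The role of the reverse-order word dictionary can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class name `BidirectionalLexicon`, the `candidates` method, the sorted-list representation, and the assumption that exactly one character at a known position is misrecognized are all hypothetical choices made for illustration. The idea shown is that a forward-sorted dictionary can only be searched efficiently from a known prefix, so a second dictionary of reversed words is needed when the error falls near the start of the word.

```python
from bisect import bisect_left


class BidirectionalLexicon:
    """Hypothetical lexicon holding a forward and a reverse-order word list."""

    def __init__(self, words):
        unique = set(words)
        # Forward dictionary: words in normal character order.
        self.forward = sorted(unique)
        # Reverse-order dictionary: every word stored back to front.
        self.reverse = sorted(w[::-1] for w in unique)

    @staticmethod
    def _prefix_range(sorted_words, prefix):
        # Binary search for the slice of entries that start with `prefix`.
        lo = bisect_left(sorted_words, prefix)
        hi = bisect_left(sorted_words, prefix + "\U0010ffff")
        return sorted_words[lo:hi]

    def candidates(self, word, bad_pos):
        """Return dictionary words that match `word` everywhere except at
        the (assumed single) misrecognized position `bad_pos`."""
        prefix = word[:bad_pos]
        suffix = word[bad_pos + 1:]
        if len(prefix) >= len(suffix):
            # Error near the end: the known prefix narrows the forward list.
            pool = self._prefix_range(self.forward, prefix)
        else:
            # Error near the start: search the reverse list by reversed suffix.
            hits = self._prefix_range(self.reverse, suffix[::-1])
            pool = [w[::-1] for w in hits]
        return sorted(
            w for w in pool
            if len(w) == len(word)
            and w.startswith(prefix)
            and w.endswith(suffix)
        )


lexicon = BidirectionalLexicon(["학교", "학생", "선생", "고생", "학문"])
# First character misrecognized: only the suffix "생" is trusted.
print(lexicon.candidates("힉생", 0))
# Second character misrecognized: only the prefix "학" is trusted.
print(lexicon.candidates("학셩", 1))
```

In this sketch the longer trusted side of the word selects which dictionary to search, so the lookup stays a binary search whether the error is at the beginning or the end of the word. The connectivity and adjacency information described in the abstract would then be applied to prune the returned candidates; that step is omitted here because the 16 constituent classes are not listed in this abstract.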

Keywords