http://dx.doi.org/10.3745/KIPSTB.2010.17B.2.125

Research and Development of Document Recognition System for Utilizing Image Data  

Kwag, Hee-Kue (Inzisoft Co., Ltd.)
Abstract
The purpose of this research is to enhance the document recognition system that is essential for developing a full-text retrieval system for the document image data stored in the digital library of a public institution. To achieve this, the main tasks of this research are: 1) analyzing the document image data and then developing image preprocessing and document structure analysis technologies for it, and 2) building a specialized knowledge base consisting of document layout and properties, a character model, and a word dictionary. In addition, by developing a management tool for this knowledge base, we enable the document recognition system to handle various types of document image data. So far, we have developed a prototype document recognition system that combines the specialized knowledge base with the document structure analysis library, both adapted to the document image data housed in the National Archives of Korea. Based on the results of this research, we plan to build a test-bed and evaluate the performance of the document recognition system in order to maximize the utilization of the full-text retrieval system.
Keywords
Full Text Retrieval; Document Recognition; Document Structure Analysis; Knowledge Base; Image Layout & Property; Character Model; Word Dictionary