http://dx.doi.org/10.4275/KSLIS.2013.47.1.039

A Keyword Matching for the Retrieval of Low-Quality Hangul Document Images  

Na, In-Seop (Chonnam National University)
Park, Sang-Cheol (Samsung Medison)
Kim, Soo-Hyung (School of Electronics and Computer Engineering, Chonnam National University)
Publication Information
Journal of the Korean Society for Library and Information Science, v.47, no.1, 2013, pp. 39-55
Abstract
Keyword retrieval in low-quality Korean document images is difficult because adjacent characters in these images are often connected. In addition, images of text in various fonts are likely to be distorted during acquisition. In this paper, we propose and test a keyword retrieval system for low-quality Korean document images that uses a support vector machine (SVM) to discriminate whether two word images are similar. We demonstrate that the proposed keyword retrieval method is more effective than an accumulated Optical Character Recognition (OCR)-based search method. Moreover, the SVM determines the similarity of two word images better than either a Bayesian decision rule or an artificial neural network.
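The abstract does not say which word-image features or which SVM kernel the authors use, so the following is only a minimal sketch of the general idea: treat "are these two word images the same keyword?" as a binary classification of image pairs and let a linear SVM make the decision. Everything concrete here is an assumption for illustration: the projection-profile descriptor (`profile_features`), the absolute-difference pair encoding (`pair_vector`), the Pegasos-style stochastic sub-gradient trainer (`train_pair_svm`), and all function names are hypothetical, not the paper's method.

```python
import random


def profile_features(img, bins=8):
    """Hypothetical word-image descriptor (an assumption, not the paper's):
    column and row ink profiles of a binary image (rows of 0/1 pixels),
    each resampled to `bins` values in [0, 1]."""
    h, w = len(img), len(img[0])
    cols = [sum(row[c] for row in img) / h for c in range(w)]
    rows = [sum(row) / w for row in img]

    def resample(seq):
        # Nearest-neighbour resampling so every word gets a fixed-length vector.
        n = len(seq)
        return [seq[i * n // bins] for i in range(bins)]

    return resample(cols) + resample(rows)


def pair_vector(img_a, img_b):
    """Assumed pair encoding: element-wise absolute feature difference,
    plus a constant 1.0 that plays the role of the bias term."""
    fa, fb = profile_features(img_a), profile_features(img_b)
    return [abs(x - y) for x, y in zip(fa, fb)] + [1.0]


def train_pair_svm(pairs, labels, lam=0.01, epochs=200, seed=0):
    """Linear SVM on pair vectors, trained with Pegasos-style stochastic
    sub-gradient descent on the hinge loss.
    labels: +1 = same keyword, -1 = different keywords."""
    rng = random.Random(seed)
    xs = [pair_vector(a, b) for a, b in pairs]
    w = [0.0] * len(xs[0])
    order = list(range(len(xs)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            x, y = xs[i], labels[i]
            margin = y * sum(wj * xj for wj, xj in zip(w, x))
            # Shrink the weights (L2 regulariser) ...
            w = [(1.0 - eta * lam) * wj for wj in w]
            # ... then add the hinge-loss sub-gradient if the margin is violated.
            if margin < 1.0:
                w = [wj + eta * y * xj for wj, xj in zip(w, x)]
    return w


def is_match(w, img_a, img_b):
    """Positive SVM score means the two word images are judged the same keyword."""
    return sum(wj * xj for wj, xj in zip(w, pair_vector(img_a, img_b))) > 0.0


if __name__ == "__main__":
    # Two toy 4x8 "word images": ink in the left half vs. the right half.
    left = [[1, 1, 1, 1, 0, 0, 0, 0]] * 4
    right = [[0, 0, 0, 0, 1, 1, 1, 1]] * 4
    pairs = [(left, left), (right, right), (left, right), (right, left)]
    w = train_pair_svm(pairs, [1, 1, -1, -1])
    print(is_match(w, left, left), is_match(w, left, right))  # prints: True False
```

Classifying the difference vector rather than each image separately means one binary model can compare any query keyword image against any candidate word image, including words never seen in training; a kernel SVM or a richer descriptor could be swapped in without changing that structure.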
Keywords
Low-Quality Korean Document Keyword Retrieval; SVM; OCR; Digital Library