http://dx.doi.org/10.5392/IJoC.2017.13.3.038

Language Identification in Handwritten Words Using a Convolutional Neural Network

Tung, Trieu Son (Dept. of ECE, Chonnam National University)
Lee, Gueesang (Dept. of ECE, Chonnam National University)

Publication Information

International Journal of Contents / v.13, no.3, 2017, pp. 38-42

Abstract

Documents from the last few decades often contain more than one language, so identifying the language of each word is essential, particularly for English and Korean in handwritten documents. Traditional methods mostly rely on conventional structural or stroke features, which can fail to capture many characteristics of words because of the complexity introduced by handwriting. As a result, these methods make the task considerably more complicated and can produce poor results. In this study, a convolutional neural network (CNN) is used to classify English and Korean handwritten words in text documents. Experimental results show that the proposed method works effectively compared with previous methods.

Keywords

Convolutional Neural Network; Korean Text; English Text; Handwritten Document; Classification; Document Analysis
