http://dx.doi.org/10.3837/tiis.2015.09.023

An Arabic Script Recognition System  

Alginahi, Yasser M. (IT Research Center for the Holy Quran and its Sciences (NOOR))
Mudassar, Mohammed (Dept. of Computer Science, College of Computer Science and Engineering, Taibah University)
Nomani Kabir, Muhammad (Dept. of Computer Systems, University of Pahang)
Publication Information
KSII Transactions on Internet and Information Systems (TIIS) / v.9, no.9, 2015, pp. 3701-3720
Abstract
A system for the recognition of machine-printed Arabic script is proposed. The Arabic script is shared by three languages: Arabic, Urdu and Farsi. The three languages have a considerable amount of vocabulary in common, which compounds the identification problem. Ideally, therefore, not only must the script be differentiated from other scripts, but the language written in that script must also be recognized. The recognition process involves segregating Arabic-script documents from Latin, Han and other script documents using horizontal and vertical projection profiles, followed by identification of the language. Identification mainly involves extracting connected components, which are subjected to a Principal Component Analysis (PCA) transformation to extract uncorrelated features. The traditional K-Nearest Neighbours (KNN) algorithm is then used for recognition. Experiments were carried out by varying the number of principal components and the number of connected components extracted per document, to find the combination that gives the best accuracy. An accuracy of 100% is achieved with at least 18 connected components and 15 principal components. The proposed system would play a vital role in the automatic archiving of multilingual documents and in the selection of the appropriate Arabic script in multilingual Optical Character Recognition (OCR) systems.
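The stages named in the abstract (projection profiles, PCA feature extraction, KNN classification) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the synthetic 64-dimensional feature vectors, the class centers and the choice of k = 3 are all assumptions made for illustration, and connected-component extraction is omitted. Only the figure of 15 principal components is taken from the abstract.

```python
import numpy as np

def projection_profiles(binary_image):
    """Return (horizontal, vertical) profiles: ink-pixel counts per row and per column."""
    img = np.asarray(binary_image)
    return img.sum(axis=1), img.sum(axis=0)

def pca_fit(X, n_components):
    """Return the training mean and the top principal axes (as rows) of X."""
    mean = X.mean(axis=0)
    # SVD of the centered data: rows of vt are principal axes, sorted by variance.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def pca_transform(X, mean, axes):
    """Project X onto the principal axes, giving uncorrelated features."""
    return (X - mean) @ axes.T

def knn_predict(train_X, train_y, query_X, k=3):
    """Label each query row by majority vote among its k nearest training rows."""
    preds = []
    for q in query_X:
        dists = np.linalg.norm(train_X - q, axis=1)
        nearest = train_y[np.argsort(dists)[:k]]
        labels, counts = np.unique(nearest, return_counts=True)
        preds.append(labels[np.argmax(counts)])
    return np.array(preds)

# Projection profiles of a tiny hypothetical binary image (1 = ink).
h, v = projection_profiles([[0, 1, 1],
                            [0, 0, 0],
                            [1, 1, 1]])

# Synthetic stand-ins for per-component feature vectors (assumption: 64 features,
# three well-separated classes). Real features would come from connected components.
rng = np.random.default_rng(0)
languages = np.array(["Arabic", "Urdu", "Farsi"])
centers = rng.normal(size=(3, 64)) * 5.0
train_X = np.vstack([c + rng.normal(size=(30, 64)) for c in centers])
train_y = np.repeat(languages, 30)
test_X = np.vstack([c + rng.normal(size=(5, 64)) for c in centers])
test_y = np.repeat(languages, 5)

# Fit PCA on training data only, then classify in the reduced space.
mean, axes = pca_fit(train_X, n_components=15)
pred = knn_predict(pca_transform(train_X, mean, axes), train_y,
                   pca_transform(test_X, mean, axes), k=3)
accuracy = float((pred == test_y).mean())
print(f"accuracy on synthetic data: {accuracy:.2f}")
```

The sketch classifies individual feature vectors. A document-level decision, as described in the abstract, would additionally need a rule for combining the votes of the connected components extracted from one document, which the abstract does not specify.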
Keywords
Arabic; Script Recognition; KNN; PCA