http://dx.doi.org/10.4218/etrij.2018-0553

Deep recurrent neural networks with word embeddings for Urdu named entity recognition  

Khan, Wahab (Department of Computer Science and Software Engineering, International Islamic University)
Daud, Ali (Department of Computer Science and Software Engineering, International Islamic University)
Alotaibi, Fahd (Faculty of Computing and Information Technology, King Abdulaziz University)
Aljohani, Naif (Faculty of Computing and Information Technology, King Abdulaziz University)
Arafat, Sachi (Faculty of Computing and Information Technology, King Abdulaziz University)
Publication Information
ETRI Journal / v.42, no.1, 2020, pp. 90-100
Abstract
Named entity recognition (NER) remains an important task in natural language processing because it is a subtask of information extraction and machine translation. It is particularly difficult for Urdu. This paper proposes several deep recurrent neural network (DRNN) learning models with word embeddings. The DRNN models evaluated include forward and bidirectional extensions of the long short-term memory and backpropagation through time approaches. The proposed models consider both language-dependent features, such as part-of-speech tags, and language-independent features, such as the "context windows" of words. The effectiveness of the DRNN models with word embeddings for Urdu NER is demonstrated on three benchmark datasets. The results show that the proposed approach significantly outperforms previous conditional random field and artificial neural network approaches and improves on the current state of the art for Urdu NER. The best F-measure values achieved on the three datasets are 81.1%, 79.94%, and 63.21%, respectively.
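The "context window" feature mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the window size or how sentence boundaries are handled, so the `context_windows` helper, the `<PAD>` token, and the window size of 1 below are assumptions made purely for illustration, not the authors' implementation.

```python
from typing import List

# Hypothetical padding token for positions beyond the sentence boundary
# (an assumption; the abstract does not describe boundary handling).
PAD = "<PAD>"


def context_windows(tokens: List[str], size: int = 1) -> List[List[str]]:
    """For each token, return the token with `size` neighbours on each side."""
    padded = [PAD] * size + list(tokens) + [PAD] * size
    # Each window spans 2 * size + 1 positions, centred on the original token.
    return [padded[i : i + 2 * size + 1] for i in range(len(tokens))]


# Illustrative sentence, not drawn from any of the paper's datasets.
sentence = ["Ali", "lives", "in", "Lahore"]
windows = context_windows(sentence, size=1)
for word, window in zip(sentence, windows):
    print(word, window)
```

In a tagger of the kind the abstract describes, each window would typically be mapped to word embeddings before being passed to the recurrent layers; that step is omitted here.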
Keywords
conditional random fields; deep recurrent neural network; machine learning; named entity recognition; Urdu