http://dx.doi.org/10.4218/etrij.2021-0269

Comparative study of text representation and learning for Persian named entity recognition  

Pour, Mohammad Mahdi Abdollah (Computer Engineering Department, Amirkabir University of Technology)
Momtazi, Saeedeh (Computer Engineering Department, Amirkabir University of Technology)
Publication Information
ETRI Journal, v.44, no.5, 2022, pp. 794-804
Abstract
Transformer models have had a great impact on natural language processing (NLP) in recent years by enabling highly effective and efficient contextualized language models. Recent studies have used transformer-based language models for various NLP tasks, including Persian named entity recognition (NER). However, in complex tasks such as NER, it is difficult to determine which contextualized embedding will produce the best representation. Because few comparative studies have examined how different contextualized pretrained models perform when combined with sequence-modeling classifiers, we conducted a comparative study of different classifiers and embedding models. In this paper, we combine different transformer-based language models with different classifiers, tune them jointly, and evaluate them on the Persian NER task, assessing the impact of both text representation and text classification methods on performance. We train and evaluate the models on three Persian NER datasets, namely MoNa, Peyma, and Arman. Experimental results demonstrate that XLM-R followed by a linear layer and a conditional random field (CRF) layer performed best. This model achieved phrase-based F-measures of 70.04, 86.37, and 79.25 and word-based F-measures of 78, 84.02, and 89.73 on the MoNa, Peyma, and Arman datasets, respectively. These results represent state-of-the-art performance on the Persian NER task.
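To illustrate what the CRF layer on top of XLM-R and the linear layer contributes, here is a minimal sketch of Viterbi decoding for a linear-chain CRF. It assumes the linear layer yields one vector of tag scores (emissions) per word, and that the CRF adds learned tag-to-tag transition scores plus start and end scores. The function name, the tag set, and every number below are hypothetical illustrations; the abstract does not describe the authors' implementation.

```python
from typing import List, Sequence, Tuple


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
    start: Sequence[float],
    end: Sequence[float],
) -> Tuple[List[int], float]:
    """Return the highest-scoring tag sequence and its total score.

    emissions[t][j]: score of tag j at word t (e.g., linear-layer output).
    transitions[i][j]: CRF score for moving from tag i to tag j.
    start[j] / end[j]: scores for starting / ending the sentence with tag j.
    """
    n_tags = len(start)
    # score[j]: best score of any tag path that ends in tag j at the current word
    score = [start[j] + emissions[0][j] for j in range(n_tags)]
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for j in range(n_tags):
            # best previous tag i for reaching tag j at word t
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # add end scores and pick the best final tag
    final = [score[j] + end[j] for j in range(n_tags)]
    best_last = max(range(n_tags), key=lambda j: final[j])
    # follow the backpointers from the last word to the first
    path = [best_last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, final[best_last]


TAGS = ["O", "B-PER", "I-PER"]  # hypothetical BIO tag set
NEG = -1e4  # large penalty used to forbid a transition
transitions = [
    [0.0, 0.0, NEG],  # O -> I-PER is invalid in BIO
    [0.0, 0.0, 0.0],  # from B-PER
    [0.0, 0.0, 0.0],  # from I-PER
]
start = [0.0, 0.0, NEG]  # a sentence cannot start with I-PER
end = [0.0, 0.0, 0.0]
emissions = [
    [0.1, 0.8, 0.9],
    [0.2, 0.1, 1.0],
    [1.0, 0.0, 0.0],
]
path, score = viterbi_decode(emissions, transitions, start, end)
print([TAGS[i] for i in path])  # ['B-PER', 'I-PER', 'O']
```

Taking the per-word argmax of these emissions alone would give I-PER, I-PER, O, which is not a valid BIO sequence; the transition scores let the CRF choose the best valid sequence jointly instead of labeling each word independently.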
Keywords
contextualized representation; NER; Persian language processing