http://dx.doi.org/10.22937/IJCSNS.2021.21.11.28

Developing and Pre-Processing a Dataset using a Rhetorical Relation to Build a Question-Answering System based on an Unsupervised Learning Approach  

Dutta, Ashit Kumar (Department of Computer Science and Information Systems, College of Applied Sciences, AlMaarefa University)
Wahab Sait, Abdul Rahaman (Center of Documents and Archive, King Faisal University)
Keshta, Ismail Mohamed (Department of Computer Science and Information Systems, College of Applied Sciences, AlMaarefa University)
Elhalles, Abheer (Department of Computer Science and Information Systems, College of Applied Sciences, AlMaarefa University)
Publication Information
International Journal of Computer Science & Network Security / v.21, no.11, 2021, pp. 199-206
Abstract
Rhetorical relations between two text fragments provide essential information that helps natural language processing applications, such as Question-Answering (QA) systems and automatic text summarization, produce effective results. A QA system lets users retrieve a meaningful response to a query. Developing such a system to interpret and respond to user requests requires rhetorical-relation-based datasets. Only a limited number of datasets exist for developing an Arabic QA system, so effective QA systems for the Arabic language are lacking. Recent research indicates that unsupervised learning can help a QA system reply to user queries. In this study, the researchers aim to develop a rhetorical-relation-based dataset for implementing unsupervised learning applications. A web crawler is developed to crawl Arabic content from the web. A discourse-annotated corpus is generated using Rhetorical Structure Theory. A Naïve Bayes based QA system is developed to evaluate the performance of the datasets. The results show that the QA system performs better with the proposed dataset and can answer user queries with an appropriate response. In addition, the results on fine-grained and coarse-grained relations indicate that the dataset is highly reliable.
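To illustrate the kind of Naïve Bayes component the abstract mentions, the following is a minimal sketch of a multinomial Naïve Bayes classifier that assigns a rhetorical relation label to a text fragment. It is not the authors' implementation: the class name `NaiveBayes`, the whitespace tokenizer, the two relation labels, and the English toy sentences are all assumptions made for illustration, since the abstract does not describe the features, labels, or data actually used.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing.

    Hypothetical illustration only; not the system described in the paper.
    """

    def fit(self, texts, labels):
        # Count documents per class and word occurrences per class.
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        # Vocabulary is the union of all words seen during training.
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus summed log likelihoods of in-vocabulary words.
            score = math.log(n_docs / total_docs)
            for word in text.lower().split():
                if word in self.vocab:
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data (assumed; the real corpus is Arabic and RST-annotated).
train_texts = [
    "the road flooded because it rained heavily",
    "the match was cancelled because of the storm",
    "he studied hard but failed the exam",
    "the phone is cheap but the battery is weak",
]
train_labels = ["cause", "cause", "contrast", "contrast"]

model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("she stayed home because she was ill"))
print(model.predict("the laptop is fast but heavy"))
```

In this sketch, discourse markers such as "because" and "but" dominate the decision, which is one reason a relation-annotated corpus could be useful training data for a QA system; words outside the training vocabulary are simply ignored.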
Keywords
Arabic dataset; rhetorical relation; discourse relation; rhetorical structure theory; Question-Answering system; natural language processing