KAB: Knowledge Augmented BERT2BERT Automated Questions-Answering system for Jurisprudential Legal Opinions

  • Alotaibi, Saud S. (Department of Information Systems, Umm Al-Qura University) ;
  • Munshi, Amr A. (Department of Information Systems, Umm Al-Qura University) ;
  • Farag, Abdullah Tarek (Capiter) ;
  • Rakha, Omar Essam (Faculty of Engineering, Ain Shams University) ;
  • Al Sallab, Ahmad A. (Faculty of Engineering, Cairo University) ;
  • Alotaibi, Majid (Department of Computer Engineering, Umm Al-Qura University)
  • Received : 2022.06.05
  • Published : 2022.06.30

Abstract

Jurisprudential legal rules govern how Muslims act and interact in daily life. This creates a large stream of questions that must be answered by highly qualified and well-educated scholars, called Muftis. With Muslims representing almost 25% of the world's population, and with qualified Muftis being scarce, the result is a supply-and-demand problem that calls for automation. This motivates the application of Artificial Intelligence (AI), and in particular a well-designed Question-Answering (QA) system. In this work, we propose a QA system for jurisprudential legal questions, based on a retrieval-augmented generative transformer model. The main idea of the proposed architecture is to leverage both state-of-the-art transformer models and the existing knowledge base of legal sources and question-answer pairs. With the sensitivity of the domain in mind, given its importance in Muslims' daily lives, our design balances the exploitation of knowledge bases against the exploration provided by generative transformer models. We collect a custom data set of 850,000 entries, each of which includes the question, the answer, and the category of the question. Our evaluation methodology is both quantitative and qualitative. We use metrics such as BERTScore and METEOR to evaluate the precision and recall of the system. We also provide many qualitative results that show the quality of the generated answers and how relevant they are to the questions asked.
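The retrieval-augmented idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the bag-of-words cosine retriever, the `[SEP]` separator, the function names, and the toy knowledge base with placeholder answers are all assumptions made for illustration. The sketch only shows how stored question-answer entries might be retrieved for a new question and concatenated into the input of a generative model; the generator itself is left out.

```python
import math
from collections import Counter


def tokenize(text):
    # Whitespace tokenization; a real system would use a subword tokenizer.
    return text.lower().split()


def cosine(tokens_a, tokens_b):
    # Cosine similarity between two bag-of-words count vectors.
    ca, cb = Counter(tokens_a), Counter(tokens_b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, knowledge_base, k=2):
    # Rank stored entries by similarity between their question and the query.
    query = tokenize(question)
    ranked = sorted(
        knowledge_base,
        key=lambda entry: cosine(query, tokenize(entry["question"])),
        reverse=True,
    )
    return ranked[:k]


def build_generator_input(question, retrieved):
    # Concatenate the query with retrieved answers (hypothetical input format).
    context = " [SEP] ".join(entry["answer"] for entry in retrieved)
    return f"{question} [SEP] {context}"


# Toy knowledge base with placeholder answers (hypothetical data).
knowledge_base = [
    {"question": "is fasting required while travelling",
     "answer": "placeholder answer about fasting",
     "category": "fasting"},
    {"question": "how is zakat calculated on savings",
     "answer": "placeholder answer about zakat",
     "category": "zakat"},
]

top = retrieve("is fasting required for a traveller", knowledge_base, k=1)
print(top[0]["category"])
print(build_generator_input("is fasting required for a traveller", top))
```

Each entry mirrors the fields the abstract lists for the data set (question, answer, category). In a full system the concatenated string would be passed to the generative model, and a learned dense retriever would likely replace the word-overlap scoring used here.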

Acknowledgement

The authors extend their appreciation to the Deputyship for Research & Innovation, Ministry of Education in Saudi Arabia, for funding this research work through project number 20-UQU-IF-P3-001.
