[KSCI] Korea Science Citation Index Service

http://dx.doi.org/10.3837/tiis.2021.06.006

An Artificial Intelligence Approach for Word Semantic Similarity Measure of Hindi Language

Younas, Farah (Department of Computer Science, Shaheed Zulfikar Ali Bhutto Institute of Science and Technology)
Nadir, Jumana (Computer Engineering Department, San Jose State University)
Usman, Muhammad (Department of Computer Science, Shaheed Zulfikar Ali Bhutto Institute of Science and Technology)
Khan, Muhammad Attique (Department of Computer Science, HITEC University Taxila)
Khan, Sajid Ali (Department of Software Engineering, Foundation University)
Kadry, Seifedine (Faculty of Applied Computing and Technology, Noroff University College)
Nam, Yunyoung (Department of Computer Science and Engineering, Soonchunhyang University)

Publication Information

KSII Transactions on Internet and Information Systems (TIIS) / v.15, no.6, 2021, pp. 2049-2068

Abstract

AI combined with NLP techniques has promoted the use of Virtual Assistants and has made people rely on them for many diverse uses. Conversational Agents are the most promising technique for assisting computer users through their operation. An important challenge in developing Conversational Agents globally is transferring the groundbreaking expertise obtained in English to other languages. AI is making it possible to transfer this learning. There is a dire need to develop systems that understand secular languages. One such difficult language is Hindi, which is the fourth most spoken language in the world. Semantic similarity is an important part of Natural Language Processing and underlies applications, such as ontology learning and information extraction, that are used in developing conversational agents. Most of the research is concentrated on English and other European languages. This paper presents a corpus-based word semantic similarity measure for Hindi. An experiment involving the translation of the English benchmark dataset to Hindi is performed, investigating the incorporation of the corpus, with human and machine similarity ratings. The correlation between human intuition and the algorithm ratings has been calculated to analyze the accuracy of the proposed similarity measure, and it is significant. The method can be adapted for various applications of word semantic similarity or as a module for any other language.

Keywords

Artificial Intelligence; word similarity; semantic nets; natural language processing; corpus; synonymy
