http://dx.doi.org/10.1633/JISTaP.2020.8.4.6

Survey of Automatic Query Expansion for Arabic Text Retrieval  

Farhan, Yasir Hadi (Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia)
Noah, Shahrul Azman Mohd (Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia)
Mohd, Masnizah (Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia)
Publication Information
Journal of Information Science Theory and Practice / v.8, no.4, 2020, pp. 67-86
Abstract
An information need is one of the main reasons a person uses a search engine, and queries can represent very different information needs. Ironically, a query can be a poor representation of that need because users often find it difficult to express what they are looking for. Query expansion (QE) is widely used to address this limitation. While QE can be considered a language-independent technique, recent findings have shown that in certain cases language plays an important role. Arabic has a particularly large vocabulary, rich in near-synonymous words with different shades of meaning, and high morphological complexity. This paper therefore reviews QE for Arabic information retrieval, with the aim of identifying the current state of the art in this burgeoning area. We primarily discuss statistical QE approaches, which include document analysis, search and browse log analysis, and web knowledge analysis, in addition to semantic QE approaches, which use semantic knowledge structures to extract meaningful word relationships. We conclude that QE for Arabic requires further investigation and research owing to the intricate nature of the language.
Keywords
information retrieval; Arabic text retrieval; query expansion