Word Embeddings-Based Pseudo Relevance Feedback Using Deep Averaging Networks for Arabic Document Retrieval
Farhan, Yasir Hadi (Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia)
Noah, Shahrul Azman Mohd (Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia)
Mohd, Masnizah (Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia)
Atwan, Jaffar (Prince Abdullah Bin Ghazi, Faculty of Information Technology, Al Balqa Applied University)