Arabic Text Clustering Methods and Suggested Solutions for Theme-Based Quran Clustering: Analysis of Literature
Bsoul, Qusay (Cybersecurity Department, Science and IT College, Irbid National University)
Abdul Salam, Rosalina (Faculty of Science and Technology, Universiti Sains Islam Malaysia)
Atwan, Jaffar (Prince Abdullah Bin Ghazi Faculty of ICT, Al-Balqa Applied University)
Jawarneh, Malik (Faculty for Computing Sciences, Gulf College)
1. Alweshah, M., Alkhalaileh, S., Albashish, D., Mafarja, M., Bsoul, Q., & Dorgham, O. (2021). A hybrid mine blast algorithm for feature selection problems. Soft Computing, 25(1), 517-534. https://doi.org/10.1007/s00500-020-05164-4
2. Al-Zoghby, A. M., Elshiwi, A., & Atwan, A. (2018). Semantic relations extraction and ontology learning from Arabic texts: A survey. In K. Shaalan, A. Hassanien, & F. Tolba (Eds.), Intelligent natural language processing: Trends and applications (pp. 199-225). Springer.
3. Atwan, J., Mohd, M., Kanaan, G., & Bsoul, Q. (2014, December 3-5). Impact of stemmer on Arabic text retrieval. In A. Jaafar, N. M. Ali, S. A. M. Noah, A. F. Smeaton, P. Bruza, Z. A. Bakar, N. Jamil, T. Mohd, & T. Sembok (Eds.), Proceedings of the 10th Asia Information Retrieval Societies Conference (pp. 314-326). Springer.
4. Azad, H. K., & Deepak, A. (2019). Query expansion techniques for information retrieval: A survey. Information Processing & Management, 56(5), 1698-1735. https://doi.org/10.1016/j.ipm.2019.05.009
5. Beirade, F., Azzoune, H., & Zegour, D. E. (2021). Semantic query for Quranic ontology. Journal of King Saud University - Computer and Information Sciences, 33(6), 753-760. https://doi.org/10.1016/j.jksuci.2019.04.005
6. Hearst, M. A. (1999, June 20-26). Untangling text data mining. In R. Dale & K. Church (Eds.), Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics on Computational Linguistics (pp. 3-10). Association for Computational Linguistics.
7. Salloum, S. A., Al Hamad, A. Q., Al-Emran, M., & Shaalan, K. (2018). A survey of Arabic text mining. In K. Shaalan, A. E. Hassanien, & F. Tolba (Eds.), Intelligent natural language processing: Trends and applications (pp. 417-431). Springer.
8. Hartigan, J. A. (1981). Consistency of single linkage for high-density clusters. Journal of the American Statistical Association, 76(374), 388-394. https://doi.org/10.1080/01621459.1981.10477658
9. Guo, Y. W., Li, W. D., Mileham, A. R., & Owen, G. W. (2009). Applications of particle swarm optimisation in integrated process planning and scheduling. Robotics and Computer-Integrated Manufacturing, 25(2), 280-288. https://doi.org/10.1016/j.rcim.2007.12.002
10. Harrag, F. (2014). Text mining approach for knowledge extraction in Sahîh Al-Bukhari. Computers in Human Behavior, 30, 558-566. https://doi.org/10.1016/j.chb.2013.06.035
11. Cambria, E., & White, B. (2014). Jumping NLP curves: A review of natural language processing research. IEEE Computational Intelligence Magazine, 9(2), 48-57. https://doi.org/10.1109/MCI.2014.2307227
12. Bouras, C., & Tsogkas, V. (2010, May 9-15). Assigning web news to clusters. In G. O. Bellot, H. Sasaki, M. Ehmann, & C. Dini (Eds.), Proceedings of the 5th International Conference on Internet and Web Applications and Services (pp. 1-6). Institute of Electrical and Electronics Engineers.
13. Bsoul, Q., Al-Shamari, E., Mohd, M., & Atwan, J. (2014, December 3-5). Distance measures and stemming impact on Arabic document clustering. In A. Jaafar, N. M. Ali, S. A. M. Noah, A. F. Smeaton, P. Bruza, Z. A. Bakar, N. Jamil, T. Mohd, & T. Sembok (Eds.), Proceedings of the 10th Asia Information Retrieval Societies Conference (pp. 327-339). Springer.
14. Bsoul, Q., Salim, J., & Zakaria, L. Q. (2016). Document clustering approach to detect crime. World Applied Sciences Journal, 34(8), 1026-1036. https://doi.org/10.5829/idosi.wasj.2016.34.8.109
15. Zhang, Y., Li, H.-G., Wang, Q., & Peng, C. (2019). A filter-based bare-bone particle swarm optimization algorithm for unsupervised feature selection. Applied Intelligence, 49(8), 2889-2898. https://doi.org/10.1007/s10489-019-01420-9
16. Hatamlou, A. (2012). In search of optimal centroids on data clustering using a binary search algorithm. Pattern Recognition Letters, 33(13), 1756-1760. https://doi.org/10.1016/j.patrec.2012.06.008
17. Yauri, A. R., Kadir, R. A., Azman, A., & Murad, M. A. A. (2013). Quranic verse extraction base on concepts using OWL-DL ontology. Research Journal of Applied Sciences, Engineering and Technology, 6(23), 4492-4498. https://doi.org/10.19026/rjaset.6.3457
18. Zhang, J., Liu, K., Tan, Y., & He, X. (2008, June 7-11). Random black hole particle swarm optimization and its application. Proceedings of the 2008 International Conference on Neural Networks and Signal Processing (pp. 359-365). Institute of Electrical and Electronics Engineers.
19. Zheng, Z., Li, J., & Han, Y. (2020). An improved invasive weed optimization algorithm for solving dynamic economic dispatch problems with valve-point effects. Journal of Experimental & Theoretical Artificial Intelligence, 32(5), 805-829. https://doi.org/10.1080/0952813X.2019.1673488
20. Aouf, M., Liyanage, L., & Hansen, S. (2008, June 30-July 2). Review of data mining clustering techniques to analyze data with high dimensionality as applied in gene expression data. In V. C. S. Lee, J. Chen, W.-K. Ng, K.-L. Ong, & T. Y. Tan (Eds.), Proceedings of the 2008 International Conference on Service Systems and Service Management (pp. 1-5). Institute of Electrical and Electronics Engineers.
21. Cagnina, L., Errecalde, M., Ingaramo, D., & Rosso, P. (2014). An efficient particle swarm optimization approach to cluster short texts. Information Sciences, 265, 36-49. https://doi.org/10.1016/j.ins.2013.12.010
22. Touahri, I., & Mazroui, A. (2021). Studying the effect of characteristic vector alteration on Arabic sentiment classification. Journal of King Saud University - Computer and Information Sciences, 33(7), 890-898. https://doi.org/10.1016/j.jksuci.2019.04.011
23. Ananiadou, S., Rea, B., Okazaki, N., Procter, R., & Thomas, J. (2009). Supporting systematic reviews using text mining. Social Science Computer Review, 27(4), 509-523. https://doi.org/10.1177/0894439309332293
24. Bacao, F., Lobo, V., & Painho, M. (2005, May 22-25). Self-organizing maps as substitutes for K-means clustering. In V. S. Sunderam, G. D. van Albada, P. M. A. Sloot, & J. Dongarra (Eds.), Proceedings of the 5th International Conference on Computational Science (pp. 476-483). Springer.
25. Campello, R. J. G. B., & Hruschka, E. R. (2006). A fuzzy extension of the silhouette width criterion for cluster analysis. Fuzzy Sets and Systems, 157(21), 2858-2875. https://doi.org/10.1016/j.fss.2006.07.006
26. Cisty, M. (2010). Application of the harmony search optimization in irrigation. In Z. W. Geem (Ed.), Recent advances in harmony search algorithm (pp. 123-134). Springer.
27. Connolly, J.-F., Granger, E., & Sabourin, R. (2012). An adaptive classification system for video-based face recognition. Information Sciences, 192, 50-70. https://doi.org/10.1016/j.ins.2010.02.026
28. Dai, X.-Y., Chen, Q.-C., Wang, X.-L., & Xu, J. (2010b, July 11-14). Online topic detection and tracking of financial news based on hierarchical clustering. Proceedings of the 2010 International Conference on Machine Learning and Cybernetics (pp. 3341-3346). Institute of Electrical and Electronics Engineers.
29. Mehra, P. S., Doja, M. N., & Alam, B. (2020). Fuzzy based enhanced cluster head selection (FBECS) for WSN. Journal of King Saud University - Science, 32(1), 390-401. https://doi.org/10.1016/j.jksus.2018.04.031
30. Arle, J. E., & Carlson, K. W. (2021). Medical diagnosis and treatment is NP-complete. Journal of Experimental & Theoretical Artificial Intelligence, 33(2), 297-312. https://doi.org/10.1080/0952813X.2020.1737581
31. Menai, M. E. B. (2014). Word sense disambiguation using evolutionary algorithms - application to Arabic language. Computers in Human Behavior, 41(C), 92-103. https://doi.org/10.1016/j.chb.2014.06.021
32. Mottaghinia, Z., Feizi-Derakhshi, M.-R., Farzinvash, L., & Salehpour, P. (2021). A review of approaches for topic detection in Twitter. Journal of Experimental & Theoretical Artificial Intelligence, 33(5), 747-773. https://doi.org/10.1080/0952813X.2020.1785019
33. Hatamlou, A. (2013). Black hole: A new heuristic optimization approach for data clustering. Information Sciences, 222, 175-184. https://doi.org/10.1016/j.ins.2012.08.023
34. Benabdallah, A., Abderrahim, M. A., & Abderrahim, M. E. A. (2017). Extraction of terms and semantic relationships from Arabic texts for automatic construction of an ontology. International Journal of Speech Technology, 20(2), 289-296. https://doi.org/10.1007/s10772-017-9405-5
35. Niknam, T., & Amiri, B. (2010). An efficient hybrid approach based on PSO, ACO and k-means for cluster analysis. Applied Soft Computing, 10(1), 183-197. https://doi.org/10.1016/j.asoc.2009.07.001
36. Paci, M., Nanni, L., & Severi, S. (2013). An ensemble of classifiers based on different texture descriptors for texture classification. Journal of King Saud University - Science, 25(3), 235-244. https://doi.org/10.1016/j.jksus.2012.12.001
37. Qasim, I., Jeong, J.-W., Heu, J.-U., & Lee, D.-H. (2013). Concept map construction from text documents using affinity propagation. Journal of Information Science, 39(6), 719-736. https://doi.org/10.1177/0165551513494645
38. Hatamlou, A., Abdullah, S., & Nezamabadi-pour, H. (2012). A combined approach for clustering based on K-means and gravitational search algorithms. Swarm and Evolutionary Computation, 6, 47-52. https://doi.org/10.1016/j.swevo.2012.02.003
39. Helwe, C., & Elbassuoni, S. (2019). Arabic named entity recognition via deep co-learning. Artificial Intelligence Review, 52(1), 197-215. https://doi.org/10.1007/s10462-019-09688-6
40. Mustafa, S. H. (2005). Character contiguity in N-gram-based word matching: The case for Arabic text searching. Information Processing and Management: An International Journal, 41(4), 819-827. https://doi.org/10.1016/j.ipm.2004.02.003
41. Safee, M. A. M., Saudi, M. M., Pitchay, S. A., & Ridzuan, F. (2016). A systematic review analysis for Quran verses retrieval. Journal of Engineering and Applied Sciences, 11(3), 629-634. https://doi.org/10.36478/jeasci.2016.629.634
42. Qin, A. K., & Suganthan, P. N. (2004). Robust growing neural gas algorithm with application in cluster analysis. Neural Networks: The Official Journal of the International Neural Network Society, 17(8-9), 1135-1148. https://doi.org/10.1016/j.neunet.2004.06.013
43. Raharjo, S., Wardoyo, R., & Putra, A. E. (2020). Detecting proper nouns in Indonesian-language translation of the Quran using a guided method. Journal of King Saud University - Computer and Information Sciences, 32(5), 583-591. https://doi.org/10.1016/j.jksuci.2018.06.009
44. Rostam, N. A. P., & Malim, N. H. A. H. (2021). Text categorisation in Quran and Hadith: Overcoming the interrelation challenges using machine learning and term weighting. Journal of King Saud University - Computer and Information Sciences, 33(6), 658-667. https://doi.org/10.1016/j.jksuci.2019.03.007
45. Dai, X., He, Y., & Sun, Y. (2010a, October 23-24). A two-layer text clustering approach for retrospective news event detection. In F. L. Wang & T. Jin (Eds.), Proceedings of the 2010 International Conference on Artificial Intelligence and Computational Intelligence (pp. 364-368). Institute of Electrical and Electronics Engineers.
46. Hatamlou, A., Abdullah, S., & Hatamlou, M. (2011, December 13-15). Data clustering using big bang-big crunch algorithm. In P. Pichappan, H. Ahmadi, & E. Ariwa (Eds.), Proceedings of the 1st International Conference on Innovative Computing Technology (pp. 383-388). Springer.
47. Kushwaha, N., & Pant, M. (2021). Fuzzy electromagnetic optimisation clustering algorithm for collaborative filtering. Journal of Experimental & Theoretical Artificial Intelligence, 33(4), 601-616. https://doi.org/10.1080/0952813X.2019.1647557
48. Tabakhi, S., Moradi, P., & Akhlaghian, F. (2014). An unsupervised feature selection algorithm based on ant colony optimization. Engineering Applications of Artificial Intelligence, 32, 112-123. https://doi.org/10.1016/j.engappai.2014.03.007
49. Das, S., Abraham, A., & Konar, A. (2007). Automatic clustering using an improved differential evolution algorithm. IEEE Transactions on Systems, Man, and Cybernetics - Part A: Systems and Humans, 38(1), 218-237. https://doi.org/10.1109/TSMCA.2007.909595
50. Bharti, K. K., & Singh, P. K. (2014). A three-stage unsupervised dimension reduction method for text clustering. Journal of Computational Science, 5(2), 156-169. https://doi.org/10.1016/j.jocs.2013.11.007
51. Karaboga, D., & Ozturk, C. (2011). A novel clustering approach: Artificial Bee Colony (ABC) algorithm. Applied Soft Computing, 11(1), 652-657. https://doi.org/10.1016/j.asoc.2009.12.025
52. Hmeidi, I., Hawashin, B., & El-Qawasmeh, E. (2008). Performance of KNN and SVM classifiers on full word Arabic articles. Advanced Engineering Informatics, 22(1), 106-111. https://doi.org/10.1016/j.aei.2007.12.001
53. Izakian, H., & Abraham, A. (2011). Fuzzy C-means and fuzzy swarm for fuzzy clustering problem. Expert Systems with Applications: An International Journal, 38(3), 1835-1838. https://doi.org/10.1016/j.eswa.2010.07.112
54. Jain, A. K. (2010). Data clustering: 50 Years beyond K-means. Pattern Recognition Letters, 31(8), 651-666. https://doi.org/10.1016/j.patrec.2009.09.011
55. Mahdavi, M., & Abolhassani, H. (2009). Harmony K-means algorithm for document clustering. Data Mining and Knowledge Discovery, 18(3), 370-391. https://doi.org/10.1007/s10618-008-0123-0
56. Jo, T. (2009, July 28-31). Clustering news groups using inverted index based NTSO. Proceedings of the 1st International Conference on Networked Digital Technologies (pp. 1-7). Institute of Electrical and Electronics Engineers.
57. Kaur, A., & Sood, S. K. (2020). Cloud-Fog based framework for drought prediction and forecasting using artificial neural network and genetic algorithm. Journal of Experimental & Theoretical Artificial Intelligence, 32(2), 273-289. https://doi.org/10.1080/0952813X.2019.1647563
58. Kharrousheh, A., Abdullah, S., & Nazri, M. Z. A. (2011). A modified Tabu search approach for the clustering problem. Journal of Applied Sciences, 11(19), 3447-3453. https://doi.org/10.3923/jas.2011.3447.3453
59. Liu, R., Jiao, L., Zhang, X., & Li, Y. (2012). Gene transposon based clone selection algorithm for automatic clustering. Information Sciences: An International Journal, 204, 1-22. https://doi.org/10.1016/j.ins.2012.03.021
60. Mafarja, M. M., & Mirjalili, S. (2017). Hybrid whale optimization algorithm with simulated annealing for feature selection. Neurocomputing, 260, 302-312. https://doi.org/10.1016/j.neucom.2017.04.053
61. Forsati, R., Mahdavi, M., Shamsfard, M., & Reza Meybodi, M. (2013). Efficient stochastic algorithms for document clustering. Information Sciences, 220, 269-291. https://doi.org/10.1016/j.ins.2012.07.025
62. Abualigah, L. M., & Khader, A. T. (2017). Unsupervised text feature selection technique based on hybrid particle swarm optimization algorithm with genetic operators for the text clustering. The Journal of Supercomputing, 73(11), 4773-4795. https://doi.org/10.1007/s11227-017-2046-2
63. Abualigah, L. M., Khader, A. T., & Al-Betar, M. A. (2016, July 13-14). Unsupervised feature selection technique based on genetic algorithm for improving the text clustering. In A. Altamimi & F. Almasalha (Eds.), Proceedings of the 7th International Conference on Computer Science and Information Technology (pp. 1-6). Institute of Electrical and Electronics Engineers.
64. Dueck, D., & Frey, B. J. (2007, October 14-21). Non-metric affinity propagation for unsupervised image categorization. Proceedings of the 11th International Conference on Computer Vision (pp. 1-8). Institute of Electrical and Electronics Engineers.
65. El Mahdaouy, A., El Alaoui Ouatik, S., & Gaussier, E. (2018). Improving Arabic information retrieval using word embedding similarities. International Journal of Speech Technology, 21(1), 121-136. https://hal.archives-ouvertes.fr/hal-01706531
66. Elayeb, B. (2019). Arabic word sense disambiguation: A review. Artificial Intelligence Review, 52(4), 2475-2532. https://doi.org/10.1007/s10462-018-9622-6
67. Farhan, Y. H., Mohd, M., & Noah, S. A. M. (2020). Survey of automatic query expansion for Arabic text retrieval. Journal of Information Science Theory and Practice, 8(4), 67-86. https://doi.org/10.1633/JISTaP.2020.8.4.6
68. Fathian, M., Amiri, B., & Maroosi, A. (2007). Application of honey-bee mating optimization algorithm on clustering. Applied Mathematics and Computation, 190(2), 1502-1513. https://doi.org/10.1016/j.amc.2007.02.029
69. Gbadoubissa, J. E. Z., Ari, A. A. A., & Gueroui, A. M. (2020). Efficient K-means based clustering scheme for mobile networks cell sites management. Journal of King Saud University - Computer and Information Sciences, 32(9), 1063-1070. https://doi.org/10.1016/j.jksuci.2018.10.015
70. Saloot, M. A., Idris, N., Mahmud, R., Ja'afar, S., Thorleuchter, D., & Gani, A. (2016). Hadith data mining and classification: A comparative analysis. Artificial Intelligence Review, 46(1), 113-128. https://doi.org/10.1007/s10462-016-9458-x
71. Sayed, G. I., Darwish, A., & Hassanien, A. E. (2018). A new chaotic multi-verse optimization algorithm for solving engineering optimization problems. Journal of Experimental & Theoretical Artificial Intelligence, 30(2), 293-317. https://doi.org/10.1080/0952813X.2018.1430858
72. Selim, S. Z., & Ismail, M. A. (1984). K-means-type algorithms: A generalized convergence theorem and characterization of local optimality. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-6(1), 81-87. https://doi.org/10.1109/TPAMI.1984.4767478
73. Velmurugan, T., & Santhanam, T. (2011). A survey of partition based clustering algorithms in data mining: An experimental approach. Information Technology Journal, 10(3), 478-484. https://doi.org/10.3923/itj.2011.478.484
74. Mesmia, F. B., Haddar, K., Friburger, N., & Maurel, D. (2018). CasANER: Arabic named entity recognition tool. In K. Shaalan, A. Hassanien, & F. Tolba (Eds.), Intelligent natural language processing: Trends and applications (pp. 173-198). Springer.
75. Grislin-Le Strugeon, E., Marcal de Oliveira, K., Thilliez, M., & Petit, D. (2021). A systematic mapping study on agent mining. Journal of Experimental & Theoretical Artificial Intelligence. https://doi.org/10.1080/0952813X.2020.1864784
76. Gungor, Z., & Unler, A. (2007). K-harmonic means data clustering with simulated annealing heuristic. Applied Mathematics and Computation, 184(2), 199-209. https://doi.org/10.1016/j.amc.2006.05.166
77. Senthilnath, J., Omkar, S. N., & Mani, V. (2011). Clustering using firefly algorithm: Performance study. Swarm and Evolutionary Computation, 1(3), 164-171. https://doi.org/10.1016/j.swevo.2011.06.003
78. Shahrivari, S., & Jalili, S. (2016). Single-pass and linear-time k-means clustering based on MapReduce. Information Systems, 60, 1-12. https://doi.org/10.1016/j.is.2016.02.007
79. Wu, J., Dong, M., Ota, K., Li, J., & Guan, Z. (2018). Big data analysis-based secure cluster management for optimized control plane in software-defined networks. IEEE Transactions on Network and Service Management, 15(1), 27-38. https://doi.org/10.1109/TNSM.2018.2799000
80. Yahya, A. A. (2018). Centroid particle swarm optimisation for high-dimensional data classification. Journal of Experimental & Theoretical Artificial Intelligence, 30(6), 857-886. https://doi.org/10.1080/0952813X.2018.1509378
81. Abualkishik, A. M., Omar, K., & Odiebat, G. A. (2015). QEFSM model and Markov Algorithm for translating Quran reciting rules into Braille code. Journal of King Saud University - Computer and Information Sciences, 27(3), 238-247. https://doi.org/10.1016/j.jksuci.2015.01.001
82. Ahmad, R., Wazirali, R., Bsoul, Q., Abu-Ain, T., & Abu-Ain, W. (2021). Feature-selection and mutual-clustering approaches to improve DoS detection and maintain WSNs' lifetime. Sensors, 21(14), 4821. https://doi.org/10.3390/s21144821
83. Alghamdi, H. M., & Selamat, A. (2019). Arabic web page clustering: A review. Journal of King Saud University - Computer and Information Sciences, 31(1), 1-14. https://doi.org/10.1016/j.jksuci.2017.06.002
84. Al-Salemi, B., & Aziz, M. J. A. (2011). Statistical Bayesian learning for automatic Arabic text categorization. Journal of Computer Science, 7(1), 39-45. https://doi.org/10.3844/jcssp.2011.39.45
85. Al-Smadi, M., Al-Ayyoub, M., Jararweh, Y., & Qawasmeh, O. (2019). Enhancing aspect-based sentiment analysis of Arabic hotels' reviews using morphological, syntactic and semantic features. Information Processing & Management, 56(2), 308-319. https://doi.org/10.1016/j.ipm.2018.01.006
86. Al-Shammari, E., & Lin, J. (2008, July 24). A novel Arabic lemmatization algorithm. In D. Lopresti, S. Roy, K. Schulz, & L. V. Subramaniam (Eds.), Proceedings of the 2nd Workshop on Analytics for Noisy Unstructured Text Data (pp. 113-118). Association for Computing Machinery.