http://dx.doi.org/10.9708/jksci.2020.25.02.123

Performance Evaluations of Text Ranking Algorithms  

Kim, Myung-Hwi (Dept. of Computer Science, Sangmyung University)
Jang, Beakcheol (Dept. of Computer Science, Sangmyung University)
Abstract
Text ranking algorithms are a representative approach to keyword extraction, and their importance is widely emphasized. In this paper, we compare the performance of the TF-IDF, SMART, INQUERY, and CCA algorithms used for text ranking, drawing on recent research and our own experiments. After explaining each algorithm, we compare their performance on data collected from news and Twitter. The experimental results show that all four algorithms extract specific words from news data equally well. On Twitter data, however, CCA extracts specific words best and INQUERY performs worst. We also analyze the accuracy of the algorithms using six comparison metrics. On the news data, CCA shows the best accuracy. On the Twitter data, TF-IDF and CCA perform similarly, and both perform well.
Keywords
Text ranking algorithm; TF-IDF; SMART; INQUERY; CCA
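
As a point of reference for the TF-IDF baseline named in the abstract, the following is a minimal sketch of TF-IDF keyword ranking. It assumes the textbook weighting (relative term frequency multiplied by the logarithm of inverse document frequency) and whitespace tokenization; the exact variant, preprocessing, and data used in the paper are not stated here, and the function name `tfidf_rank` and the sample documents are hypothetical.

```python
import math
from collections import Counter


def tfidf_rank(docs, top_k=3):
    """Return the top_k (term, score) pairs per document, ranked by TF-IDF.

    Assumed weighting (not taken from the paper):
    tf = count / document length, idf = log(N / document frequency).
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    ranked = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores = {
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        }
        # Sort by descending score; break ties alphabetically for stable output.
        top = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
        ranked.append(top)
    return ranked


# Hypothetical sample documents, for illustration only.
docs = [
    "storm warning issued for the coast",
    "election results announced for the city",
    "storm damage reported along the coast",
]
for doc_terms in tfidf_rank(docs, top_k=2):
    print(doc_terms)
```

Terms that occur in every document (such as "the" above) receive a score of zero under this weighting, which is why TF-IDF tends to surface document-specific words as keywords.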