http://dx.doi.org/10.22937/IJCSNS.2022.22.1.39

Determining Feature-Size for Text to Numeric Conversion based on BOW and TF-IDF  

Alyamani, Hasan J. (Department of Information Systems, Faculty of Computing and Information Technology in Rabigh (FCITR), King Abdulaziz University)
Publication Information
International Journal of Computer Science & Network Security / v.22, no.1, 2022, pp. 283-287
Abstract
Machine learning is the most popular method used in data science. Data growth is not limited to numeric data; text data is growing as well. Most supervised and unsupervised machine learning algorithms operate on numeric data, so text data must first be converted into numeric form. Many techniques exist for this conversion, and researchers are often unsure which technique is best in which situation. In the proposed work, BOW (Bag-of-Words) and TF-IDF (Term Frequency-Inverse Document Frequency) are studied across different numbers of features to determine the best method. In the experimental results on text data, TF-IDF and BOW both provide better performance in the range of 100 to 150 features.
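To make the role of the feature size concrete, the following is a minimal sketch of both conversions with a cap on the vocabulary. It is not the author's implementation: the whitespace tokenizer, the rule of keeping the most frequent terms, the plain `tf * log(N / df)` weighting, the toy documents, and all function names (`build_vocabulary`, `bow_vectors`, `tfidf_vectors`) are assumptions chosen for illustration. The abstract does not state which tooling or TF-IDF variant was used.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase and split on whitespace; real pipelines do more.
    return text.lower().split()


def build_vocabulary(docs, max_features):
    """Keep the max_features most frequent terms in the corpus."""
    counts = Counter(tok for doc in docs for tok in tokenize(doc))
    # Rank by descending frequency, breaking ties alphabetically.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    kept = sorted(term for term, _ in ranked[:max_features])
    return {term: index for index, term in enumerate(kept)}


def bow_vectors(docs, vocab):
    """Bag-of-Words: raw term counts per document."""
    vectors = []
    for doc in docs:
        vec = [0] * len(vocab)
        for tok in tokenize(doc):
            if tok in vocab:
                vec[vocab[tok]] += 1
        vectors.append(vec)
    return vectors


def tfidf_vectors(docs, vocab):
    """TF-IDF: term count scaled by log(N / document frequency)."""
    bow = bow_vectors(docs, vocab)
    n_docs = len(docs)
    df = [sum(1 for vec in bow if vec[j] > 0) for j in range(len(vocab))]
    idf = [math.log(n_docs / d) if d else 0.0 for d in df]
    return [[tf * w for tf, w in zip(vec, idf)] for vec in bow]


# Hypothetical toy corpus; the study itself reports 100 to 150 features.
docs = [
    "the movie was good",
    "the movie was bad",
    "good acting and good story",
]
vocab = build_vocabulary(docs, max_features=4)
bow = bow_vectors(docs, vocab)
tfidf = tfidf_vectors(docs, vocab)
```

Here `max_features` plays the part of the feature size studied in the paper: it fixes the length of every output vector, so raising or lowering it changes how much of the vocabulary a downstream classifier sees.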
Keywords
Machine Learning; Supervised and Unsupervised Learning; TF-IDF; BOW