Determining Feature-Size for Text to Numeric Conversion based on BOW and TF-IDF

  • Alyamani, Hasan J. (Department of Information Systems, Faculty of Computing and Information Technology in Rabigh (FCITR), King Abdulaziz University)
  • Received: 2021.12.05
  • Published: 2022.01.30

Abstract

Machine learning is among the most popular methods used in data science. The data being generated is not only numeric but also text. Most supervised and unsupervised machine learning algorithms operate on numeric data, so text must first be converted into numeric form. Many techniques exist for this conversion, and researchers are often unsure which technique is best in which situation. In the proposed work, BOW (Bag-of-Words) and TF-IDF (Term Frequency-Inverse Document Frequency) are studied across different numbers of features to determine the better method. Experimental results on text data show that both TF-IDF and BOW give better performance when the number of features lies between 100 and 150.
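To make the "number of features" concrete, the following is a minimal pure-Python sketch, not the author's implementation. The abstract does not state the paper's tokenization, vocabulary selection, or TF-IDF weighting, so all three are assumptions here: tokens are lowercase alphabetic runs, the feature size is a hypothetical `max_features` cap that keeps the most frequent terms across the corpus (similar in spirit to the `max_features` option of scikit-learn's `CountVectorizer` and `TfidfVectorizer`), and TF-IDF uses the classic raw count times idf(t) = ln(N / df(t)). The helper names `build_vocabulary`, `bow_matrix`, and `tfidf_matrix` are illustrative only.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercase the text and keep runs of letters a-z as tokens.
    return re.findall(r"[a-z]+", text.lower())


def build_vocabulary(docs, max_features):
    # Hypothetical feature-size knob: keep the max_features terms with the
    # highest total count across the corpus; ties are broken alphabetically
    # so the result is deterministic. Returned in alphabetical order.
    totals = Counter()
    for doc in docs:
        totals.update(tokenize(doc))
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return sorted(term for term, _ in ranked[:max_features])


def bow_matrix(docs, vocab):
    # Bag-of-Words: one row per document, one column per vocabulary term,
    # each cell holding the raw count of that term in that document.
    rows = []
    for doc in docs:
        counts = Counter(tokenize(doc))
        rows.append([counts[term] for term in vocab])
    return rows


def tfidf_matrix(docs, vocab):
    # TF-IDF (one common variant): raw count * ln(N / df(t)), where df(t)
    # is the number of documents that contain term t.
    bow = bow_matrix(docs, vocab)
    n_docs = len(docs)
    df = [sum(1 for row in bow if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log(n_docs / d) if d else 0.0 for d in df]
    return [[row[j] * idf[j] for j in range(len(vocab))] for row in bow]


docs = [
    "the food was good",
    "the food was cold and the service was bad",
    "good food good service",
]
vocab = build_vocabulary(docs, max_features=4)
bow = bow_matrix(docs, vocab)
tfidf = tfidf_matrix(docs, vocab)

print(vocab)  # ['food', 'good', 'the', 'was']
print(bow)    # [[1, 1, 1, 1], [1, 0, 2, 2], [1, 2, 0, 0]]
print(tfidf)
```

In the usage above, "food" occurs in every document, so its TF-IDF weight is 0 even though its BOW count is 1 in each row; this is the main practical difference between the two representations. Sweeping `max_features` (for example over 100, 150, and other values on a real corpus) and training the same classifier on each resulting matrix is one way to reproduce the kind of feature-size comparison the abstract describes.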

