An Improved Text Classification Method for Sentiment Classification

  • Wang, Guangxing (Department of Information Technology Center, Jiujiang University)
  • Shin, Seong Yoon (School of Computer Information & Communication Engineering, Kunsan National University)
  • Received : 2019.01.06
  • Accepted : 2019.03.13
  • Published : 2019.03.31

Abstract

Sentiment analysis research has become popular in recent years, and its results have been applied with remarkable success in practice, for example in Amazon's book recommendation system and the North American movie box office evaluation system. Analyzing big data on user preferences and evaluations, and then recommending best-selling books and highly rated movies to targeted users, greatly improves book sales and movie attendance [1, 2]. However, traditional machine learning-based sentiment analysis methods such as the Classification and Regression Tree (CART), Support Vector Machine (SVM), and k-nearest neighbor (kNN) classification have performed poorly in terms of accuracy. This paper proposes an improved kNN classification method that raises accuracy by combining the improved method with data normalization. The three classification algorithms and the improved algorithm were then compared on experimental data. The experiments show that the improved method performs best among the kNN classification methods, with an accuracy rate of 11.5% and a precision rate of 20.3%.
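The abstract names two ingredients, kNN classification and data normalization, and two evaluation measures, accuracy and precision. The following is a minimal sketch of how these pieces fit together, assuming plain majority-vote kNN with min-max normalization. It is not the authors' improved algorithm, whose details are not given in this abstract. The function names, the two features (a positive-word ratio and a document length), and the toy data are all hypothetical and chosen only for illustration.

```python
from collections import Counter
import math


def fit_min_max(rows):
    """Return per-feature (lows, highs) computed from the training rows."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]


def apply_min_max(rows, lows, highs):
    """Rescale each feature with the given bounds; constant features map to 0."""
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def knn_predict(train_x, train_y, query, k):
    """Plain kNN: majority vote among the k training points nearest to query."""
    order = sorted(range(len(train_x)),
                   key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision(y_true, y_pred, positive):
    """Fraction of samples predicted as `positive` that really are positive."""
    hits = [t == positive for t, p in zip(y_true, y_pred) if p == positive]
    return sum(hits) / len(hits) if hits else 0.0


# Hypothetical toy data: [positive-word ratio, document length in words].
train_x = [[0.90, 200], [0.80, 1000], [0.85, 600],
           [0.10, 220], [0.20, 950], [0.15, 580]]
train_y = ["pos", "pos", "pos", "neg", "neg", "neg"]
test_x = [[0.88, 900], [0.12, 250]]
test_y = ["pos", "neg"]

# Normalization bounds come from the training set only and are reused for the
# test set, so both are mapped onto the same scale.
lows, highs = fit_min_max(train_x)
train_n = apply_min_max(train_x, lows, highs)
test_n = apply_min_max(test_x, lows, highs)

predictions = [knn_predict(train_n, train_y, q, k=3) for q in test_n]
print(predictions)
print("accuracy:", accuracy(test_y, predictions))
print("precision (pos):", precision(test_y, predictions, "pos"))
```

Normalizing before computing distances keeps a wide-range feature such as document length from dominating the Euclidean distance, which is one common motivation for pairing kNN with normalization.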

Keywords

[Image: E1ICAW_2019_v17n1_41_f0001.png]

Fig. 1. Comparison of classification accuracy before and after the kNN method improvement. K denotes the value of k before the improvement, and ImpK denotes the value of k for the improved kNN method.

Table 2. Pseudocode of the improved kNN algorithm

[Image: E1ICAW_2019_v17n1_41_t0001.png]

Table 3. Classification of the test samples

[Image: E1ICAW_2019_v17n1_41_t0002.png]

Table 4. Comparison of the three models of classification prediction

[Image: E1ICAW_2019_v17n1_41_t0003.png]

Table 5. Comparison of classification prediction after improvement

[Image: E1ICAW_2019_v17n1_41_t0004.png]

Table 1. Pseudocode of the kNN algorithm

[Image: E1ICAW_2019_v17n1_41_t0005.png]

References

  1. B. Smith and G. Linden, "Two decades of recommender systems at Amazon.com," IEEE Internet Computing, vol. 21, no. 3, pp. 12-18, 2017. DOI: 10.1109/MIC.2017.72.
  2. S. Halder, Md. Samiullah, A. M. Jehad Sarkar, and Y.-K. Lee, "Movie swarm: Information mining technique for movie recommendation system," in Proceeding of 2012 7th International Conference on Electrical and Computer Engineering, pp. 462-465, 2013. DOI: 10.1109/ICECE.2012.6471587.
  3. P. Chen and X. Fu, "Research on sentiment classification of tests based on SVM," Journal of Guangdong University of Technology, vol. 31, no. 3, pp. 95-101, 2014. DOI: 10.3969/j.issn.1007-7162.2014.03.017.
  4. S. Tan, Y. Li, H. Sun, Z. Guan, and X. Yan, "Interpreting the Public Sentiment Variations on Twitter," IEEE Transactions on Knowledge and Data Engineering, vol. 26, no. 5, pp. 1158-1170, 2014. DOI: 10.1109/TKDE.2013.116.
  5. N. Arunachalam, S. J. Sneka, and G. MadhuMathi, "A Survey on text classification techniques for sentiment polarity detection," in Proceeding of 2017 Innovations in Power and Advanced Computing Technologies (i-PACT), pp. 1-5, 2017. DOI: 10.1109/IPACT.2017.8245127.
  6. J. M. Desai and S. R. Andhariya, "Sentiment analysis approach to adapt a shallow parsing based sentiment lexicon," in Proceeding of 2015 International Conference on Innovations in Information, Embedded and Communication Systems (ICIIECS), pp. 1-4, 2015. DOI: 10.1109/ICIIECS.2015.7193160.
  7. Q. Li, S. Shah, R. Fang, A. Nourbakhsh, and X. Liu, "Tweet sentiment analysis by incorporating sentiment-specific word embedding and weighted text features," in Proceeding of 2016 IEEE/WIC/ACM International Conference on Web Intelligence (WI), pp. 568-571, 2016. DOI: 10.1109/WI.2016.0097.
  8. THUCNews DataSet, [Online] Available: http://thuctc.thunlp.org/.
  9. C. Yu, "Adaptive Japanese teaching optimization based on classification and regression tree," in Proceeding of 2017 International Conference on Robots & Intelligent System (ICRIS), pp. 15-18, 2017. DOI: 10.1109/ICRIS.2017.12.
  10. R. Li, X. Zhao, X. Yu, J. Li, N. Cheng, and J. Zhang, "Incident duration model on urban freeways using three different algorithms of decision tree," in Proceeding of 2010 International Conference on Intelligent Computation Technology and Automation, pp. 526-528, 2010. DOI: 10.1109/ICICTA.2010.602.
  11. R. Izmailov, V. Vapnik, and A. Vashist, "Multidimensional splines with infinite number of knots as SVM kernels," in Proceeding of the 2013 International Joint Conference on Neural Networks (IJCNN), pp. 1-7, 2013. DOI: 10.1109/IJCNN.2013.6706860.
  12. L. Zhou, L. Wang, X. Ge, and Q. Shi, "A clustering-Based KNN improved algorithm CLKNN for text classification," in Proceeding of 2010 2nd International Asia Conference on Informatics in Control, Automation and Robotics (CAR 2010), pp. 212-215, 2010. DOI: 10.1109/CAR.2010.5456668.
  13. H. Yigit, "A weighting approach for KNN classifier," in Proceeding of 2013 International Conference on Electronics, Computer and Computation (ICECCO), pp. 228-231, 2013. DOI: 10.1109/ICECCO.2013.6718270.
  14. B. Pang, L. Lee, and S. Vaithyanathan, "Thumbs up? Sentiment Classification using machine learning techniques," in Proceeding of the Conference on Empirical Methods in Natural Language Processing (EMNLP), Philadelphia, pp. 79-86, 2002.
  15. P. D. Turney and M. L. Littman, "Measuring praise and criticism: Inference of semantic orientation from association," ACM Transactions on Information Systems, vol. 21, no. 4, pp. 315-346, 2003. DOI: 10.1145/944012.944013.
  16. S. Taneja, C. Gupta, S. Aggarwal, and V. Jindal, "MFZ-KNN-A modified fuzzy based K nearest neighbor algorithm," in Proceeding of 2015 International Conference on Cognitive Computing and Information Processing (CCIP), pp. 1-5, 2015. DOI: 10.1109/CCIP.2015.7100689.
  17. J. Huang, Y. Wei, J. Yi, and M. Liu, "An improved kNN based on class contribution and feature weighting," in Proceeding of 2018 10th International Conference on Measuring Technology and Mechatronics Automation (ICMTMA), pp. 313-316, 2018. DOI: 10.1109/ICMTMA.2018.00083.
