Comparison of Sentiment Analysis from Large Twitter Datasets by Naïve Bayes and Natural Language Processing Methods

  • Back, Bong-Hyun (Department of Computer Engineering, Yeungnam University) ;
  • Ha, Il-Kyu (Department of Computer Engineering, Kyungil University)
  • Received : 2019.08.20
  • Accepted : 2019.11.06
  • Published : 2019.12.31

Abstract

Recently, efforts to extract various kinds of information from the vast amount of big data generated daily on social network services (SNS) have expanded. SNS big data consist of sentences, which are classified as unstructured data and therefore complicate data processing. As the volume of data grows, rapid processing techniques are required to extract valuable information from SNS big data. Herein, we propose a system that extracts human sentiment information from vast amounts of unstructured SNS big data using the naïve Bayes algorithm and natural language processing (NLP). Furthermore, we analyze the effectiveness of the proposed method through various experiments. In the sentiment accuracy analysis, the machine learning method using the naïve Bayes algorithm achieved an accuracy of 63.5%, which was lower than that of the NLP method. However, in the data processing speed analysis, the naïve Bayes method processed data approximately 5.4 times faster than the NLP method.
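The abstract does not specify the paper's tokenization, feature set, label scheme, or smoothing, so the following is only a minimal single-machine sketch of how a naïve Bayes classifier can assign a sentiment label to short texts such as tweets. It assumes a multinomial naïve Bayes model with add-one (Laplace) smoothing over bag-of-words tokens and two hypothetical labels, "pos" and "neg". The class name `NaiveBayesSentiment`, the `tokenize` helper, and the toy training data are illustrative assumptions, not the authors' implementation.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase and keep simple word tokens.
    # A real tweet pipeline would also handle URLs, @mentions, hashtags, emoji, etc.
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesSentiment:
    """Multinomial naive Bayes with add-one (Laplace) smoothing (illustrative sketch)."""

    def __init__(self):
        self.class_docs = Counter()  # number of training documents per label
        self.word_counts = {}        # label -> Counter of token frequencies
        self.vocab = set()           # all tokens seen during training

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.class_docs[label] += 1
            self.word_counts.setdefault(label, Counter()).update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        total_docs = sum(self.class_docs.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, float("-inf")
        for label, n_docs in self.class_docs.items():
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            # log P(label) + sum of log P(token | label); log space avoids underflow.
            score = math.log(n_docs / total_docs)
            for tok in tokenize(text):
                score += math.log((counts[tok] + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy usage with made-up examples (not data from the paper).
train_texts = ["I love this phone", "great day and happy mood",
               "this is awful", "I hate the traffic"]
train_labels = ["pos", "pos", "neg", "neg"]
model = NaiveBayesSentiment().fit(train_texts, train_labels)
print(model.predict("happy love"))     # pos
print(model.predict("awful traffic"))  # neg
```

Training and prediction reduce to counting tokens and summing log probabilities, with no parsing step, which is one plausible reason a naïve Bayes approach can run faster than a fuller NLP pipeline; the actual speed and accuracy figures depend on the paper's setup, which this sketch does not reproduce.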
