http://dx.doi.org/10.6109/jicce.2019.17.4.239

Comparison of Sentiment Analysis from Large Twitter Datasets by Naïve Bayes and Natural Language Processing Methods  

Back, Bong-Hyun (Department of Computer Engineering, Yeungnam University)
Ha, Il-Kyu (Department of Computer Engineering, Kyungil University)
Abstract
Recently, efforts to obtain various kinds of information from the vast amount of social network service (SNS) big data generated in daily life have expanded. SNS big data consist of sentences, which are classified as unstructured data, and this complicates data processing. As the volume of data to be processed increases, a rapid processing technique is required to extract valuable information from SNS big data. We herein propose a system that can extract human sentiment information from vast amounts of unstructured SNS big data using the naïve Bayes algorithm and natural language processing (NLP). Furthermore, we analyze the effectiveness of the proposed method through various experiments. In the sentiment accuracy analysis, the machine learning method using the naïve Bayes algorithm achieved an accuracy of 63.5%, which was lower than that of the NLP method. In the data processing speed analysis, however, the machine learning method using the naïve Bayes algorithm was approximately 5.4 times faster than the NLP method.
Keywords
Big data processing; Machine learning; Naïve Bayes algorithm; Sentiment analysis; SNS big data
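
As a rough illustration of the kind of naïve Bayes sentiment classification the abstract refers to, the following minimal sketch trains a multinomial naïve Bayes model with add-one (Laplace) smoothing on a handful of labeled sentences. It is not the authors' system: the function names `train_naive_bayes` and `classify`, the whitespace tokenization, and the toy tweets are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(labeled_texts):
    """Collect per-class document counts and word counts from (text, label) pairs."""
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in labeled_texts:
        class_docs[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocabulary.add(word)
    return class_docs, word_counts, vocabulary


def classify(text, model):
    """Return the label with the highest log-posterior under add-one smoothing."""
    class_docs, word_counts, vocabulary = model
    total_docs = sum(class_docs.values())
    scores = {}
    for label in class_docs:
        # Log prior: fraction of training documents carrying this label.
        score = math.log(class_docs[label] / total_docs)
        denominator = sum(word_counts[label].values()) + len(vocabulary)
        for word in text.lower().split():
            if word in vocabulary:  # words never seen in training are ignored
                score += math.log((word_counts[label][word] + 1) / denominator)
        scores[label] = score
    return max(scores, key=scores.get)


# Toy training data, invented for illustration only (hypothetical tweets).
toy_tweets = [
    ("i love this phone", "positive"),
    ("what a great day", "positive"),
    ("i hate waiting in line", "negative"),
    ("this is a terrible movie", "negative"),
]
model = train_naive_bayes(toy_tweets)
```

Calling `classify("terrible movie", model)` scores each label by summing log-probabilities of the words and returns the higher-scoring label. Because classification needs only a lookup and an addition per word, this style of model tends to be cheap to run, which is consistent in spirit with the speed advantage the abstract reports, although the sketch itself says nothing about the paper's measured figures.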