Exploring the Sentiment Analysis of Electric Vehicles Social Media Data by Using Feature Selection Methods

  • Costello, Francis Joseph (SKK Business School, Sungkyunkwan University)
  • Lee, Kun Chang (Global Business Administration/Dept of Health Sciences & Technology, SAIHST (Samsung Advanced Institute for Health Sciences & Technology), Sungkyunkwan University)
  • Received: 2019.01.02
  • Reviewed: 2020.02.20
  • Published: 2020.02.28

Abstract

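This study proposes a new methodology that applies sentiment analysis (SA) and feature selection (FS) methods to social media data on electric vehicles (EVs) in order to predict the general public's opinions of EVs more effectively and accurately. The approach is as follows. First, the general public's opinions on electric vehicles were extracted from YouTube. Second, to make the analysis more effective, three feature selection methods were applied: Chi-Squared, Information Gain, and ReliefF. The results confirmed that the most meaningful outcomes were obtained with the logistic regression and support vector machine classification techniques.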

This study presents a recently collected social media data set on electric vehicles (EVs) and applies sentiment analysis (SA) to it in order to gain insight into public opinion. We analyze the public's sentiment toward EVs in two steps. First, we use an SA tool to extract the sentiment of each comment. Next, we label the data with the extracted sentiments and classify the comments. During classification we encountered the problem of high dimensionality, so we also explored feature selection (FS) models to reduce the dimensionality of the data set. Three FS models (Chi-Squared, Information Gain, and ReliefF) showed the most promising results when used alongside logistic regression and support vector machine classifiers. The contribution of this paper is a real-world example of social media text analytics that can be adopted in many other areas of research and business. Moving forward, researchers can use the methodological approach in this paper to refine and improve their own use cases in text analytics.
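To make the feature selection step concrete, the following is a minimal sketch of how Chi-Squared and Information Gain scores might be computed for terms in sentiment-labeled comments. It is not the authors' implementation: the toy comments, the labels, and every function name (`contingency`, `chi_squared`, `information_gain`, `rank_terms`) are assumptions made for illustration, and ReliefF is not shown.

```python
import math

# Hypothetical toy comments with sentiment labels (not data from the study).
COMMENTS = [
    ("love the quiet ride", "pos"),
    ("love the instant torque", "pos"),
    ("great range and quiet cabin", "pos"),
    ("charging takes too long", "neg"),
    ("range anxiety is real", "neg"),
    ("too expensive and charging is slow", "neg"),
]


def contingency(term, docs):
    """Count documents as (present&pos, present&neg, absent&pos, absent&neg)."""
    a = b = c = d = 0
    for text, label in docs:
        present = term in text.split()
        if present and label == "pos":
            a += 1
        elif present:
            b += 1
        elif label == "pos":
            c += 1
        else:
            d += 1
    return a, b, c, d


def chi_squared(a, b, c, d):
    """Chi-squared statistic of the 2x2 term/class contingency table."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def entropy(counts):
    """Shannon entropy (in bits) of a list of non-negative counts."""
    total = sum(counts)
    return -sum((k / total) * math.log2(k / total) for k in counts if k)


def information_gain(a, b, c, d):
    """Class entropy minus class entropy conditioned on term presence."""
    n = a + b + c + d
    h_class = entropy([a + c, b + d])
    h_cond = 0.0
    for group in ((a, b), (c, d)):  # term present, then term absent
        if sum(group):
            h_cond += sum(group) / n * entropy(group)
    return h_class - h_cond


def rank_terms(docs, score, k):
    """Return the k highest-scoring terms; ties are broken alphabetically."""
    vocab = sorted({w for text, _ in docs for w in text.split()})
    scored = [(score(*contingency(t, docs)), t) for t in vocab]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [t for _, t in scored[:k]]
```

Calling `rank_terms(COMMENTS, chi_squared, 6)` keeps the terms whose presence is most strongly associated with one sentiment class, while a term such as "range", which appears equally often in positive and negative comments, scores zero under both measures and would be dropped. In practice the retained terms would then be passed to a classifier such as logistic regression or a support vector machine.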
