Detection of the Change in Blogger Sentiment using Multivariate Control Charts

다변량 관리도를 활용한 블로거 정서 변화 탐지

  • Moon, Jeounghoon (BigAnalytics, Information Strategy Department, ElandSystems Ltd.) ;
  • Lee, Sungim (Department of Statistics, Dankook University)
  • 문정훈 (이랜드 시스템즈, 정보전략실, 빅어낼러틱스팀) ;
  • 이성임 (단국대학교 응용통계학과)
  • Received : 2013.07.17
  • Accepted : 2013.10.29
  • Published : 2013.12.31

Abstract

Social network services generate a considerable amount of social data every day on personal feelings or thoughts. This social data reveals changing patterns of information production and consumption and is also a tool that reflects social phenomena. We analyze negative emotional words from daily blogs to detect changes in blogger sentiment using multivariate control charts. We used all the blogs produced between 1 January 2008 and 31 December 2009. Hotelling's T-square control chart is commonly used to monitor multivariate quality characteristics; however, it assumes that the quality characteristics follow a multivariate normal distribution, and its performance suffers when this assumption does not hold. Consequently, we introduce the support vector data description and its extension, the K-control chart suggested by Sun and Tsung (2003), and apply them to detect the change in blogger sentiment.
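As a rough illustration of the Hotelling's T-square chart mentioned above, the following sketch computes $T^2 = (x - \bar{x})' S^{-1} (x - \bar{x})$ for new observations against a reference sample. The data values, the function names, and the significance level are all hypothetical and are not taken from the paper. The control limit uses the large-sample chi-square approximation, written in the closed form that holds only for two variables; a careful Phase II analysis would normally use an F-based limit instead.

```python
import math

def mean_vector(rows):
    """Column-wise sample mean."""
    m = len(rows)
    return [sum(col) / m for col in zip(*rows)]

def covariance(rows, mean):
    """Sample covariance matrix with divisor m - 1."""
    m, p = len(rows), len(mean)
    cov = [[0.0] * p for _ in range(p)]
    for row in rows:
        d = [x - mu for x, mu in zip(row, mean)]
        for i in range(p):
            for j in range(p):
                cov[i][j] += d[i] * d[j] / (m - 1)
    return cov

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(a[i]) + [b[i]] for i in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x

def hotelling_t2(x, mean, cov):
    """T^2 = d' S^{-1} d with d = x - mean, computed via a linear solve."""
    d = [xi - mu for xi, mu in zip(x, mean)]
    return sum(di * wi for di, wi in zip(d, solve(cov, d)))

# Hypothetical in-control reference sample: two daily word-frequency measures.
reference = [(10.0, 5.0), (11.0, 6.0), (9.0, 4.5), (10.5, 5.5),
             (9.5, 5.2), (10.2, 4.8), (10.8, 5.9), (9.2, 4.6)]
mean = mean_vector(reference)
cov = covariance(reference, mean)

# Assumption: chi-square limit with 2 degrees of freedom, whose upper
# alpha quantile is -2 ln(alpha). This closed form is valid only for p = 2.
alpha_level = 0.01
ucl = -2.0 * math.log(alpha_level)

for day in [(10.1, 5.1), (14.0, 3.0)]:
    t2 = hotelling_t2(day, mean, cov)
    print(day, round(t2, 2), "signal" if t2 > ucl else "in control")
```

The statistic relies on the sample mean and covariance alone, which is why the chart is tied to the multivariate normal assumption discussed in the abstract.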

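With the recent growth of social network services, millions of social data items expressing personal feelings or opinions are produced every day. Because anyone can now produce and consume information, for example by adding their own thoughts to another person's opinion, social data is also becoming a tool that reflects social phenomena well. In this study, we use multivariate control charts to detect changes in blogger sentiment by analyzing negative emotional words posted on blogs. For this purpose, we used all blogs created between 1 January 2008 and 31 December 2009. When quality characteristics are multivariate, Hotelling's $T^2$ control chart is widely used. However, this chart assumes that the quality characteristics follow a multivariate normal distribution, so it performs poorly on non-normal multivariate data. In this paper, we therefore introduce the SVDD (support vector data description) algorithm, a one-class classification technique from support vector machines, and its extension, the K-control chart, proposed by Sun and Tsung (2003), and apply them to real data.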
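The kernel-distance idea behind SVDD and the K-control chart can be sketched as follows. This is a simplified illustration under several assumptions that are not from the paper: it uses the hard-margin form of SVDD (no slack for outliers), a Gaussian kernel with a hypothetical width `sigma`, hypothetical data, and a simple Frank-Wolfe iteration in place of a proper quadratic programming solver. The monitored statistic is the kernel distance $K(z,z) - 2\sum_i \alpha_i K(z,x_i) + \sum_{i,j} \alpha_i \alpha_j K(x_i,x_j)$, and the limit here is taken as the largest kernel distance among the training points.

```python
import math

def rbf(x, y, sigma):
    """Gaussian kernel exp(-||x - y||^2 / sigma^2)."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / sigma ** 2)

def fit_hard_margin_svdd(train, sigma, iters=2000):
    """Approximate the SVDD weights with Frank-Wolfe steps on the simplex.

    Minimizes alpha' K alpha - sum_i alpha_i K_ii subject to
    alpha_i >= 0 and sum_i alpha_i = 1 (the negated hard-margin dual).
    """
    n = len(train)
    K = [[rbf(a, b, sigma) for b in train] for a in train]
    alpha = [1.0 / n] * n
    for t in range(iters):
        grad = [2.0 * sum(K[i][j] * alpha[j] for j in range(n)) - K[i][i]
                for i in range(n)]
        best = min(range(n), key=grad.__getitem__)  # steepest-descent vertex
        step = 2.0 / (t + 2.0)
        alpha = [(1.0 - step) * a for a in alpha]
        alpha[best] += step
    # Constant term sum_ij alpha_i alpha_j K_ij, shared by every distance.
    const = sum(alpha[i] * alpha[j] * K[i][j]
                for i in range(n) for j in range(n))
    return alpha, const

def kernel_distance(z, train, alpha, const, sigma):
    """Squared distance from z to the SVDD center in feature space."""
    cross = sum(a * rbf(z, x, sigma) for a, x in zip(alpha, train))
    return rbf(z, z, sigma) - 2.0 * cross + const

# Hypothetical in-control training sample and kernel width.
train = [(10.0, 5.0), (11.0, 6.0), (9.0, 4.5), (10.5, 5.5),
         (9.5, 5.2), (10.2, 4.8), (10.8, 5.9), (9.2, 4.6)]
sigma = 2.0
alpha, const = fit_hard_margin_svdd(train, sigma)

# Assumption: control limit = largest training kernel distance, which
# approximates the squared radius of the enclosing ball.
limit = max(kernel_distance(x, train, alpha, const, sigma) for x in train)

for day in [(10.1, 5.1), (14.0, 3.0)]:
    kd = kernel_distance(day, train, alpha, const, sigma)
    print(day, round(kd, 4), "signal" if kd > limit else "in control")
```

Unlike the $T^2$ statistic, this distance makes no distributional assumption about the data; the shape of the in-control region is determined by the kernel and the training points.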

References

  1. Chiang, L. H., Russell, E. L. and Braatz, R. D. (2001). Fault Detection and Diagnosis in Industrial Systems, Springer, New York.
  2. Crosier, R. B. (1988). Multivariate generalizations of cumulative sum quality-control schemes, Technometrics, 30, 291-303. https://doi.org/10.1080/00401706.1988.10488402
  3. Gani, W., Taleb, H. and Limam, M. (2011). An assessment of the kernel distance-based multivariate control chart through an industrial application, Quality and Reliability Engineering International, 27, 391-401. https://doi.org/10.1002/qre.1117
  4. Hong, J. H. (2011). The detection of public opinion and public opinion cycle via aggregated twitter opinion and sentiment, Korean Journal of Communication Studies, 19, 5-29.
  5. Hotelling, H. (1931). The generalization of Student's ratio, The Annals of Mathematical Statistics, 2, 360-378. https://doi.org/10.1214/aoms/1177732979
  6. Kramer, A. D. I. (2010). An unobtrusive behavioral model of "gross national happiness", Proceedings of the 28th International Conference on Human Factors in Computing Systems, New York, 287-290.
  7. Lowry, C. A., Woodall, W. H., Champ, C. W. and Rigdon, S. E. (1992). A multivariate exponentially weighted moving average control chart, Journal of Quality Technology, 24, 46-53.
  8. Montgomery, D. C. (2001). Introduction to Statistical Quality Control, John Wiley & Sons, USA.
  9. Oh, K. S. (2010). A study on the strategic approach to m-Government in the age of social media, Social Science Studies, 34, 135-161.
  10. Prabhu, S. S. and Runger, G. C. (1997). Designing a multivariate EWMA control chart, Journal of Quality Technology, 29, 8-15.
  11. Shewhart, W. A. (1931). Economic Control of Quality of Manufactured Product, Republished in 1980 by the American Society for Quality Control, D. Van Nostrand Company, Inc., New York.
  12. Sukchotrat, T., Kim, S. B. and Tsung, F. (2010). One-class classification-based control charts for multivariate process monitoring, IIE Transactions, 42, 107-120.
  13. Sun, R. and Tsung, F. (2003). A kernel-distance-based multivariate control chart using support vector methods, International Journal of Production Research, 41, 2975-2989. https://doi.org/10.1080/1352816031000075224
  14. Tax, D. and Duin, R. (2004). Support vector data description, Machine Learning, 54, 45-66. https://doi.org/10.1023/B:MACH.0000008084.60811.49

Cited by

  1. Robust determination of control parameters in K chart with respect to data structures, vol.26, pp.6, 2015, https://doi.org/10.7465/jkdi.2015.26.6.1353