http://dx.doi.org/10.3745/KTSDE.2016.5.11.541

Linguistic Features Discrimination for Social Issue Risk Classification  

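Oh, Hyo-Jung (Department of Records Management, Graduate School, Chonbuk National University)
Yun, Bo-Hyun (Department of Computer Education, Mokwon University)
Kim, Chan-Young (Graduate School of Medicine, Chonbuk National University)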
Publication Information
KIPS Transactions on Software and Data Engineering, v.5, no.11, 2016, pp. 541-548
Abstract
Social media has become an essential source of information for listening to users' diverse opinions and for monitoring. We define social 'risks' as issues that negatively influence public opinion in social media. This paper aims to discriminate among various linguistic features and reveal their effects in order to build an automatic classification model of social risks. In particular, we adopt a word embedding technique to represent linguistic clues in risk sentences. As a preliminary experiment to analyze the characteristics of individual features, we correct errors in the automatic linguistic analysis. As a result, the most important feature is NE (Named Entity) information, and the best condition is the combination of basic linguistic features, word embedding, and word clusters within core predicates. Experimental results under real conditions in social big data, including linguistic analysis errors, show precision of 92.08% and 85.84% for the frequent risk categories set and the full test set, respectively.
Keywords
Risk Detection; Text Classification; Linguistic Feature; Feature Discrimination; Word Embedding;