http://dx.doi.org/10.1633/JISTaP.2014.2.3.3

Study of Machine-Learning Classifier and Feature Set Selection for Intent Classification of Korean Tweets about Food Safety  

Yeom, Ha-Neul (Korea University of Science and Technology (UST); Korea Institute of Science and Technology Information (KISTI))
Hwang, Myunggwon (Korea Institute of Science and Technology Information (KISTI))
Hwang, Mi-Nyeong (Korea Institute of Science and Technology Information (KISTI))
Jung, Hanmin (Korea University of Science and Technology (UST); Korea Institute of Science and Technology Information (KISTI))
Publication Information
Journal of Information Science Theory and Practice / v.2, no.3, 2014, pp. 29-39
Abstract
In recent years, several studies have proposed using the Twitter micro-blogging service to track trends in online media and discussion. In this study, we examine the use of Twitter to track discussions of food safety in the Korean language. Because keyword use in most tweets is irregular, we focus on selecting an optimal machine-learning classifier and feature set for classifying the collected tweets. We build classifier models using the Naive Bayes, Naive Bayes Multinomial, Support Vector Machine, and Decision Tree algorithms, all of which show good performance. To select an optimal feature set, we construct a basic feature set as a baseline for performance comparison, against which further test feature sets are evaluated. Experiments show that precision and F-measure are highest when using a Naive Bayes Multinomial classifier model with a test feature set defined by extracting the Substantive, Predicate, Modifier, and Interjection parts of speech.
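The best-performing combination named in the abstract (part-of-speech filtering followed by a multinomial Naive Bayes classifier) can be illustrated with a minimal sketch. This is not the authors' implementation: the study's reference list points to WEKA and the Hannanum morphological analyzer, whereas the code below is a dependency-free Python approximation. The coarse tag names, the class labels `safety` and `unrelated`, the English placeholder tokens, and the Laplace smoothing constant `alpha` are all assumptions made for illustration only.

```python
import math
from collections import Counter, defaultdict

# Hypothetical coarse POS tags, standing in for a morphological analyzer's output.
KEPT_POS = {"substantive", "predicate", "modifier", "interjection"}


def select_features(tagged_tokens):
    """Keep only tokens whose coarse POS tag is in KEPT_POS."""
    return [tok for tok, pos in tagged_tokens if pos in KEPT_POS]


class MultinomialNB:
    """Multinomial Naive Bayes over token counts with additive smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing constant (assumed value)

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)
        self.vocab = set()
        for tokens, label in zip(docs, labels):
            self.token_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, tokens):
        n_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            total = sum(self.token_counts[label].values())
            score = math.log(n / n_docs)  # log prior
            for tok in tokens:
                if tok not in self.vocab:
                    continue  # tokens never seen in training are ignored
                count = self.token_counts[label][tok]
                score += math.log(
                    (count + self.alpha) / (total + self.alpha * vocab_size)
                )
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy data: (token, coarse POS tag) pairs with hypothetical intent labels.
train = [
    ([("milk", "substantive"), ("spoiled", "predicate"), ("the", "particle")], "safety"),
    ([("recall", "substantive"), ("contaminated", "modifier"), ("snack", "substantive")], "safety"),
    ([("lunch", "substantive"), ("delicious", "modifier"), ("wow", "interjection")], "unrelated"),
    ([("cafe", "substantive"), ("visited", "predicate"), ("at", "particle")], "unrelated"),
]

model = MultinomialNB().fit(
    [select_features(doc) for doc, _ in train],
    [label for _, label in train],
)

new_tweet = [("contaminated", "modifier"), ("milk", "substantive"), ("of", "particle")]
prediction = model.predict(select_features(new_tweet))
```

In this sketch the feature set is changed simply by editing `KEPT_POS`, which mirrors the idea of comparing test feature sets against a baseline; the actual tag inventory and evaluation procedure are described in the full paper rather than here.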
Keywords
Twitter; Tweets; Machine-learning Feature; Text Classification