http://dx.doi.org/10.5392/JKCA.2017.17.01.039

Time-Series based Dataset Selection Method for Effective Text Classification  

Chae, Yeonghun (UST)
Jeong, Do-Heon (KISTI)
Abstract
As Internet technology advances, the volume of data on the web is increasing sharply. Many studies have investigated incremental learning as a way to classify effectively while data keeps growing. Web documents carry time-series information such as the publication date, and reflecting that information in classification should make classification more effective. In this study, we analyze the time-series variation of words and propose an efficient classification method that divides the dataset based on this analysis of time-series information. For the experiment, we collected 1 million online news articles that include time-series information. We divided the dataset and classified it using SVM and Naïve Bayes. Classification performance improved for both models. These results show that reflecting time-series information can improve classification performance.
Keywords
SVM; Naïve Bayes; Time-Series Analysis; Machine Learning; Classification
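
The idea summarized in the abstract, selecting training data by time period before classifying, can be illustrated with a minimal sketch. This is not the authors' implementation: the record fields (year, label, text), the helper names, the toy documents, and the hand-written multinomial Naïve Bayes with add-one smoothing are all assumptions made for illustration, and the paper's SVM experiments are not reproduced here.

```python
import math
from collections import Counter, defaultdict

def select_by_period(docs, start_year, end_year):
    """Keep only documents published within [start_year, end_year]."""
    return [d for d in docs if start_year <= d["year"] <= end_year]

def train_nb(docs):
    """Fit a multinomial Naive Bayes model (counts only)."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for d in docs:
        class_counts[d["label"]] += 1
        for w in d["text"].lower().split():
            word_counts[d["label"]][w] += 1
            vocab.add(w)
    return class_counts, word_counts, vocab

def predict_nb(model, text):
    """Return the most probable label, using add-one smoothing."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label, n in class_counts.items():
        score = math.log(n / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in text.lower().split():
            if w in vocab:  # words unseen in training are ignored
                score += math.log((word_counts[label][w] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy corpus: the usage of "apple" drifts from food to tech.
docs = [
    {"year": 2005, "label": "food", "text": "apple harvest orchard"},
    {"year": 2005, "label": "food", "text": "apple pie recipe"},
    {"year": 2006, "label": "food", "text": "apple juice market"},
    {"year": 2005, "label": "tech", "text": "desktop computer release"},
    {"year": 2015, "label": "tech", "text": "apple phone release"},
    {"year": 2015, "label": "tech", "text": "apple watch launch"},
    {"year": 2016, "label": "food", "text": "organic pie recipe"},
]

query = "apple announcement"
full_label = predict_nb(train_nb(docs), query)
recent_label = predict_nb(train_nb(select_by_period(docs, 2015, 2016)), query)
print(full_label, recent_label)
```

On this toy corpus the model trained on all years labels the query as food, while the model trained only on the 2015-2016 window labels it as tech. That shows how restricting the training set to a relevant period can change the outcome when word usage shifts over time.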
References
1 Derry Tanti Wijaya and Reyyan Yeniterzi, "Understanding Semantic Change of Words Over Centuries," DETECT, 2011.
2 Do-Heon Jeong and Min Song, "Time gap analysis by the topic model-based temporal technique," Journal of Informetrics, vol. 8, no. 3, pp.776-790, 2014.
3 https://nodejs.org
4 http://visjs.org
5 http://www.highcharts.com
6 정도헌, 정창후, 김장원, 김태홍, Development of Incremental Learning Technology for Big Data Mining, KISTI Performance Report, 2015.
7 A. McCallum and K. Nigam, "A Comparison of Event Models for Naive Bayes Text Classification," AAAI '98, 1998.
8 Irina Rish, An empirical study of the naive Bayes classifier, IBM Research Report, 2001.
9 C. Cortes and V. Vapnik, "Support-Vector Networks," Machine Learning, vol. 20, no. 3, pp.273-297, 1995.
10 B. E. Boser, I. M. Guyon, and V. N. Vapnik, "A training algorithm for optimal margin classifiers," COLT '92, 1992.
11 H. Taira and M. Haruno, "Feature selection in SVM text categorization," AAAI, 1999.
12 F. Colas and P. Brazdil, "Comparison of SVM and some older classification algorithms in text classification tasks," IFIP, 2006.
13 Pascal Soucy and Guy W. Mineau, "Beyond TF-IDF Weighting for Text Categorization in the Vector Space Model," IJCAI, vol. 5, pp.1130-1135, 2005.
14 G. Forman, "BNS Feature Scaling: An Improved Representation over TF.IDF for SVM Text Classification," ACM, 2008.
15 Yiming Yang and Jan O. Pedersen, "A comparative study on feature selection in text categorization," ICML, vol. 97, pp.412-420, 1997.
16 D. Jeong, J. Kim, M. Hwang, S. Song, and H. Jung, "Classification Method by Integrating Feature Property Matrices for Large Scale Data," SMA, 2012.
17 Saket S. R. Mengle and Nazli Goharian, "Ambiguity Measure Feature-Selection Algorithm," Journal of the American Society for Information Science and Technology, vol. 60, no. 5, pp.1037-1050, 2009.
18 정도헌, "A Study on an Automatic Database Selection Method Using the Maximal Concept-Strength Recognition Technique," Journal of the Korean Society for Information Management, vol. 27, no. 3, pp.265-281, 2010.
19 J. Gim, Y. Jang, D. Jeong, and H. Jung, "Analyzing Email Patterns with Timelines on Researcher Data," JIST 2014, 2014.
20 B. Croft, "Machine Learning and Information Retrieval," ICML '95, 1995.
21 E. Jessica, "Forecast: Mobile Data Traffic, Worldwide, 2011-2018," Gartner, 2015.
22 H. Chih and N. Kulathuramaiyer, "An empirical study of feature selection for text categorization based on term weightage," In Proceedings of the 2004 IEEE/WIC/ACM International Conference on Web Intelligence, pp.599-602, 2004.