http://dx.doi.org/10.5391/IJFIS.2012.12.1.1

A Classification Method Using Data Reduction  

Uhm, Daiho (Department of Statistics, Oklahoma State University)
Jun, Sung-Hae (Department of Statistics, Cheongju University)
Lee, Seung-Joo (Department of Statistics, Cheongju University)
Publication Information
International Journal of Fuzzy Logic and Intelligent Systems, vol. 12, no. 1, 2012, pp. 1-5
Abstract
Data reduction is widely used in data mining to make analysis more tractable. Principal component analysis (PCA) and factor analysis (FA) are popular techniques: both reduce the number of variables in order to avoid the curse of dimensionality, the phenomenon in which computing time grows exponentially with the number of variables. Many methods for dimension reduction have therefore been published. Data augmentation is another approach to analyzing data efficiently, and the support vector machine (SVM) is a representative technique for dimension augmentation: it maps the original data into a high-dimensional feature space in order to find the optimal decision plane. Both data reduction and augmentation have been used to solve diverse problems in data analysis. In this paper, we compare the strengths and weaknesses of dimension reduction and augmentation for classification and propose a classification method based on data reduction. We carry out comparative experiments to verify the performance of the proposed method.
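To make the contrast in the abstract concrete, the following is a minimal R sketch of classification after data reduction: PCA reduces the original variables to a few principal components, and a k-nearest neighbor classifier is fit in the reduced space. The iris data, the choice of two components and k = 5, and the `class` and `e1071` packages are illustrative assumptions, not the paper's actual experimental setup.

```r
## Classification after data reduction: PCA then k-nearest neighbor.
## A minimal sketch on the iris data (an assumption, not the paper's data).

set.seed(1)
data(iris)

## split into training and test sets
idx   <- sample(nrow(iris), 100)
train <- iris[idx, ]
test  <- iris[-idx, ]

## Step 1: PCA on the training predictors; keep the first two components
pca      <- prcomp(train[, 1:4], center = TRUE, scale. = TRUE)
train_pc <- predict(pca, train[, 1:4])[, 1:2]
test_pc  <- predict(pca, test[, 1:4])[, 1:2]

## Step 2: k-nearest neighbor classification in the reduced space
library(class)                  # 'class' ships with standard R
pred <- knn(train_pc, test_pc, cl = train$Species, k = 5)

## test-set accuracy of classification after data reduction
mean(pred == test$Species)

## For contrast, dimension augmentation: an RBF-kernel SVM implicitly maps
## the original variables into a high-dimensional feature space.
## (Requires the e1071 package; uncomment if it is installed.)
# library(e1071)
# svm_fit <- svm(Species ~ ., data = train, kernel = "radial")
# mean(predict(svm_fit, test) == test$Species)
```

The sketch only illustrates the two directions the paper compares (reducing dimensions before classifying versus classifying in an augmented feature space); the number of retained components and the classifier settings would be tuned in a real comparative study.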
Keywords
Data reduction and augmentation; Gaussian mixture model; Principal component analysis; Support vector machine; K-nearest neighbor;
1 L. Scrucca, "Model-based SIR for dimension reduction," Computational Statistics and Data Analysis, vol. 55, pp. 3010-3026, 2011.   DOI   ScienceOn
2 K. C. Li, "Sliced inverse regression for dimension reduction," Journal of the American Statistical Association, vol. 86, pp. 316-342, 1991.   DOI   ScienceOn
3 UCI ML Repository, http://archive.ics.uci.edu/ml/
4 P. Giudici, Applied Data Mining, Statistical Methods for Business and Industry, Wiley, 2003.
5 V. Cherkassky, F. Mulier, Learning from data Concepts, Theory, and Methods, John Wiley & Sons, 1998.
6 P. Tan, M. Steinbach, V. Kumar, Introduction to Data Mining, Addison Wesley, 2006.
7 T. M. Mitchell, Machine Learning, McGraw-Hill, 1997.
8 R Development Core Team, R: A language and environment for statistical computing, R Foundation for Statistical Computing, http://www.R-project.org, 2001.
9 J. Han, M. Kamber, Data Mining Concepts and Techniques, 2nd edition, Morgan Kaufmann, 2006.
10 R.A. Johnson, D. W. Wichern, Applied Multivariate Statistical Analysis, 3rd edition, Prentice Hall, 1992.
11 J. H. Friedman, "On Bias, Variance, 0/1-loss, and the Curse of Dimensionality," Data Mining and Knowledge Discovery vol. 1, pp. 55-77, 1997.
12 V. Cherkassky, F. Mulier, Learning from data, Concepts, Theory, and Methods, John Wiley & Sons, 1998.
13 M. A. Tanner, Tools for Statistical Inference, Springer, 1996.
14 Y. Youk, S. Kim, Y. Joo, "Intelligent Data Reduction Algorithm for Sensor Network based Fault Diagnostic System," International Journal of Fuzzy Logic and Intelligent Systems, vol. 9, no. 4, pp. 301-308, 2009.   DOI   ScienceOn
15 J. Keum, H. Lee, M. Hagiwara, "A Novel Speech/Music Discrimination Using Feature Dimensionality Reduction," International Journal of Fuzzy Logic and Intelligent Systems, vol. 10, no. 1, pp. 7-11, 2010.   DOI   ScienceOn
16 I. Oh, Pattern Recognition, Kyobo, 2008.
17 V. N. Vapnik, Statistical Learning Theory, Wiley, 1998.
18 T. Hastie, R. Tibshirani, J. Friedman, The elements of statistical learning, data mining, inference, and prediction, Springer, 2001.
19 N. G. Polson, S. L. Scotty, "Data Augmentation for Support Vector Machines," Bayesian Analysis, vol. 6, no. 1, pp. 1-24, 2011.   DOI