A Classification Method Using Data Reduction

  • Uhm, Daiho (Department of Statistics, Oklahoma State University) ;
  • Jun, Sung-Hae (Department of Statistics, Cheongju University) ;
  • Lee, Seung-Joo (Department of Statistics, Cheongju University)
  • Received : 2012.02.24
  • Accepted : 2012.03.07
  • Published : 2012.03.25

Abstract

Data reduction is widely used in data mining to make analysis tractable. Principal component analysis (PCA) and factor analysis (FA) are popular techniques: both reduce the number of variables in order to avoid the curse of dimensionality, in which computing time grows exponentially with the number of variables. Many dimension-reduction methods have accordingly been published. Data augmentation is a complementary approach to efficient data analysis. The support vector machine (SVM) is a representative technique for dimension augmentation: it maps the original data into a high-dimensional feature space in order to find the optimal decision plane. Both data reduction and augmentation have been used to solve diverse problems in data analysis. In this paper, we compare the strengths and weaknesses of dimension reduction and dimension augmentation for classification, and we propose a classification method based on data reduction. We carry out comparative experiments to verify the performance of the proposed method.
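The reduction-then-classification idea described above can be sketched as follows. This is an illustrative example, not the paper's implementation: the synthetic two-class dataset, the choice of two principal components, and the nearest-centroid classifier are all assumptions made here for demonstration. PCA is computed via the singular value decomposition of the centered data matrix.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-class dataset: 20 variables, 200 cases per class,
# with class 1 shifted along the first three variables.
n, p = 200, 20
X0 = rng.normal(0.0, 1.0, size=(n, p))
X1 = rng.normal(0.0, 1.0, size=(n, p))
X1[:, :3] += 2.5
X = np.vstack([X0, X1])
y = np.array([0] * n + [1] * n)

def pca_reduce(X, k):
    """Project X onto its first k principal components (scores)."""
    Xc = X - X.mean(axis=0)                    # center the variables
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                       # n x k score matrix

# Data reduction: 20 original variables -> 2 principal components.
Z = pca_reduce(X, k=2)

# Simple classifier in the reduced space (nearest class centroid).
c0 = Z[y == 0].mean(axis=0)
c1 = Z[y == 1].mean(axis=0)
pred = (np.linalg.norm(Z - c1, axis=1) <
        np.linalg.norm(Z - c0, axis=1)).astype(int)
acc = (pred == y).mean()
print(Z.shape, acc)
```

Because the class separation lies in only a few directions, the leading components retain most of the discriminative structure, and the classifier in the 2-dimensional reduced space performs well despite discarding 18 of the 20 variables.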
