
Influence of Data Preprocessing

  • Zhu, Changming (College of Information Engineering, Shanghai Maritime University, and Department of Computer Science & Engineering, East China University of Science & Technology)
  • Gao, Daqi (Department of Computer Science & Engineering, East China University of Science & Technology)
  • Received: 2015.03.19
  • Reviewed: 2016.05.20
  • Published: 2016.06.30

Abstract

In this paper, we study how data preprocessing influences classification performance. We conclude that different preprocessing methods lead to different classification performance. Moreover, not every preprocessing step is necessary, and we give a criterion for determining which preprocessing steps are necessary and which are effective. Experiments on several real-world data sets show that different preprocessing methods have different effects. Furthermore, experiments with several algorithms under different preprocessing methods confirm that preprocessing has a strong influence on classifier performance.
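As a minimal sketch of the abstract's central claim, the hypothetical example below (not the authors' experiments, data sets, or criterion) runs a 1-nearest-neighbour classifier on a made-up four-point training set under three choices: no preprocessing, min-max scaling, and z-score standardization. The data, the function names, and the choice of classifier are assumptions for illustration only. Scaling statistics are fitted on the training points and then applied to the test points.

```python
import math
from statistics import mean, pstdev

# Hypothetical toy data (not from the paper): feature 0 separates the classes,
# feature 1 is uninformative noise on a much larger numeric scale.
TRAIN_X = [(0.0, 100.0), (0.1, 300.0), (0.9, 120.0), (1.0, 280.0)]
TRAIN_Y = [0, 0, 1, 1]
TEST_X = [(0.05, 130.0), (0.95, 295.0)]
TEST_Y = [0, 1]


def identity(train, test):
    """No preprocessing: return the data unchanged."""
    return train, test


def min_max(train, test):
    """Rescale each feature to [0, 1] using the training-set min and max."""
    cols = list(zip(*train))
    lo = [min(c) for c in cols]
    span = [(max(c) - min(c)) or 1.0 for c in cols]  # guard against zero range

    def scale(row):
        return tuple((v - l) / s for v, l, s in zip(row, lo, span))

    return [scale(r) for r in train], [scale(r) for r in test]


def z_score(train, test):
    """Standardize each feature with the training-set mean and population std."""
    cols = list(zip(*train))
    mu = [mean(c) for c in cols]
    sd = [pstdev(c) or 1.0 for c in cols]  # guard against zero variance

    def scale(row):
        return tuple((v - m) / s for v, m, s in zip(row, mu, sd))

    return [scale(r) for r in train], [scale(r) for r in test]


def one_nn_accuracy(train, train_y, test, test_y):
    """Accuracy of a 1-nearest-neighbour classifier with Euclidean distance."""
    correct = 0
    for x, y in zip(test, test_y):
        nearest = min(range(len(train)), key=lambda i: math.dist(x, train[i]))
        correct += train_y[nearest] == y
    return correct / len(test)


def evaluate():
    """Return 1-NN test accuracy for each preprocessing choice."""
    results = {}
    for name, prep in [("raw", identity), ("min-max", min_max), ("z-score", z_score)]:
        tr, te = prep(TRAIN_X, TEST_X)
        results[name] = one_nn_accuracy(tr, TRAIN_Y, te, TEST_Y)
    return results


if __name__ == "__main__":
    print(evaluate())  # {'raw': 0.0, 'min-max': 1.0, 'z-score': 1.0}
```

On this toy data the raw features give 0.0 accuracy because the large-scale noise feature dominates the Euclidean distance; either rescaling lets the informative feature contribute, and accuracy rises to 1.0.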

