Influence of Data Preprocessing

Zhu, Changming;Gao, Daqi;

doi:10.5626/JCSE.2016.10.2.51

Journal of Computing Science and Engineering

Volume 10, Issue 2 / Pages 51-57 / 2016 / 1976-4677 (pISSN) / 2093-8020 (eISSN)

Korean Institute of Information Scientists and Engineers


Zhu, Changming (College of Information Engineering, Shanghai Maritime University and Department of Computer Science & Engineering, East China University of Science & Technology) ;
Gao, Daqi (Department of Computer Science & Engineering, East China University of Science & Technology)

Received: 2015.03.19
Accepted: 2016.05.20
Published: 2016.06.30

https://doi.org/10.5626/JCSE.2016.10.2.51

Abstract

In this paper, we study the influence of data preprocessing on classification performance. We conclude that different preprocessing methods lead to different classification performance. Moreover, not every preprocessing method is necessary, and we give a criterion for determining which preprocessing methods are necessary and which are effective. Experiments on several real-world data sets confirm that different preprocessing methods produce different effects. Further experiments that pair several algorithms with different preprocessing methods also confirm that preprocessing strongly influences the performance of a classifier.
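The claim that the choice of preprocessing changes classifier performance can be illustrated with a small sketch. The abstract does not name the preprocessing methods, classifiers, or data sets used in the paper, so everything below is an assumption made for illustration: a synthetic two-feature data set (one informative feature on a small scale, one noise feature on a large scale), min-max scaling and z-score standardization as the preprocessing methods, and a 1-nearest-neighbor classifier. The function names `min_max_scale`, `z_score`, `identity`, `one_nn_accuracy`, `make_data`, and `compare` are hypothetical and do not come from the paper.

```python
import math
import random


def identity(train, test):
    """No preprocessing: return the data unchanged."""
    return train, test


def min_max_scale(train, test):
    """Rescale each feature to [0, 1] using the training-set range."""
    cols = list(zip(*train))
    lo = [min(c) for c in cols]
    span = [(max(c) - min(c)) or 1.0 for c in cols]  # guard constant features

    def scale(rows):
        return [[(v - l) / s for v, l, s in zip(r, lo, span)] for r in rows]

    return scale(train), scale(test)


def z_score(train, test):
    """Standardize each feature with the training-set mean and std."""
    cols = list(zip(*train))
    mean = [sum(c) / len(c) for c in cols]
    std = [
        math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
        for c, m in zip(cols, mean)
    ]

    def scale(rows):
        return [[(v - m) / s for v, m, s in zip(r, mean, std)] for r in rows]

    return scale(train), scale(test)


def one_nn_accuracy(train_x, train_y, test_x, test_y):
    """Accuracy of a 1-nearest-neighbor classifier (Euclidean distance)."""
    correct = 0
    for x, y in zip(test_x, test_y):
        nearest = min(range(len(train_x)), key=lambda i: math.dist(x, train_x[i]))
        correct += train_y[nearest] == y
    return correct / len(test_y)


def make_data(n, rng):
    """Synthetic data (assumption): feature 0 separates the classes on a
    small scale, feature 1 is pure noise on a much larger scale."""
    xs, ys = [], []
    for i in range(n):
        label = i % 2
        xs.append([rng.gauss(2.0 * label, 0.5), rng.gauss(0.0, 1000.0)])
        ys.append(label)
    return xs, ys


def compare(seed=0):
    """Return test accuracy of 1-NN under each preprocessing method."""
    rng = random.Random(seed)
    train_x, train_y = make_data(200, rng)
    test_x, test_y = make_data(100, rng)
    results = {}
    for name, prep in [("none", identity), ("min-max", min_max_scale), ("z-score", z_score)]:
        tr, te = prep(train_x, test_x)
        results[name] = one_nn_accuracy(tr, train_y, te, test_y)
    return results


if __name__ == "__main__":
    for name, acc in compare().items():
        print(f"{name:8s} accuracy = {acc:.2f}")
```

On data built this way, the unscaled distances are dominated by the noise feature, so the scaled variants would be expected to classify better. This says nothing about the paper's own data sets or its criterion for deciding which preprocessing is necessary.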


