
Performance Improvement of Nearest-neighbor Classification Learning through Prototype Selections  

Hwang, Doo-Sung (Dept. of Computer Science, Dankook University)
Abstract
Nearest-neighbor classification predicts the class of an input datum as the most frequent class among its nearest training data. Although nearest-neighbor classification has no training stage, all training data must be kept available at prediction time, and generalization performance depends on the quality of the training data. Consequently, as the training set grows, nearest-neighbor classification requires a large amount of memory and long computation time per prediction. In this paper, we propose a prototype selection algorithm that predicts the class of test data using a new set of prototypes consisting of near-boundary training data. Based on Tomek links and a distance metric, the proposed algorithm selects boundary data and decides whether each selected datum is added to the prototype set by considering class and distance relationships. In the experiments, the number of prototypes is much smaller than the original training set size, yielding reduced storage and fast prediction in nearest-neighbor classification.
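The core idea above can be illustrated with a minimal sketch. A Tomek link is a pair of opposite-class points that are each other's nearest neighbors, so the points taking part in such links lie near the decision boundary. The sketch below selects those points as prototypes and then classifies with a plain 1-NN rule over the reduced set; the paper's additional acceptance criteria based on class and distance relationships are not reproduced here, and the function names are illustrative, not from the paper.

```python
import numpy as np

def tomek_link_prototypes(X, y):
    """Select near-boundary prototypes: points that appear in a Tomek link,
    i.e. mutual nearest neighbors belonging to different classes.
    Generic sketch only; the paper's extra class/distance checks are omitted.
    """
    n = len(X)
    # pairwise Euclidean distances; self-distance set to infinity
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    nn = d.argmin(axis=1)  # index of each point's nearest neighbor
    prototypes = set()
    for i in range(n):
        j = int(nn[i])
        if nn[j] == i and y[i] != y[j]:  # mutual NN, opposite classes
            prototypes.update((i, j))
    return sorted(prototypes)

def nn_predict(x, X_proto, y_proto):
    """1-NN prediction using only the retained prototypes."""
    return y_proto[np.linalg.norm(X_proto - x, axis=1).argmin()]

# Tiny 1-D example: the two middle points form a Tomek link.
X = np.array([[0.0], [1.0], [1.5], [3.0]])
y = np.array([0, 0, 1, 1])
keep = tomek_link_prototypes(X, y)       # -> [1, 2]
label = nn_predict(np.array([0.2]), X[keep], y[keep])
```

Prediction then scans only the prototype set instead of all training data, which is the source of the storage and speed advantage claimed in the abstract.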
Keywords
Prototype Selection; Nearest Neighbor Rule; Tomek Link;