
Performance Improvement of Nearest-neighbor Classification Learning through Prototype Selections  

Hwang, Doo-Sung (Dept. of Computer Science, Dankook University)
Abstract
Nearest-neighbor classification predicts the class of an input datum as the most frequent class among its nearest training data. Although nearest-neighbor classification has no training stage, all training data must be kept available at prediction time, and generalization performance depends on the quality of the training data. Consequently, as the training set grows, nearest-neighbor classification requires a large amount of memory and long computation time per prediction. In this paper, we propose a prototype selection algorithm that predicts the class of test data using a new set of prototypes consisting of near-boundary training data. Based on Tomek links and a distance metric, the proposed algorithm selects boundary data and decides whether each selected datum is added to the prototype set by considering class and distance relationships. In the experiments, the number of prototypes is much smaller than the original training set size, yielding reduced storage and fast prediction in nearest-neighbor classification.
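The core idea above can be illustrated with a minimal sketch. A Tomek link is a pair of opposite-class points that are each other's nearest neighbors, so the points taking part in such links lie near the decision boundary. The sketch below selects those points as prototypes and then classifies with a plain 1-NN rule over the reduced set; the paper's additional acceptance criteria based on class and distance relationships are not reproduced here, and the function names are illustrative, not from the paper.

```python
import numpy as np

def tomek_link_prototypes(X, y):
    """Select near-boundary prototypes: points that appear in a Tomek link,
    i.e. mutual nearest neighbors belonging to different classes.
    Generic sketch only; the paper's extra class/distance checks are omitted.
    """
    n = len(X)
    # pairwise Euclidean distances; self-distance set to infinity
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    nn = d.argmin(axis=1)  # index of each point's nearest neighbor
    prototypes = set()
    for i in range(n):
        j = int(nn[i])
        if nn[j] == i and y[i] != y[j]:  # mutual NN, opposite classes
            prototypes.update((i, j))
    return sorted(prototypes)

def nn_predict(x, X_proto, y_proto):
    """1-NN prediction using only the retained prototypes."""
    return y_proto[np.linalg.norm(X_proto - x, axis=1).argmin()]

# Tiny 1-D example: the two middle points form a Tomek link.
X = np.array([[0.0], [1.0], [1.5], [3.0]])
y = np.array([0, 0, 1, 1])
keep = tomek_link_prototypes(X, y)       # -> [1, 2]
label = nn_predict(np.array([0.2]), X[keep], y[keep])
```

Prediction then scans only the prototype set instead of all training data, which is the source of the storage and speed advantage claimed in the abstract.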
Keywords
Prototype Selection; Nearest Neighbor Rule; Tomek Link;