Nearest-neighbor Rule based Prototype Selection Method and Performance Evaluation using Bias-Variance Analysis

Shim, Se-Yong;Hwang, Doo-Sung;

doi:10.5573/ieie.2015.52.10.073

Journal of the Institute of Electronics and Information Engineers (전자공학회논문지)

Volume 52, Issue 10 / Pages 73-81 / 2015 / 2287-5026 (pISSN) / 2288-159X (eISSN)

The Institute of Electronics and Information Engineers (대한전자공학회)


최근접 이웃 규칙 기반 프로토타입 선택과 편의-분산을 이용한 성능 평가

Shim, Se-Yong (Dept. of Computer Science, Dankook University) ;
Hwang, Doo-Sung (Dept. of Computer Science, Dankook University)

심세용 (단국대학교 컴퓨터과학과) ;
황두성 (단국대학교 컴퓨터과학과)

Received : 2015.04.17
Accepted : 2015.09.30
Published : 2015.10.25


Abstract

This paper proposes a prototype selection method and uses bias-variance decomposition to compare the generalization performance of standard algorithms and prototype-based classification learning. The proposed prototype classifier defines multidimensional spheres with variable radii within class regions and generates a small training set made up of the selected prototypes. A nearest-neighbor classifier then uses this new training set to predict the class of test data. By decomposing the mean expected error into bias and variance components, we compare the generalization errors of k-nearest neighbor, the Bayesian classifier, prototype selection with a fixed radius, and the proposed prototype selection method. In the experiments, the bias-variance trends of the proposed prototype classifier are similar to those of the nearest-neighbor classifier trained on all training data, and the prototype selection rate is under 27.0% of the training data on average.
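
To make the idea of variable-radius spheres concrete, the following Python sketch selects prototypes with a greedy sphere cover and then classifies with a 1-nearest-neighbor rule. It is an illustration under stated assumptions, not the authors' algorithm: the abstract does not specify how radii are chosen, so the sketch assumes each sphere's radius is the distance to the nearest point of another class and that spheres are picked greedily by coverage. The names `select_prototypes` and `predict_1nn` and the synthetic two-class data are hypothetical.

```python
import numpy as np

def select_prototypes(X, y):
    """Greedy sphere cover (assumed scheme); returns indices of prototypes."""
    # Pairwise Euclidean distances between all training points.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    same = y[:, None] == y[None, :]
    # Variable radius: distance from each point to its nearest other-class point.
    radius = np.where(same, np.inf, dist).min(axis=1)
    # covers[i, j] is True when j shares i's label and lies inside i's sphere.
    covers = same & (dist < radius[:, None])
    np.fill_diagonal(covers, True)  # every point covers itself, so the loop ends
    uncovered = np.ones(len(X), dtype=bool)
    prototypes = []
    while uncovered.any():
        # Pick the sphere that covers the most still-uncovered points.
        gain = (covers & uncovered).sum(axis=1)
        best = int(gain.argmax())
        prototypes.append(best)
        uncovered &= ~covers[best]
    return np.array(prototypes)

def predict_1nn(X_proto, y_proto, X_test):
    """Label each test point with the class of its nearest prototype."""
    d = np.linalg.norm(X_test[:, None, :] - X_proto[None, :, :], axis=2)
    return y_proto[d.argmin(axis=1)]

# Synthetic two-class data (an assumption for the demo, not the paper's data).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (100, 2)), rng.normal(4.0, 1.0, (100, 2))])
y = np.repeat([0, 1], 100)

idx = select_prototypes(X, y)
selection_rate = len(idx) / len(X)
train_acc = (predict_1nn(X[idx], y[idx], X) == y).mean()
print(f"selection rate: {selection_rate:.3f}, training accuracy: {train_acc:.3f}")
```

The selection rate printed here is the fraction of training points kept as prototypes, which is the quantity the abstract reports as under 27.0% on average for the proposed method. The value produced by this sketch depends on the synthetic data and is not comparable to the paper's result.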

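
The bias-variance comparison mentioned in the abstract can be illustrated with a short Python sketch for zero-one loss. It follows the general form of the decomposition for two-class, noise-free problems, in which the mean loss equals bias plus unbiased variance minus biased variance. Which decomposition the paper actually uses is not stated in the abstract, so this is an assumption. The function names, the synthetic data generator `sample`, and the choice of 30 independently drawn training sets are all hypothetical.

```python
import numpy as np

def predict_1nn(X_train, y_train, X_test):
    """Label each test point with the class of its nearest training point."""
    d = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
    return y_train[d.argmin(axis=1)]

def bias_variance_01(preds, y_true):
    """Zero-one-loss decomposition; preds has shape (n_models, n_test), labels in {0, 1}."""
    # Main prediction: the majority label per test point across the models.
    main = (preds.mean(axis=0) >= 0.5).astype(int)
    # Bias is 1 where the main prediction is wrong, 0 elsewhere.
    bias = (main != y_true).astype(float)
    # Variance is how often individual models disagree with the main prediction.
    variance = (preds != main).mean(axis=0)
    loss = (preds != y_true).mean()
    v_unbiased = (variance * (bias == 0)).mean()  # variance that increases error
    v_biased = (variance * (bias == 1)).mean()    # variance that decreases error
    return loss, bias.mean(), v_unbiased, v_biased

rng = np.random.default_rng(0)

def sample(n):
    """Draw n points from two overlapping Gaussian classes (synthetic assumption)."""
    y = rng.integers(0, 2, n)
    X = rng.normal(0.0, 1.0, (n, 2)) + 2.0 * y[:, None]
    return X, y

X_test, y_test = sample(200)
# Train one 1-NN model per independently drawn training set.
preds = np.array([predict_1nn(*sample(100), X_test) for _ in range(30)])

loss, bias, v_unbiased, v_biased = bias_variance_01(preds, y_test)
print(f"loss={loss:.3f} bias={bias:.3f} Vu={v_unbiased:.3f} Vb={v_biased:.3f}")
```

Running the same routine with a different classifier in place of `predict_1nn` (for example, a prototype-based one) would give the kind of side-by-side bias and variance comparison the abstract describes.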


