
Nearest-neighbor Rule based Prototype Selection Method and Performance Evaluation using Bias-Variance Analysis


  • Se-Yong Shim (Dept. of Computer Science, Dankook University)
  • Doo-Sung Hwang (Dept. of Computer Science, Dankook University)
  • Received : 2015.04.17
  • Accepted : 2015.09.30
  • Published : 2015.10.25

Abstract

This paper proposes a prototype selection method and evaluates the generalization performance of standard algorithms and prototype-based classification learning. The proposed prototype classifier defines multidimensional spheres with variable radii inside each class region and generates a small training set of prototypes. A nearest-neighbor classifier then uses this reduced training set to predict the class of test data. By decomposing the mean expected error into its bias and variance components, we compare the generalization errors of the k-nearest-neighbor classifier, the Bayesian classifier, prototype selection with a fixed radius, and the proposed prototype selection method. In experiments, the bias-variance trends of the proposed prototype classifier are similar to those of the nearest-neighbor classifier trained on all data, while the prototype selection rate stays below 27.0% of the data on average.
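The sphere-based selection described in the abstract can be sketched as a greedy covering: each candidate prototype is the center of the largest sphere that stays inside its own class region, i.e. whose radius is the distance to the nearest point of a different class. This is a minimal illustration of the idea, not the authors' exact algorithm; the greedy covering order and the choice of sphere centers are assumptions.

```python
import numpy as np

def select_prototypes(X, y):
    """Greedy variable-radius sphere covering (illustrative sketch).
    A sphere around point i may grow until it reaches the nearest point
    of a different class, so every sphere is class-pure."""
    n = len(X)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # per point: distance to the nearest point of a different class
    enemy = np.array([d[i][y != y[i]].min() for i in range(n)])
    covered = np.zeros(n, dtype=bool)
    proto_idx = []
    while not covered.all():
        # pick the uncovered point whose class-pure sphere covers the
        # most still-uncovered same-class points
        best, best_gain = -1, -1
        for i in np.flatnonzero(~covered):
            gain = np.sum(~covered & (y == y[i]) & (d[i] < enemy[i]))
            if gain > best_gain:
                best, best_gain = i, gain
        proto_idx.append(best)
        covered |= (y == y[best]) & (d[best] < enemy[best])
    return np.array(proto_idx)

def nn_predict(Xp, yp, Xq):
    """1-NN prediction using only the selected prototypes."""
    d = np.linalg.norm(Xq[:, None, :] - Xp[None, :, :], axis=2)
    return yp[d.argmin(axis=1)]
```

On well-separated classes a handful of spheres covers each class, so the prototype set is much smaller than the original training data, which is the source of the low selection rates reported in the abstract.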

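The bias-variance comparison can be reproduced in outline with the zero-one-loss decomposition of Kohavi-Wolpert and Domingos (refs. 5, 6): retrain the classifier on many resampled training sets, take the majority vote as the main prediction, and measure how often the main prediction misses the true label (bias) versus how often individual predictions disagree with the main one (variance). The bootstrap resampling scheme and the plain 1-NN learner below are illustrative assumptions, not the paper's exact protocol.

```python
import numpy as np

def nn1(Xtr, ytr, Xte):
    """Plain 1-nearest-neighbor prediction."""
    d = np.linalg.norm(Xte[:, None, :] - Xtr[None, :, :], axis=2)
    return ytr[d.argmin(axis=1)]

def bias_variance_01(X, y, Xte, yte, n_rounds=50, seed=0):
    """Empirical bias/variance estimate for zero-one loss:
    bias     = rate at which the majority-vote prediction misses yte,
    variance = rate at which individual predictions differ from it."""
    rng = np.random.default_rng(seed)
    preds = np.empty((n_rounds, len(Xte)), dtype=y.dtype)
    for r in range(n_rounds):
        idx = rng.integers(0, len(X), len(X))   # bootstrap resample
        preds[r] = nn1(X[idx], y[idx], Xte)
    # main (majority-vote) prediction per test point
    main = np.array([np.bincount(preds[:, j]).argmax()
                     for j in range(len(Xte))])
    bias = np.mean(main != yte)
    variance = np.mean(preds != main[None, :])
    return bias, variance
```

Running the same estimate once with the full training set and once with only the selected prototypes gives the kind of side-by-side bias-variance trends the paper reports.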

References

  1. X. Wu et al., "The Top Ten Algorithms in Data Mining," CRC Press, 2009.
  2. T. Hastie, R. Tibshirani, and J. Friedman, "The Elements of Statistical Learning: Data Mining, Inference, and Prediction," Springer Series in Statistics, 2001.
  3. J. Arturo Olvera-Lopez, J. Ariel Carrasco-Ochoa, J. Francisco Martinez Trinidad, and J. Kittler, "A review of instance selection methods," Artif. Intell. Rev., Vol. 34, No. 2, pp. 133-143, Aug. 2010. https://doi.org/10.1007/s10462-010-9165-y
  4. P. Flach, "Machine Learning, The Art and Science of Algorithms that Make Sense of Data," Cambridge University Press, 2012.
  5. R. Kohavi and D. H. Wolpert, "Bias Plus Variance Decomposition for Zero-One Loss Functions," in Proceedings of the Thirteenth International Conference on Machine Learning, pp. 275-283, 1996.
  6. P. Domingos, "A Unified Bias-Variance Decomposition for Zero-One and Squared Loss," in Proceedings of the Seventeenth National Conference on Artificial Intelligence, pp. 231-238, 2000.
  7. J. Bien and R. Tibshirani, "Prototype selection for interpretable classification," The Annals of Applied Statistics, Vol. 5, No. 4, pp. 2403-2424, Dec. 2011. https://doi.org/10.1214/11-AOAS495
  8. D. S. Hwang, "Performance Improvement of Nearest-neighbor Classification Learning through Prototype Selection," Journal of The Institute of Electronics Engineers of Korea, Vol. 49(2)-CI, pp. 53-60, Mar. 2012.
  9. F. Angiulli, "Fast Nearest Neighbor Condensation for Large Data Sets Classification," IEEE Transactions on Knowledge and Data Engineering, Vol. 19, No. 11, pp. 1450-1464, Nov. 2007. https://doi.org/10.1109/TKDE.2007.190645
  10. D. Marchette, "Class cover catch digraphs," Wiley Interdisciplinary Reviews: Computational Statistics, Vol. 2, No. 2, pp. 171-177, Mar. 2010. https://doi.org/10.1002/wics.70
  11. R. Younsi and A. Bagnall, "An efficient randomised sphere cover classifier," Int. J. of Data Mining, Modelling and Management, Vol. 4, No. 2, pp. 156-171, Jan. 2012. https://doi.org/10.1504/IJDMMM.2012.046808
  12. S. W. Kim, "Relational Discriminant Analysis Using Prototype Reduction Schemes and Mahalanobis Distances," Journal of The Institute of Electronics Engineers of Korea, Vol. 43(1)-CI, pp. 9-16, Jan. 2006.
  13. T. G. Dietterich and E. B. Kong, "Machine learning bias, statistical bias, and statistical variance of decision tree algorithms," Technical report, Department of Computer Science, Oregon State University, 1995.
  14. UCI Machine Learning Repository, http://archive.ics.uci.edu/ml/
  15. The DELVE Manual, http://www.cs.utoronto.ca/~delve/
  16. Statlog project, http://www1.maths.leeds.ac.uk/~charles/statlog/indexdos.html