Hyper-Rectangle Based Prototype Selection Algorithm Preserving Class Regions

클래스 영역을 보존하는 초월 사각형에 의한 프로토타입 선택 알고리즘

  • 백병현 (Department of Software, Dankook University) ;
  • 어성율 (Department of Software, Dankook University) ;
  • 황두성 (Department of Software, Dankook University)
  • Received : 2019.08.20
  • Accepted : 2019.11.27
  • Published : 2020.03.31

Abstract

Prototype selection reduces learning time and storage space by selecting, from the training data, a minimal set of data that represents each class region. This paper designs a method for generating new training data with hyper-rectangles that can be applied to general classification algorithms. Each hyper-rectangle contains data of a single class only, and the hyper-rectangles together partition the class space. The median of the data within a selected hyper-rectangle becomes a prototype in the new training data, and the size of each hyper-rectangle is adjusted to reflect the data distribution in the class region. A set cover optimization algorithm is designed to select the minimum prototype set that represents the whole training data. To address the time complexity of set cover optimization, which requires polynomial time, the proposed method uses a greedy algorithm and a distance calculation that involves no multiplication. The experiments compare classification performance using the nearest neighbor rule and decision tree algorithms, and the proposed method is superior to hyper-sphere prototype selection in prototype rate and generalization performance.
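
The abstract outlines three steps: build single-class hyper-rectangles, pick a small covering subset greedily, and take the median of each chosen region as a prototype. The following Python sketch illustrates that pipeline under stated assumptions and is not the authors' implementation. It assumes an axis-aligned hyper-cube centred on each training point whose half-width is the Chebyshev (L-infinity) distance to the nearest point of another class, one possible multiplication-free distance. The function names `chebyshev`, `build_covers`, and `select_prototypes` are hypothetical, and the paper's rule for adjusting rectangle sizes is not reproduced.

```python
from statistics import median


def chebyshev(a, b):
    """L-infinity distance: uses only subtraction, abs and max."""
    return max(abs(x - y) for x, y in zip(a, b))


def build_covers(X, y):
    """For each point i, return the same-class indices inside its box.

    Assumed construction: an open hyper-cube centred on X[i] whose
    half-width is the distance to the nearest different-class point,
    so the box never contains data of another class.
    """
    covers = []
    for i, xi in enumerate(X):
        radius = min(
            (chebyshev(xi, xj) for j, xj in enumerate(X) if y[j] != y[i]),
            default=float("inf"),
        )
        covers.append(
            {j for j, xj in enumerate(X)
             if y[j] == y[i] and chebyshev(xi, xj) < radius}
        )
    return covers


def select_prototypes(X, y):
    """Greedy set cover: repeatedly take the box covering the most
    still-uncovered points; its coordinate-wise median is the prototype."""
    covers = build_covers(X, y)
    uncovered = set(range(len(X)))
    prototypes = []
    while uncovered:
        best = max(range(len(X)), key=lambda i: len(covers[i] & uncovered))
        if not covers[best] & uncovered:
            break  # leftover points coincide with other-class points
        box = sorted(covers[best])
        dims = range(len(X[best]))
        proto = tuple(median(X[j][d] for j in box) for d in dims)
        prototypes.append((proto, y[best]))
        uncovered -= covers[best]
    return prototypes


if __name__ == "__main__":
    X = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5), (5, 6)]
    y = [0, 0, 0, 1, 1, 1]
    print(select_prototypes(X, y))
```

On this toy data the six points reduce to one prototype per class. The greedy rule gives only an approximate minimum cover, which is the usual trade-off for avoiding an exact set cover search.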
