Binary classification by the combination of Adaboost and feature extraction methods

Binary Classifier Using Feature Extraction Algorithms and Adaboost

  • Ham, Seaung-Lok (함승록) (Department of Electrical Engineering, Ajou University)
  • Kwak, No-Jun (곽노준) (Department of Electrical Engineering, Ajou University)
  • Received : 2011.12.15
  • Accepted : 2012.07.04
  • Published : 2012.07.25

Abstract

In the pattern recognition and machine learning communities, classification is a classical problem and one of the most widely researched areas. Adaptive boosting, also known as Adaboost, has been successfully applied to binary classification problems. It is a boosting algorithm that constructs a strong classifier from a weighted combination of weak classifiers. Principal component analysis (PCA) and linear discriminant analysis (LDA), on the other hand, are the most popular linear feature extraction methods and are used mainly for dimensionality reduction. This paper proposes combining Adaboost with feature extraction methods for efficient classification of two-class data. Conventionally, feature extraction and classification have played distinct roles: a feature extraction method and a classifier are applied sequentially to assign an input to one of several categories. Here, these two steps are merged into one, which yields good classification performance. More specifically, each projection vector is treated as a weak classifier in the Adaboost algorithm, and the weak classifiers together constitute a strong classifier for binary classification. The proposed algorithm was applied to UCI datasets and the FRGC dataset and showed better recognition rates than the sequential application of feature extraction and classification methods.
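
For reference, the weighted combination of weak classifiers mentioned in the abstract can be written in the standard discrete Adaboost form shown below. The thresholded-projection form of the weak classifier $h_t$ in the last line is an assumption made for illustration (the symbols $\mathbf{w}_t$, $\theta_t$, and $p_t$ are not taken from the abstract); the abstract only states that each projection vector acts as a weak classifier.

```latex
% Strong classifier: sign of the weighted vote of T weak classifiers
H(\mathbf{x}) = \operatorname{sign}\!\left( \sum_{t=1}^{T} \alpha_t \, h_t(\mathbf{x}) \right),
\qquad
\alpha_t = \frac{1}{2} \ln \frac{1 - \varepsilon_t}{\varepsilon_t}

% Weighted training error of round t and the sample-weight update
\varepsilon_t = \sum_{i=1}^{N} D_t(i) \, \mathbb{1}\!\left[ h_t(\mathbf{x}_i) \neq y_i \right],
\qquad
D_{t+1}(i) = \frac{D_t(i) \, \exp\!\left( -\alpha_t \, y_i \, h_t(\mathbf{x}_i) \right)}{Z_t}

% Assumed weak classifier: a threshold on one projection (illustrative only)
h_t(\mathbf{x}) = p_t \, \operatorname{sign}\!\left( \mathbf{w}_t^{\top} \mathbf{x} - \theta_t \right),
\qquad p_t \in \{-1, +1\}
```

Here $y_i \in \{-1, +1\}$ are the class labels, $D_t$ is the sample-weight distribution at round $t$, and $Z_t$ normalizes $D_{t+1}$ so that it sums to one.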

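Classification is the most fundamental type of problem to be solved in pattern recognition and machine learning. The Adaboost algorithm refines the idea of boosting so that it can be used for real data analysis; it is a two-class classifier that builds a strong classifier from a combination of the weak classifiers obtained over repeated rounds and their weight values. Principal component analysis and linear discriminant analysis reduce high-dimensional feature vectors to low-dimensional ones and are also useful for extracting features from data. This paper proposes the efficient Boosted-PCA and Boosted-LDA algorithms, which use the features extracted by principal component analysis and linear discriminant analysis as the weak classifiers of Adaboost, thereby performing feature extraction and classification simultaneously and raising the recognition rate. In the final section, classification experiments with the proposed algorithms are carried out on two-class data from the UCI data sets and on male and female images from the FRGC data. The results show that the proposed Boosted-PCA and Boosted-LDA algorithms improve the recognition rate compared with conventional feature extraction algorithms combined with a nearest neighbor classifier or an SVM classifier.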
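
The idea of using projection vectors as Adaboost weak classifiers can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`pca_directions`, `best_stump`, `fit_boosted_pca`, `predict`), the threshold-stump form of the weak classifier, the parameters `k` and `rounds`, and the synthetic two-class data are all assumptions made for illustration, and only the PCA variant is shown.

```python
import numpy as np

def pca_directions(X, k):
    """Return the top-k principal directions (as columns) and the data mean."""
    mean = X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return Vt[:k].T, mean

def best_stump(z, y, w):
    """Threshold and polarity on 1-D projections z with lowest weighted error."""
    vals = np.unique(z)  # sorted unique projection values
    # Candidate thresholds: one below the minimum, then midpoints.
    thresholds = np.concatenate(([vals[0] - 1.0], (vals[:-1] + vals[1:]) / 2.0))
    best = (np.inf, 0.0, 1)
    for theta in thresholds:
        pred = np.where(z > theta, 1, -1)
        err = np.sum(w[pred != y])
        # Flipping the polarity turns error e into 1 - e (weights sum to 1).
        for polarity, e in ((1, err), (-1, 1.0 - err)):
            if e < best[0]:
                best = (e, theta, polarity)
    return best

def fit_boosted_pca(X, y, k=5, rounds=10):
    """Assumed Boosted-PCA sketch: Adaboost over stumps on PCA projections."""
    W, mean = pca_directions(X, k)
    Z = (X - mean) @ W
    w = np.full(len(y), 1.0 / len(y))  # uniform initial sample weights
    model = []
    for _ in range(rounds):
        # Pick the projection whose stump has the lowest weighted error.
        cands = [(best_stump(Z[:, j], y, w), j) for j in range(Z.shape[1])]
        (err, theta, pol), j = min(cands, key=lambda c: c[0][0])
        err = np.clip(err, 1e-10, 1 - 1e-10)  # avoid log(0)
        alpha = 0.5 * np.log((1 - err) / err)
        pred = pol * np.where(Z[:, j] > theta, 1, -1)
        w = w * np.exp(-alpha * y * pred)  # up-weight misclassified samples
        w = w / w.sum()
        model.append((alpha, j, theta, pol))
    return W, mean, model

def predict(X, W, mean, model):
    """Sign of the alpha-weighted vote of all selected weak classifiers."""
    Z = (X - mean) @ W
    score = sum(a * p * np.where(Z[:, j] > t, 1, -1) for a, j, t, p in model)
    return np.where(score >= 0, 1, -1)

# Synthetic two-class data (an assumption; not the UCI or FRGC data).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 1.0, (100, 4)), rng.normal(1.0, 1.0, (100, 4))])
y = np.array([-1] * 100 + [1] * 100)
W, mean, model = fit_boosted_pca(X, y, k=3, rounds=5)
acc = np.mean(predict(X, W, mean, model) == y)
print(f"training accuracy: {acc:.2f}")
```

A Boosted-LDA variant could presumably be obtained by swapping `pca_directions` for a routine that returns discriminant directions, leaving the boosting loop unchanged.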
