군집화와 유전 알고리즘을 이용한 거친-섬세한 분류기 앙상블 선택

Coarse-to-fine Classifier Ensemble Selection using Clustering and Genetic Algorithms

  • Young-Won Kim (Automatic Sorting Research Team, Postal Technology Research Center, ETRI);
  • Il-Seok Oh (Division of Electronics and Information Engineering, Chonbuk National University)
  • Published: September 15, 2007

Abstract

A good classifier ensemble should exhibit high complementarity among its member classifiers, so that it achieves a high recognition rate, and it should be small, so that it is computationally efficient. This paper proposes a classifier ensemble selection algorithm that proceeds in coarse-to-fine stages. For the algorithm to succeed, the initial classifier pool must be sufficiently diverse; we therefore generate a large pool by combining several different classification algorithms with a very large number of feature subsets. The aim of the coarse selection stage is to reduce the size of the classifier pool to a manageable level: a classifier clustering algorithm shrinks the pool while sacrificing as little diversity as possible. The fine selection stage then finds a near-optimal ensemble using genetic algorithms, and a hybrid genetic algorithm with improved search capability is also proposed. Experiments on widely used handwritten numeral databases show that the proposed two-stage method outperforms conventional single-stage methods.
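The two-stage procedure described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: classifiers are modeled as 0/1 correctness vectors on a validation set, the coarse stage clusters them by pairwise disagreement using farthest-first medoids (an assumed stand-in for the paper's clustering algorithm), and the fine stage uses a plain generational GA rather than the proposed hybrid GA.

```python
import random

random.seed(0)  # deterministic demo

def disagreement(a, b):
    """Fraction of validation samples on which two classifiers differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def coarse_select(pool, k):
    """Coarse stage: shrink the pool to k representatives while keeping
    diversity, via farthest-first medoids on the disagreement distance."""
    medoids = [0]
    while len(medoids) < k:
        far = max(range(len(pool)),
                  key=lambda i: min(disagreement(pool[i], pool[m]) for m in medoids))
        medoids.append(far)
    clusters = {m: [] for m in medoids}
    for i in range(len(pool)):
        nearest = min(medoids, key=lambda m: disagreement(pool[i], pool[m]))
        clusters[nearest].append(i)
    # keep the most accurate classifier of each cluster
    return [max(members, key=lambda i: sum(pool[i])) for members in clusters.values()]

def vote_accuracy(pool, subset):
    """Simplified ensemble score: a sample counts as correct when a
    majority of the selected classifiers are individually correct."""
    n = len(pool[0])
    return sum(2 * sum(pool[i][t] for i in subset) > len(subset)
               for t in range(n)) / n

def ga_select(pool, candidates, pop_size=20, generations=30):
    """Fine stage: generational GA over bitstrings that switch each
    candidate classifier in or out of the ensemble."""
    L = len(candidates)
    def fitness(bits):
        subset = [candidates[i] for i in range(L) if bits[i]]
        return vote_accuracy(pool, subset) if subset else 0.0
    population = [[random.randint(0, 1) for _ in range(L)] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[:pop_size // 2]       # truncation selection
        children = []
        while len(children) < pop_size - len(parents):
            p1, p2 = random.sample(parents, 2)
            cut = random.randrange(1, L)           # one-point crossover
            child = p1[:cut] + p2[cut:]
            child[random.randrange(L)] ^= 1        # one-bit mutation
            children.append(child)
        population = parents + children
    best = max(population, key=fitness)
    return [candidates[i] for i in range(L) if best[i]]

# synthetic pool: 60 classifiers as correctness vectors over 200 samples,
# each with a random base accuracy between 0.6 and 0.9
pool = []
for _ in range(60):
    p = random.uniform(0.6, 0.9)
    pool.append([1 if random.random() < p else 0 for _ in range(200)])

candidates = coarse_select(pool, k=15)   # coarse: 60 -> 15
ensemble = ga_select(pool, candidates)   # fine: near-optimal sub-ensemble
print(len(candidates), len(ensemble), round(vote_accuracy(pool, ensemble), 3))
```

The coarse stage matters because the GA's search space is 2^N in the pool size; cutting N from 60 to 15 here (or from thousands to dozens in the paper's setting) is what makes the fine search tractable.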

Keywords

References

  1. R.E. Banfield, L.O. Hall, K.W. Bowyer, and W.P. Kegelmeyer, 'A comparison of decision tree ensemble creation methods,' IEEE Tr. Pattern Analysis and Machine Intelligence, vol.29, no.1, pp.173-180, January 2007 https://doi.org/10.1109/TPAMI.2007.250609
  2. G. Fumera and F. Roli, 'A theoretical and experimental analysis of linear combiners for multiple classifier systems,' IEEE Tr. Pattern Analysis and Machine Intelligence, vol.27, no.6, pp.942-956, 2005 https://doi.org/10.1109/TPAMI.2005.109
  3. G. Giacinto and F. Roli, 'An approach to the automatic design of multiple classifier systems,' Pattern Recognition Letters, vol.22, pp.25-33, 2001 https://doi.org/10.1016/S0167-8655(00)00096-9
  4. P.M. Granitto, P.F. Verdes, and H.A. Ceccatto, 'Neural network ensembles: evaluation of aggregation algorithms,' Artificial Intelligence, vol.163, no.2, pp.139-162, 2005 https://doi.org/10.1016/j.artint.2004.09.006
  5. H. Hao, C.-L. Liu, and H. Sako, 'Comparison of genetic algorithm and sequential search methods for classifier subset selection,' Proceedings of ICDAR, 2003
  6. Tin Kam Ho, 'Multiple classifier combination: lessons and next steps,' in Hybrid Methods in Pattern Recognition (Ed. by H. Bunke and A. Kandel), pp.171-198, World Scientific, 2002
  7. J. Kittler and F.M. Alkoot, 'Sum versus vote fusion in multiple classifier systems,' IEEE Tr. Pattern Analysis and Machine Intelligence, vol.25, pp.110-115, 2003 https://doi.org/10.1109/TPAMI.2003.1159950
  8. C.-L. Liu, K. Nakashima, H. Sako, and H. Fujisawa, 'Handwritten digit recognition: benchmarking of state-of-the-art techniques,' Pattern Recognition, vol.36, pp.2271-2285, 2003 https://doi.org/10.1016/S0031-3203(03)00085-2
  9. Il-Seok Oh and Ching Y. Suen, 'Distance features for neural network-based recognition of handwritten characters,' International Journal on Document Analysis and Recognition, vol.1, pp.73-88, 1998 https://doi.org/10.1007/s100320050008
  10. Il-Seok Oh, Jin-Seon Lee, and Byung-Ro Moon, 'Hybrid genetic algorithms for feature selection,' IEEE Tr. Pattern Analysis and Machine Intelligence, vol.26, no.11, pp.1424-1437, 2004 https://doi.org/10.1109/TPAMI.2004.105
  11. D. Partridge and W. B. Yates, 'Engineering multiversion neural-net systems,' Neural Computation, vol.8, pp.869-893, 1996 https://doi.org/10.1162/neco.1996.8.4.869
  12. N. Garcia-Pedrajas, C. Hervas-Martinez, and D. Ortiz-Boyer, 'Cooperative coevolution of artificial neural network ensembles for pattern classification,' IEEE Tr. Evolutionary Computation, vol.9, no.3, pp.271-302, June 2005 https://doi.org/10.1109/TEVC.2005.844158
  13. J.J. Rodriguez, L.I. Kuncheva, and C.J. Alonso, 'Rotation forest: a new classifier ensemble method,' IEEE Tr. Pattern Analysis and Machine Intelligence, vol.28, no.10, pp.1619-1630, October 2006 https://doi.org/10.1109/TPAMI.2006.211
  14. T. Joachims, SVMlight (software), http://svmlight.joachims.org, 2007
  15. A.J.C. Sharkey, N.E. Sharkey, U. Gerecke, and G.O. Chandroth, 'The test and select approach to ensemble combination,' in Multiple Classifier Systems (Ed. by J. Kittler and F. Roli), Springer, 2000
  16. S.Y. Sohn and H.W. Shin, 'Experimental study for the comparison of classifier combination methods,' Pattern Recognition, vol.40, pp.33-40, 2007 https://doi.org/10.1016/j.patcog.2006.06.027
  17. S. Theodoridis and K. Koutroumbas, Pattern Recognition, 3rd ed., Academic Press, 2006
  18. N. Ueda, 'Optimal linear combination of neural networks for improving classification performance,' IEEE Tr. Pattern Analysis and Machine Intelligence, vol.22, no.2, pp.207-215, 2000 https://doi.org/10.1109/34.825759
  19. N.M. Wanas, R.A. Dara, and M.S. Kamel, 'Adaptive fusion and co-operative training for classifier ensembles,' Pattern Recognition, vol.39, pp.1781-1794, 2006 https://doi.org/10.1016/j.patcog.2006.02.003
  20. Zhi-Hua Zhou, Jianxin Wu, and Wei Tang, 'Ensembling neural networks: many could be better than all,' Artificial Intelligence, vol.137, pp.239-263, 2002 https://doi.org/10.1016/S0004-3702(02)00190-X
  21. Byung-Ro Moon, Genetic Algorithms (in Korean), Duyangsa, 2003
  22. Jin-Seon Lee, Young-Won Kim, and Il-Seok Oh, 'Performance comparison of SVM and neural networks for large-set classification' (in Korean), KIPS Transactions, vol.12-B, no.1, pp.25-30, 2005 https://doi.org/10.3745/KIPSTB.2005.12B.1.025