Feature Selection Method by Information Theory and Particle Swarm Optimization

Feature Selection Technique Using Mutual Information and Binary Particle Swarm Optimization

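  • 조재훈 (School of Electrical, Electronic and Computer Engineering, Chungbuk National University)
  • 이대종 (School of Electrical, Electronic and Computer Engineering, Chungbuk National University)
  • 송창규 (BK21 Chungbuk Information Technology Project Group, Chungbuk National University)
  • 전명근 (School of Electrical, Electronic and Computer Engineering, Chungbuk National University)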
  • Published : 2009.04.25

Abstract

In this paper, we propose a feature selection method that combines mutual information with Binary Particle Swarm Optimization (BPSO). The method has two stages: a candidate feature selection stage, which uses mutual information to choose a candidate feature subset, and an optimal feature selection stage, which uses BPSO to search that candidate subset for an optimal feature subset. In the candidate stage, the mutual information of each feature is computed independently, and a preset number of the top-ranked features is kept as the candidate subset. In the optimal stage, BPSO searches the candidate subset using a multi-objective function that jointly optimizes classifier accuracy and the size of the selected feature subset. The method is evaluated on gene expression datasets, and the experimental results show that it achieves better performance on these pattern recognition problems than conventional methods.
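The quantities the two stages rely on can be written out as follows. The mutual-information definition is the standard one, and the position update is the sigmoid rule of Kennedy and Eberhart's binary PSO, shown here with an inertia weight $w$ (their original form corresponds to $w = 1$) and velocities clamped to $[-V_{\max}, V_{\max}]$. The abstract does not give the exact multi-objective function, so the weighted sum $F$ with weight $\alpha$ is only an assumed example of how accuracy and subset size might be combined. Here $\mathbf{x}_i$ is particle $i$'s bit vector over the $N$ candidate features, $p_{id}$ its personal best, $g_d$ the global best, $r_1, r_2, r$ are uniform random numbers in $[0, 1)$, and $|\mathbf{x}|$ is the number of selected features.

```latex
I(X;Y) = \sum_{x}\sum_{y} p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)}

v_{id} \leftarrow w\,v_{id} + c_1 r_1\,(p_{id} - x_{id}) + c_2 r_2\,(g_{d} - x_{id}),
\qquad
x_{id} =
\begin{cases}
1 & \text{if } r < \sigma(v_{id}) = \dfrac{1}{1 + e^{-v_{id}}} \\
0 & \text{otherwise}
\end{cases}

F(\mathbf{x}) = \alpha\,\mathrm{Acc}(\mathbf{x}) + (1 - \alpha)\left(1 - \frac{|\mathbf{x}|}{N}\right)
```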

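This paper proposes a feature selection technique that uses BPSO (Binary Particle Swarm Optimization) and mutual information. The proposed method consists of a step that selects a candidate feature subset using mutual information and a step that selects the optimal feature subset using BPSO. In the candidate subset selection step, the mutual information of each feature is evaluated independently, and a preset number of candidate features is selected in rank order. In the optimal subset selection step, BPSO searches the candidate feature subset for the optimal feature subset. The objective function of BPSO is a multi-objective function that includes both the classifier accuracy and the number of selected features. Gene data were used to evaluate the performance of the proposed technique, and the experimental results show that it performs better than existing methods.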
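The two-stage procedure described above can be sketched end to end in standard-library Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not say how continuous expression values are discretized for mutual information, which classifier is used, or how the two objectives are weighted, so the equal-width binning (`bins=3`), the leave-one-out 1-nearest-neighbour classifier, the weight `alpha=0.9`, and the PSO parameters (`w`, `c1`, `c2`, `vmax`) are all hypothetical choices. The helper names are likewise illustrative. The bit update follows Kennedy and Eberhart's binary PSO with an added inertia weight.

```python
import math
import random
from collections import Counter


def discretize(values, bins=3):
    """Equal-width binning so MI can be estimated from counts (assumed scheme)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 for a constant feature
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(feature, labels, bins=3):
    """I(X;Y) = sum p(x,y) log(p(x,y) / (p(x) p(y))), in nats, from binned counts."""
    xs = discretize(feature, bins)
    n = len(labels)
    pxy = Counter(zip(xs, labels))
    px, py = Counter(xs), Counter(labels)
    return sum(
        (c / n) * math.log((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


def rank_candidates(X, y, k, bins=3):
    """Stage 1: score every feature independently, keep the top-k by MI."""
    scores = [mutual_information([row[j] for row in X], y, bins) for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


def loo_1nn_accuracy(X, y, cols):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier on `cols`."""
    if not cols:
        return 0.0
    correct = 0
    for i, row in enumerate(X):
        nearest = min(
            (j for j in range(len(X)) if j != i),
            key=lambda j: sum((row[c] - X[j][c]) ** 2 for c in cols),
        )
        correct += y[nearest] == y[i]
    return correct / len(X)


def fitness(mask, X, y, candidates, alpha=0.9):
    """Hypothetical weighted sum: reward accuracy, penalise subset size."""
    cols = [c for c, bit in zip(candidates, mask) if bit]
    size_term = 1 - len(cols) / len(candidates)
    return alpha * loo_1nn_accuracy(X, y, cols) + (1 - alpha) * size_term


def bpso(X, y, candidates, n_particles=10, iters=20,
         w=0.7, c1=2.0, c2=2.0, vmax=4.0, seed=0):
    """Stage 2: binary PSO over bit masks of the candidate features."""
    rng = random.Random(seed)
    d = len(candidates)
    pos = [[rng.randint(0, 1) for _ in range(d)] for _ in range(n_particles)]
    vel = [[0.0] * d for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p, X, y, candidates) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(iters):
        for i in range(n_particles):
            for k in range(d):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][k]
                     + c1 * r1 * (pbest[i][k] - pos[i][k])
                     + c2 * r2 * (gbest[k] - pos[i][k]))
                vel[i][k] = max(-vmax, min(vmax, v))  # clamp velocity
                # Sigmoid transfer: velocity sets the probability that the bit is 1.
                pos[i][k] = 1 if rng.random() < 1 / (1 + math.exp(-vel[i][k])) else 0
            f = fitness(pos[i], X, y, candidates)
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return [c for c, bit in zip(candidates, gbest) if bit], gbest_fit


if __name__ == "__main__":
    rng = random.Random(1)
    # Toy data: feature 0 carries the class label, features 1-5 are uniform noise.
    y = [i % 2 for i in range(40)]
    X = [[label + rng.gauss(0, 0.1)] + [rng.random() for _ in range(5)] for label in y]
    cands = rank_candidates(X, y, k=4)
    subset, score = bpso(X, y, cands)
    print("candidates:", cands, "selected:", subset, "fitness:", round(score, 3))
```

On toy data like this, where only feature 0 depends on the label, the mutual-information ranking should place feature 0 first and the BPSO result should retain it. The size term in `fitness` is what rewards dropping candidate features that do not improve accuracy; a real study would substitute the paper's own classifier and objective weighting.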
