Hybrid Feature Selection Using Genetic Algorithm and Information Theory

  • Cho, Jae Hoon (Smart Logistics Technology Institute, Hankyong National University)
  • Lee, Dae-Jong (Department of Electrical & Computer Engineering, Chungbuk National University)
  • Park, Jin-Il (Smart Logistics Technology Institute, Hankyong National University)
  • Chun, Myung-Geun (Department of Electrical & Computer Engineering, Chungbuk National University)
  • Received : 2013.02.18
  • Accepted : 2013.03.15
  • Published : 2013.03.25

Abstract

In pattern classification, feature selection is an important factor in classifier performance. In particular, when the data contain a large number of features or variables, both the accuracy and the computation time of the classifier can be improved by selecting a relevant feature subset and discarding irrelevant, redundant, or noisy features. The proposed method consists of two parts: a wrapper part that uses an improved genetic algorithm (GA) with a new reproduction method, and a filter part that uses mutual information (MI). We also consider MI-based feature selection methods to reduce computational complexity. Experimental results show that the proposed method can achieve better performance on pattern recognition problems than conventional methods.
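
As an illustration of the filter part only, the following minimal sketch estimates the mutual information between each discrete feature and the class label and ranks the features by that score. It is not the authors' implementation: the function names, the toy data, and the simple frequency-count estimator are assumptions made for this example, and the GA wrapper with its new reproduction method is not shown.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Estimate I(X;Y) in bits from two equal-length sequences of discrete values.

    Assumption: probabilities are estimated by plain frequency counts.
    """
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), written in terms of counts
        mi += (c / n) * math.log2(c * n / (count_x[x] * count_y[y]))
    return mi


def rank_features(columns, labels):
    """Return (feature name, MI score) pairs sorted from most to least informative."""
    scores = {name: mutual_information(col, labels) for name, col in columns.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: one feature duplicates the label, one is independent of it.
labels = [0, 0, 1, 1]
columns = {
    "copy_of_label": [0, 0, 1, 1],
    "independent": [0, 1, 0, 1],
}
ranking = rank_features(columns, labels)
print(ranking)
```

In a hybrid scheme of the kind the abstract describes, a ranking like this could be used to restrict or guide the candidate subsets that the GA wrapper evaluates, so that fewer classifier trainings are needed. How the two parts are actually coupled in the proposed method is not stated in the abstract.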
