
A Decision Tree Induction using Genetic Programming with Sequentially Selected Features  

Kim Hyo-Jung (School of Economics, Sungkyunkwan University)
Park Chong-Sun (School of Economics, Sungkyunkwan University)
Publication Information
Korean Management Science Review, Vol.23, No.1, 2006, pp.63-74
Abstract
Decision tree induction is one of the most widely used methods for classification problems. However, because most tree-induction algorithms rely on greedy top-down search, they can become trapped in a local minimum with no reasonable means of escape. Furthermore, when irrelevant or redundant features are included in the data set, tree algorithms produce trees that are less accurate than those built from only the relevant features. We propose a hybrid algorithm that generates decision trees using genetic programming with sequentially selected features. The Correlation-based Feature Selection (CFS) method is adopted to find relevant features, which are fed to genetic programming sequentially to find an optimal tree at each iteration. The proposed algorithm produces simpler and more understandable decision trees than other decision tree methods, and in terms of cross-validation accuracy it is also effective in producing similar or better trees from a relatively smaller set of features.
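As a rough illustration of the abstract's pipeline (this is not the authors' code), the sketch below computes the standard CFS merit of a feature subset and runs a greedy forward search that yields growing feature subsets one at a time, mimicking the "sequentially selected features" that would be handed to the genetic-programming tree-induction stage. The feature names and correlation values are hypothetical, and the GP stage itself is omitted:

```python
import math

def cfs_merit(subset, corr_fc, corr_ff):
    """Standard CFS merit: k*r_cf / sqrt(k + k*(k-1)*r_ff), where r_cf is the
    mean feature-class correlation and r_ff the mean pairwise feature-feature
    correlation. High merit = relevant, non-redundant subset."""
    k = len(subset)
    if k == 0:
        return 0.0
    r_cf = sum(corr_fc[f] for f in subset) / k
    pairs = [frozenset((f, g)) for i, f in enumerate(subset) for g in subset[i + 1:]]
    r_ff = sum(corr_ff[p] for p in pairs) / len(pairs) if pairs else 0.0
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)

def select_sequentially(features, corr_fc, corr_ff):
    """Greedy forward search: repeatedly add the feature that most improves
    CFS merit; stop when no addition helps. Each intermediate subset is what
    would be fed to the GP tree-induction stage in turn."""
    selected, stages = [], []
    while True:
        best_f, best_merit = None, cfs_merit(selected, corr_fc, corr_ff)
        for f in features:
            if f in selected:
                continue
            m = cfs_merit(selected + [f], corr_fc, corr_ff)
            if m > best_merit:
                best_f, best_merit = f, m
        if best_f is None:
            return stages
        selected = selected + [best_f]
        stages.append(list(selected))

# Toy correlations (hypothetical): f1 is relevant, f2 is relevant but
# redundant with f1, and f3 is irrelevant to the class.
corr_fc = {"f1": 0.80, "f2": 0.75, "f3": 0.10}
corr_ff = {frozenset(("f1", "f2")): 0.90,
           frozenset(("f1", "f3")): 0.05,
           frozenset(("f2", "f3")): 0.05}

stages = select_sequentially(["f1", "f2", "f3"], corr_fc, corr_ff)
print(stages)  # → [['f1']]: f2 is rejected as redundant, f3 as irrelevant
```

Note how the merit formula penalizes redundancy: adding f2 raises neither accuracy proxy enough to offset its 0.90 correlation with f1, so the search stops after one feature, which is the behavior the paper exploits to keep the trees small.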
Keywords
Decision Tree; Correlation-based Feature Selection; Genetic Programming
Citations & Related Records
  • Reference
1 Bot, M.C.J. and W.B. Langdon, 'Application of Genetic Programming to Induction of Linear Classification Trees,' European Conference on Genetic Programming EuroGP2000, Lecture Notes in Computer Science 1802, (2000), pp.247-258
2 Cherkauer, K.J. and J.W. Shavlik, 'Growing Simpler Decision Trees to Facilitate Knowledge Discovery,' In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, AAAI Press, San Mateo, (1996), pp.315-318
3 Hall, M., 'Correlation-based Feature Selection for Discrete and Numeric Class Machine Learning,' In Proceedings of the International Conference on Machine Learning, Morgan Kaufmann, San Francisco, (2000), pp.359-366
4 Kononenko, I., 'Estimating Attributes: Analysis and Extensions of RELIEF,' In Proceedings of the European Conference on Machine Learning, (1994), pp.171-182
5 Koza, J. R., Genetic Programming, MIT press, 1992
6 Murthy, S.K., 'Automatic Construction of Decision Trees from Data: A Multidisciplinary Survey,' In Data Mining and Knowledge Discovery, No.2(1998), pp.345-389
7 Papagelis, A. and D. Kalles, 'Breeding Decision Trees using Evolutionary Techniques,' ICML, (2001), pp.393-400
8 http://www.cs.waikato.ac.nz/ml/weka
9 Setiono, R. and H. Liu, 'Chi2: Feature Selection and Discretization of Numeric Attributes,' In Proceedings of the Seventh IEEE International Conference on Tools with Artificial Intelligence, (1995), pp.388-391
10 Pfahringer, B., 'Compression-based Feature Subset Selection,' In Proceedings of the IJCAI-95 Workshop on Data Engineering for Inductive Learning, (1995), pp.109-119
11 http://www.cs.ucl.ac.uk/external/A.Qureshi/gpsys_doc.html
12 Soule, T., 'Code Growth in Genetic Programming,' PhD thesis, University of Idaho, Moscow, Idaho, USA, 1998
13 Lee, S. and M.Y. Huh, 'A Measure of Association for Complex Data,' Computational Statistics and Data Analysis, Vol.44, No.1-2(2003), pp.211-222
14 Caruana, R. and D. Freitag, 'Greedy Attribute Selection,' In Machine Learning: Proceedings of the Eleventh International Conference, Morgan Kaufmann, (1994), pp.28-36
15 Fu, Z., 'A Computational Study of using Genetic Algorithms to Develop Intelligent Decision Trees,' Proceedings of the 2001 Congress on Evolutionary Computation, Seoul, South Korea, (2001), pp.1382-1387
16 Koza, J.R., 'Concept Formation and Decision Tree Induction using the Genetic Programming Paradigm,' Parallel Problem Solving from Nature, Berlin: Springer-Verlag, (1991), pp.124-128
17 Kononenko, I. and E. Simec, 'Induction of Decision Trees using RELIEFF,' In: Kruse, R., Viertl, R., Riccia, G. Della (eds.), CISM Lecture Notes, Springer Verlag, (1994), pp.199-220
18 Almuallim, H. and T.G. Dietterich, 'Learning with Many Irrelevant Features,' In Proceedings of the Ninth National Conference on Artificial Intelligence, MIT Press, (1991), pp.542-547
19 Vafaie, H. and K. De Jong, 'Genetic Algorithms as a Tool for Restructuring Feature Space Representations,' In Proceedings of the International Conference on Tools With A.I., IEEE Computer Society Press, 1995
20 Holmes, G. and C.G. Nevill-Manning, 'Feature Selection Via the Discovery of Simple Classification Rules,' In Proceedings of the Symposium on Intelligent Data Analysis, Baden-Baden, Germany, August, 1995
21 http://www.ics.uci.edu/~mlearn/MLRepository.html
22 Breiman, L., J.H. Friedman, R.A. Olshen, and C.J. Stone, Classification and Regression Trees, Chapman & Hall/CRC, 1998
23 Kohavi, R. and G. John, 'Wrappers for Feature Subset Selection,' Artificial Intelligence, Vol.97, No.1-2(1997), pp.273-324
24 Witten, I.H. and E. Frank, Data Mining, Morgan Kaufmann, 1999
25 Koller, D. and M. Sahami, 'Hierarchically Classifying Documents using very Few Words,' In Machine Learning: Proceedings of the Fourteenth International Conference, Morgan Kaufmann, (1997), pp.170-178
26 http://www.r-project.org
27 Aha, D.W. and R.L. Bankert, 'A Comparative Evaluation of Sequential Feature Selection Algorithms,' In Proceedings of the Fifth International Workshop on Artificial Intelligence and Statistics, Ft. Lauderdale, 1995, pp.1-7
28 Quinlan, J.R., C4.5: Programs for Machine Learning, San Mateo, CA: Morgan Kaufmann, 1993
29 John, G.H., R. Kohavi, and K. Pfleger, 'Irrelevant Features and the Subset Selection Problem,' In Machine Learning: Proceedings of the Eleventh International Conference, Morgan Kaufmann, (1994), pp.121-129