
Diversity based Ensemble Genetic Programming for Improving Classification Performance  

Hong Jin-Hyuk (Department of Computer Science, Yonsei University)
Cho Sung-Bae (Department of Computer Science, Yonsei University)
Abstract
Combining multiple classifiers has been actively exploited to improve classification performance. A good ensemble classifier requires a pool of accurate and diverse base classifiers. Conventionally, ensemble learning techniques such as bagging and boosting have been used, and the diversity of base classifiers has been estimated on the training set; this approach is limited when classifying gene expression profiles, since only a few training samples are available. This paper proposes an ensemble technique that analyzes the diversity of classification rules obtained by genetic programming. Genetic programming generates interpretable rules, and a sample is classified by combining the most diverse set of rules. We have applied the proposed method to cancer classification with gene expression profiles. Experiments on lymphoma, prostate, and ovarian cancer datasets illustrate the usefulness of the proposed method: it obtains higher classification accuracy than an ensemble built without considering diversity. The experiments also confirm that diversity increases classification performance.
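The idea of classifying a sample with the most diverse set of rules can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: each rule is represented as a dictionary holding the set of gene indices it uses (`genes`) and a scoring function (`fn`), diversity is measured as the mean pairwise Jaccard distance between gene sets, the subset is found by exhaustive search, and the selected rules are combined by majority vote. The paper's actual diversity measure and fusion scheme may differ.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Return 1 - |a & b| / |a | b| for two sets of gene indices."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def subset_diversity(rules, idxs):
    """Mean pairwise Jaccard distance between the gene sets of the chosen rules."""
    pairs = list(combinations(idxs, 2))
    if not pairs:  # a single rule has no pairwise diversity
        return 0.0
    total = sum(
        jaccard_distance(rules[i]["genes"], rules[j]["genes"]) for i, j in pairs
    )
    return total / len(pairs)


def most_diverse_subset(rules, k):
    """Exhaustively pick the k rules with the highest mean pairwise distance.

    Exhaustive search is only practical for a small rule pool.
    """
    return max(
        combinations(range(len(rules)), k),
        key=lambda idxs: subset_diversity(rules, idxs),
    )


def classify(rules, idxs, sample):
    """Majority vote: a rule votes for class 1 if its output is positive."""
    votes = sum(1 if rules[i]["fn"](sample) > 0 else -1 for i in idxs)
    return 1 if votes > 0 else 0


# Hypothetical rule pool: arithmetic expressions over gene expression values.
rules = [
    {"genes": frozenset({0, 1}), "fn": lambda x: x[0] - x[1]},
    {"genes": frozenset({0, 1}), "fn": lambda x: 2 * x[0] - x[1]},
    {"genes": frozenset({2}), "fn": lambda x: x[2] - 0.5},
    {"genes": frozenset({3, 4}), "fn": lambda x: x[3] + x[4] - 1.0},
]

chosen = most_diverse_subset(rules, 3)  # use an odd k to avoid tied votes
label = classify(rules, chosen, [0.9, 0.1, 0.8, 0.2, 0.3])
```

In this toy pool the first two rules use the same genes, so a diversity-driven selection keeps only one of them. Because the distance is computed from the rules themselves, the selection does not depend on how many training samples are available.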
Keywords
genetic programming; ensemble; diversity; cancer classification