
Ensemble Classifier with Negatively Correlated Features for Cancer Classification  

Hong-Hee Won (Department of Computer Science, Yonsei University)
Sung-Bae Cho (Department of Computer Science, Yonsei University)
Abstract
The development of microarray technology has supplied a large volume of data to many fields. In particular, it has been applied to the prediction and diagnosis of cancer, where it is expected to help predict and diagnose the disease accurately. Because DNA microarray datasets are usually very large, analyzing them efficiently is essential. Since accurate classification of cancer is a very important issue for its treatment, it is desirable to make a decision by combining the results of several expert classifiers rather than depending on the result of a single classifier. Generally, combining classifiers gives high performance and high confidence. Despite these advantages, an ensemble of classifiers whose errors are mutually correlated is limited in performance. In this paper, we propose an ensemble of neural network classifiers trained on negatively correlated features to classify cancer precisely, and we systematically evaluate the performance of the proposed method on three benchmark datasets. Experimental results show that the ensemble classifier with negatively correlated features produces the best recognition rate on the three benchmark datasets.
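As a rough illustration of the two ideas named in the abstract, the following sketch finds the most negatively correlated pair of features by Pearson correlation and combines classifier outputs by majority vote. It is not the authors' implementation: the abstract does not specify the correlation measure, the combination rule, or the network architecture, so the function names (`pearson`, `most_negative_pair`, `majority_vote`), the choice of Pearson correlation, the voting rule, and the toy data are all assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences.

    Assumes neither sequence is constant (otherwise the denominator is zero).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def most_negative_pair(features):
    """Return (name_a, name_b, r) for the pair of features with the lowest correlation.

    `features` maps a hypothetical feature (gene) name to its expression profile.
    """
    names = sorted(features)
    pairs = [
        (pearson(features[a], features[b]), a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
    ]
    r, a, b = min(pairs)
    return a, b, r


def majority_vote(predictions):
    """Combine per-classifier label lists into one label per sample by majority vote."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


if __name__ == "__main__":
    # Toy expression profiles over four samples (illustrative values only).
    toy = {"g1": [1, 2, 3, 4], "g2": [4, 3, 2, 1], "g3": [1, 2, 3, 5]}
    print(most_negative_pair(toy)[:2])

    # Outputs of three hypothetical classifiers on three samples.
    outputs = [["ALL", "AML", "ALL"], ["ALL", "ALL", "AML"], ["AML", "AML", "ALL"]]
    print(majority_vote(outputs))
```

In an actual ensemble, each classifier would be trained on a different feature subset, and the subsets would be chosen so that they are negatively correlated with each other; the sketch only shows how such a pair could be identified and how the resulting votes could be merged.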
Keywords
DNA microarray; gene expression data; cancer classification; feature selection; classifier; ensemble classifier; negative correlation