Ensemble Classifier with Negatively Correlated Features for Cancer Classification

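  • Hong-Hee Won (원홍희), Department of Computer Science, Yonsei University;
  • Sung-Bae Cho (조성배), Department of Computer Science, Yonsei University
  • Published: 2003.12.01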

Abstract

Recent DNA microarray technology yields large volumes of gene expression data. Applied to the diagnosis and treatment of cancer in particular, it is expected to contribute substantially to accurate cancer classification. Because the amount of DNA microarray data is usually very large, analyzing it efficiently is essential. Since accurate classification of cancer is critical for diagnosis and treatment, it is desirable to reach a decision by combining the results of several specialized classifiers rather than relying on a single classifier. In general, combining classifiers improves both classification performance and the reliability of the result. Despite these advantages, an ensemble whose member classifiers make correlated errors is limited in how much it can improve performance. In this paper, we propose combining neural network classifiers trained on negatively correlated features to classify cancer accurately, and we systematically evaluate the usefulness of the proposed method. Experiments on three benchmark cancer datasets show that the ensemble classifier with negatively correlated features outperforms the other classifiers, producing the best recognition rate on all three datasets.
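The two ingredients named in the abstract, negatively correlated features and the combination of classifier outputs, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how correlation is measured or how the classifiers are combined, so the Pearson coefficient, the output-averaging rule, and the names `pearson`, `most_negative_pair`, and `average_vote` are all assumptions made for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length value lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # a constant feature carries no correlation information
    return cov / (norm_x * norm_y)


def most_negative_pair(features):
    """Return (name_a, name_b, r) for the most negatively correlated pair.

    `features` maps a hypothetical feature name to its values across samples.
    """
    names = sorted(features)
    best = None
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(features[a], features[b])
            if best is None or r < best[2]:
                best = (a, b, r)
    return best


def average_vote(probabilities, threshold=0.5):
    """Combine per-classifier class-1 probabilities by simple averaging.

    Averaging is an assumed combination rule, chosen only for brevity.
    """
    mean = sum(probabilities) / len(probabilities)
    return 1 if mean >= threshold else 0


if __name__ == "__main__":
    # Toy expression values for three hypothetical genes over four samples.
    toy = {"g1": [1, 2, 3, 4], "g2": [4, 3, 2, 1], "g3": [1, 2, 3, 5]}
    print(most_negative_pair(toy))
    # Outputs of three hypothetical classifiers for one sample.
    print(average_vote([0.9, 0.4, 0.7]))
```

In a setting like the one described, each classifier in the ensemble would be trained on a different feature subset, and subsets that correlate negatively are the ones expected to yield classifiers whose errors are less dependent on one another.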
