
Rank-based Multiclass Gene Selection for Cancer Classification with Naive Bayes Classifiers based on Gene Expression Profiles  

Hong, Jin-Hyuk (Department of Computer Science, Yonsei University)
Cho, Sung-Bae (Department of Computer Science, Yonsei University)
Abstract
Multiclass cancer classification based on gene expression profiles has been actively investigated; it determines the type of cancer by analyzing the large volume of gene expression data collected with DNA microarray technology. Because gene expression data include many genes unrelated to the target cancer, informative genes must be selected to achieve highly accurate classification. Conventional rank-based gene selection methods often rely on ideal marker genes, which were devised primarily for binary classification, so they are difficult to apply directly to multiclass classification. In this paper, we propose a novel multiclass gene selection method that does not use ideal marker genes but instead directly analyzes the distribution of gene expression. The method measures class discriminability by discretizing gene expression levels into several regions and analyzing the frequency of training samples in each region; samples are then classified with a naive Bayes classifier. We demonstrate the usefulness of the proposed method on several representative benchmark datasets for multiclass cancer classification.
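The pipeline outlined in the abstract (discretize expression levels, count training samples per region and class, rank genes, classify with naive Bayes) can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give the discriminability formula, so the `purity_score` below is a hypothetical stand-in, and the equal-width binning, the Laplace smoothing, the function names, and the toy data are all assumptions made for illustration.

```python
import math
from collections import Counter


def discretize(value, lo, hi, n_bins):
    """Map an expression level to an equal-width bin index in [0, n_bins)."""
    if hi == lo:
        return 0
    idx = int((value - lo) / (hi - lo) * n_bins)
    return min(max(idx, 0), n_bins - 1)


def purity_score(bins, labels):
    """Hypothetical discriminability score (an assumption, not the paper's):
    the fraction of samples that belong to the majority class of their bin."""
    per_bin = {}
    for b, label in zip(bins, labels):
        per_bin.setdefault(b, Counter())[label] += 1
    return sum(max(c.values()) for c in per_bin.values()) / len(labels)


def fit(X, y, n_bins=3, n_genes=2):
    """Rank genes by purity_score, keep the top n_genes, and store the
    per-class bin frequencies used by a categorical naive Bayes classifier."""
    n_features = len(X[0])
    ranges = [(min(r[g] for r in X), max(r[g] for r in X))
              for g in range(n_features)]
    binned = [[discretize(r[g], ranges[g][0], ranges[g][1], n_bins)
               for g in range(n_features)] for r in X]
    scores = [purity_score([row[g] for row in binned], y)
              for g in range(n_features)]
    selected = sorted(range(n_features), key=lambda g: -scores[g])[:n_genes]
    class_counts = Counter(y)
    counts = {(c, g): Counter() for c in class_counts for g in selected}
    for row, label in zip(binned, y):
        for g in selected:
            counts[(label, g)][row[g]] += 1
    return {"ranges": ranges, "n_bins": n_bins, "selected": selected,
            "class_counts": class_counts, "counts": counts}


def predict(model, x):
    """Return the class with the highest log-posterior (Laplace-smoothed)."""
    total = sum(model["class_counts"].values())
    best, best_lp = None, -math.inf
    for c, n_c in model["class_counts"].items():
        lp = math.log(n_c / total)
        for g in model["selected"]:
            lo, hi = model["ranges"][g]
            b = discretize(x[g], lo, hi, model["n_bins"])
            lp += math.log((model["counts"][(c, g)][b] + 1)
                           / (n_c + model["n_bins"]))
        if lp > best_lp:
            best, best_lp = c, lp
    return best


# Toy data (invented): 6 samples, 3 genes, 3 classes.
# Gene 0 separates all classes, gene 1 is noise, gene 2 separates A only.
X = [[0.1, 5.0, 1.0], [0.3, 2.0, 1.2],
     [5.0, 4.9, 9.0], [5.2, 2.1, 9.1],
     [9.8, 5.1, 9.2], [10.0, 1.9, 8.8]]
y = ["A", "A", "B", "B", "C", "C"]

model = fit(X, y, n_bins=3, n_genes=2)
print(model["selected"])
print(predict(model, [5.1, 3.0, 9.0]))
```

On this toy data the noisy gene receives the lowest score and is dropped, which is the behavior a rank-based selector is meant to show. The number of regions and the number of selected genes are free parameters here; how the paper chooses them is not stated in the abstract.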
Keywords
gene expression profiles; multiclass cancer classification; gene selection