http://dx.doi.org/10.3745/KIPSTD.2008.15-D.6.767

Generating Rank-Comparison Decision Rules with Variable Number of Genes for Cancer Classification  

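Yoon, Young-Mi (Department of IT, Gachon University of Medicine and Science)
Bien, Sang-Jay (Bioinformatics, Seoul National University)
Park, Sang-Hyun (Department of Computer Science, Yonsei University)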
Abstract
Microarray technology is used extensively in experimental molecular biology. Microarray experiments generate quantitative expression measurements for thousands of genes simultaneously, which makes them useful for the phenotype classification of many diseases. Microarray data classification faces two major problems. First, the number of genes exceeds the number of tissue samples. Second, current methods generate classifiers that are accurate but difficult to interpret. This paper addresses both problems. We directly integrated individual microarray datasets that share the same biological objective by transforming each expression value into its rank within the sample, and we generated rank-comparison decision rules with a variable number of genes for cancer classification. Our classifier is an ensemble of the k top-scoring decision rules. Each rule consists of a set of genes, a relationship among those genes, and a class label. Existing classifiers that are also ensembles of k top-scoring decision rules fix the number of genes in each rule at two (a pair) or three (a triple). In this paper we generalize the number of genes involved in each rule, so that a rule can contain anywhere from 2 to N genes. Generalizing the number of genes increases the robustness and reliability of the classifier when predicting the class of an independent sample. Our classifier is also readily interpretable and accurate with a small number of genes, which suggests that it could be used in a clinical setting.
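To make the two core ideas concrete (within-sample rank transformation, and an ensemble of rank-comparison rules whose gene count varies from 2 upward), here is a minimal Python sketch under stated assumptions. It is not the authors' algorithm: the scoring function (difference between the two classes' frequencies of a rank ordering), the k-TSP-style rule that selected rules must not share genes, the "vote for the other class when the ordering fails" fallback, the average-rank tie handling, and the function names `rank_within_sample`, `rule_holds`, `train`, and `predict` are all hypothetical choices made for illustration. It handles two classes only and brute-forces every ordering, so it is only feasible for a handful of genes.

```python
from collections import Counter
from itertools import permutations


def rank_within_sample(values):
    """Replace each expression value by its rank within one sample.

    Assumption: rank 1 is the lowest expression, and tied values share
    the average of the ranks they occupy.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1  # mean of 1-based positions i..j
        i = j + 1
    return ranks


def rule_holds(ranks, ordering):
    """True if ranks strictly increase along the gene indices in `ordering`."""
    return all(ranks[a] < ranks[b] for a, b in zip(ordering, ordering[1:]))


def train(samples, labels, max_genes=3, k=3):
    """Select up to k gene-disjoint rules, each with 2..max_genes genes.

    Hypothetical score: freq(ordering | class) - freq(ordering | other class),
    with the rule labelled by the class where the ordering is more frequent.
    """
    ranked = [rank_within_sample(s) for s in samples]
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError("this sketch handles two classes only")
    groups = {c: [r for r, y in zip(ranked, labels) if y == c] for c in classes}
    candidates = []
    for m in range(2, max_genes + 1):
        for ordering in permutations(range(len(ranked[0])), m):
            freq = {c: sum(rule_holds(r, ordering) for r in g) / len(g)
                    for c, g in groups.items()}
            hi, lo = sorted(classes, key=lambda c: freq[c], reverse=True)
            candidates.append((freq[hi] - freq[lo], ordering, hi, lo))
    # Highest score first; among equal scores, prefer rules with fewer genes.
    candidates.sort(key=lambda t: (-t[0], len(t[1])))
    rules, used = [], set()
    for score, ordering, label, other in candidates:
        if score <= 0 or len(rules) == k:
            break
        if used.isdisjoint(ordering):  # assumption: no gene reused across rules
            rules.append((ordering, label, other, score))
            used.update(ordering)
    return rules


def predict(rules, values):
    """Majority vote: a rule votes its label if its ordering holds, else the other class."""
    if not rules:
        raise ValueError("no rules to vote with")
    ranks = rank_within_sample(values)
    votes = Counter(label if rule_holds(ranks, ordering) else other
                    for ordering, label, other, _ in rules)
    return votes.most_common(1)[0][0]
```

For example, with `samples = [[1, 5, 3, 4], [2, 9, 7, 1], [8, 2, 3, 5], [6, 1, 9, 2]]` and `labels = ["A", "A", "B", "B"]`, `train` keeps the single rule "gene 0 ranks below gene 1 implies class A", i.e. `[((0, 1), 'A', 'B', 1.0)]`, and `predict` then returns `'A'` for `[1, 10, 5, 5]` and `'B'` for `[10, 1, 5, 5]`. Because only ranks are compared, the rules are unaffected by any monotone rescaling of a sample, which is what lets arrays from different experiments be pooled directly. An odd `k` avoids tied votes.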
Keywords
Data Mining; Classification; Knowledge-Based Data Mining; Microarray Data Analysis; Microarray Data Classification