[KSCI] Korea Science Citation Index Service

http://dx.doi.org/10.3745/KIPSTD.2008.15-D.6.767

Generating Rank-Comparison Decision Rules with Variable Number of Genes for Cancer Classification

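Yoon, Young-Mi (Gachon University of Medicine and Science, Department of IT)
Bien, Sang-Jay (Seoul National University, Bioinformatics)
Park, Sang-Hyun (Yonsei University, Department of Computer Science)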

Publication Information

The KIPS Transactions: Part D, v.15D, no.6, 2008, pp. 767-776

Abstract

Microarray technology is used extensively in experimental molecular biology. Microarray experiments generate quantitative expression measurements for thousands of genes simultaneously, which is useful for the phenotype classification of many diseases. Microarray data classification faces two major problems. The first is that the number of genes exceeds the number of tissue samples. The second is that current methods generate classifiers that are accurate but difficult to interpret. This paper addresses both problems. We directly integrated individual microarrays that share the same biological objective by transforming each expression value into a rank value within its sample, and we generated rank-comparison decision rules with a variable number of genes for cancer classification. Our classifier is an ensemble method consisting of the k top-scoring decision rules. Each rule contains a number of genes, a relationship among those genes, and a class label. Existing classifiers of this kind are also ensembles of k top-scoring decision rules, but they fix the number of genes in each rule at two (a pair) or three (a triple). In this paper we generalize the number of genes involved in each rule, which can range from 2 to N. Generalizing the number of genes increases the robustness and reliability of the classifier when predicting the class of an independent sample. Our classifier is also readily interpretable, accurate with a small number of genes, and shows potential for use in a clinical setting.
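The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names rank_transform, Rule, and classify are hypothetical, and two modeling choices are assumptions that the abstract does not confirm. First, the "relationship among involved genes" is taken to be a strict ordering chain of within-sample ranks. Second, the k rules are combined by a simple majority vote in which a rule abstains when its relationship does not hold. How rules are scored and selected is not described in the abstract, so it is not shown.

```python
from collections import Counter
from typing import List, NamedTuple, Optional, Sequence, Tuple


def rank_transform(expression: Sequence[float]) -> List[int]:
    """Replace each expression value by its rank within the sample (1 = lowest).

    Ties are broken by gene index, which is an arbitrary choice for this sketch.
    """
    order = sorted(range(len(expression)), key=lambda i: expression[i])
    ranks = [0] * len(expression)
    for rank, gene in enumerate(order, start=1):
        ranks[gene] = rank
    return ranks


class Rule(NamedTuple):
    """Hypothetical rule: an ordered tuple of 2 to N gene indices plus a class label."""
    genes: Tuple[int, ...]
    label: str


def rule_fires(rule: Rule, ranks: Sequence[int]) -> bool:
    # Assumed relationship: rank(g1) < rank(g2) < ... < rank(gn).
    return all(ranks[a] < ranks[b] for a, b in zip(rule.genes, rule.genes[1:]))


def classify(rules: Sequence[Rule], expression: Sequence[float]) -> Optional[str]:
    """Majority vote over the rules that fire; None if no rule fires (assumed policy)."""
    ranks = rank_transform(expression)
    votes = Counter(rule.label for rule in rules if rule_fires(rule, ranks))
    if not votes:
        return None
    return votes.most_common(1)[0][0]
```

For a four-gene sample [5.0, 1.2, 3.3, 9.1], rank_transform yields [3, 1, 2, 4]. A three-gene rule Rule((1, 2, 0), "tumor") fires because the ranks 1 < 2 < 3 form an increasing chain, whereas a pair rule Rule((3, 1), "normal") does not fire. Because only ranks within a sample are compared, rules of this form do not depend on the measurement scale of an individual array, which is consistent with the abstract's motivation for integrating microarrays directly.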

Keywords

Data Mining; Classification; Knowledge-Based Data Mining; Microarray Data Analysis; Microarray Data Classification

