
Significant Gene Selection Using Integrated Microarray Data Set with Batch Effect  

Kim Ki-Yeol (Oral Cancer Research Institute, Yonsei University College of Dentistry)
Chung Hyun-Cheol (Department of Internal Medicine, Yonsei University College of Medicine; Brain Korea 21 Project for Medical Science, Yonsei University College of Medicine; Cancer Metastasis Research Center, Yonsei University College of Medicine; Yonsei Cancer Center, Yonsei University College of Medicine)
Jeung Hei-Cheul (Cancer Metastasis Research Center, Yonsei University College of Medicine)
Shin Ji-Hye (Cancer Metastasis Research Center, Yonsei University College of Medicine)
Kim Tae-Soo (Cancer Metastasis Research Center, Yonsei University College of Medicine; Yonsei Cancer Center, Yonsei University College of Medicine)
Rha Sun-Young (Brain Korea 21 Project for Medical Science, Yonsei University College of Medicine; Cancer Metastasis Research Center, Yonsei University College of Medicine)
Abstract
In microarray technology, many experimental factors can introduce bias, including RNA sources, microarray production or platform differences, sample processing, and experimental protocols. These systematic effects are a substantial obstacle to the analysis of microarray data: when data sets derived from different experimental processes are analyzed together, the results are often inconsistent and unreliable. One of the most pressing challenges in the microarray field is therefore how to combine data generated by different groups. As a novel attempt to integrate two data sets affected by a batch effect, we applied simple standardization to the microarray data before selecting significant genes. In the gene selection step, we used a newly defined measure that considers the distance between a gene and an ideal gene as well as the between-slide and within-slide variations. We also discussed the association between biological functions and the different expression patterns in the selected discriminative gene set. We confirmed that the batch effect was minimized by standardization and that the genes selected from the standardized data included various expression patterns and significant biological functions.
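The abstract does not specify the exact standardization or the full gene-selection measure, so the following is only a minimal Python sketch under stated assumptions, not the authors' implementation. It assumes (1) that "standardization" means z-scoring each gene separately within each batch, and (2) that the "ideal gene" is a 0/1 class-indicator profile, with distance taken as 1 - |Pearson r| so that both up- and down-regulated genes can rank highly. The function names `standardize_within_batches`, `distance_to_ideal`, and `rank_genes` are hypothetical, and the between-slide and within-slide variation terms of the authors' measure are omitted because the abstract does not define them.

```python
from statistics import mean, pstdev


def standardize_within_batches(matrix, batches):
    """Z-score each gene (row) separately within each batch.

    ASSUMPTION: gene-wise standardization within batch; the abstract
    does not state which axis the authors standardized.
    matrix: one row per gene, one value per sample (column).
    batches: batch label for each sample.
    """
    out = [list(row) for row in matrix]
    for b in set(batches):
        cols = [j for j, lab in enumerate(batches) if lab == b]
        for i, row in enumerate(matrix):
            vals = [row[j] for j in cols]
            mu, sd = mean(vals), pstdev(vals)
            for j in cols:
                # A gene that is constant within a batch is set to 0 to avoid dividing by zero.
                out[i][j] = (row[j] - mu) / sd if sd > 0 else 0.0
    return out


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def distance_to_ideal(profile, labels, positive):
    """HYPOTHETICAL distance: 1 - |r| between a gene and a 0/1 ideal gene."""
    ideal = [1.0 if lab == positive else 0.0 for lab in labels]
    return 1.0 - abs(pearson(profile, ideal))


def rank_genes(matrix, labels, positive):
    """Return gene indices ordered from closest to farthest from the ideal gene."""
    scores = [(distance_to_ideal(row, labels, positive), i) for i, row in enumerate(matrix)]
    return [i for _, i in sorted(scores)]


# Toy data: batch B is shifted by about +5 relative to batch A.
batches = ["A"] * 4 + ["B"] * 4
labels = ["T", "T", "N", "N"] * 2
expr = [
    [3.0, 3.2, 1.0, 1.1, 8.0, 8.3, 6.0, 6.1],  # gene 0: higher in class T
    [2.0, 1.0, 1.0, 2.0, 7.0, 6.0, 6.0, 7.0],  # gene 1: unrelated to class
]
z = standardize_within_batches(expr, batches)
print(rank_genes(z, labels, "T"))  # → [0, 1]
```

In this toy example the batch shift inflates the overall variance of gene 0, so its raw distance to the ideal gene is large; after within-batch standardization the class difference dominates and the distance drops sharply. Note that within-batch standardization assumes each batch contains a comparable mix of classes; if class and batch are confounded, centring each batch also removes genuine biological differences.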
Keywords
genomic data; integration; batch effect; bioinformatics