Cross Platform Data Analysis in Microarray Experiment

Integrated Analysis of Microarray Studies from Different Platforms

  • Lee, Jangmee (Division of Mathematics and Statistics, Sejong University) ;
  • Lee, Sunho (Division of Mathematics and Statistics, Sejong University)
  • Received : 2013.01.07
  • Accepted : 2013.03.26
  • Published : 2013.04.30

Abstract

With the rapid accumulation of microarray data, integrating available data sets that address the same biological question has become an important challenge, since a combined data set provides more samples and potentially more reliable results. Differences between microarray platforms often make it difficult to integrate data from several studies, and there is no consensus on which method best produces a single, unified data set. We review three methods that directly combine rescaled gene expression values (median rank scores, quantile discretization, and standardization) and meta-analysis, which combines the results of individual studies at the interpretative level. Real data sets downloaded from GEO are used to compare the performance of these methods and to evaluate whether the combined data set yields more reliable information than the separate data sets.

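Because microarray experiments typically have small sample sizes, there have been active attempts to compensate for this weakness and to generalize findings by jointly analyzing studies that share the same research objective and are stored in public repositories. When the studies use different platforms, however, the distributions of the gene expression measurements differ, which makes integration difficult, and no best integration method has been established. This paper examines methods that directly merge data values transformed by rank-based medians, quantile discretization, or standardization, and a method that combines study results through meta-analysis. Using real data downloaded from GEO, we compare the strengths, weaknesses, and effectiveness of the four methods and examine the effect of integrating data from different studies.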
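The two levels of integration contrasted in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. It makes several assumptions: each study is a hypothetical dictionary mapping a gene identifier to its expression values, standardization means per-gene z-scoring within each study, and the meta-analysis step uses Fisher's method for combining p-values. The abstract does not state which meta-analysis statistic the paper uses, so Fisher's method is only one common choice. All function names are invented for illustration.

```python
import math
from statistics import mean, stdev


def standardize_per_gene(study):
    """Rescale each gene of one study to mean 0 and sample standard deviation 1.

    `study` is a hypothetical layout: {gene_id: [expression values]}.
    A gene with constant expression would cause a division by zero here.
    """
    scaled = {}
    for gene, values in study.items():
        m, s = mean(values), stdev(values)
        scaled[gene] = [(v - m) / s for v in values]
    return scaled


def merge_standardized(studies):
    """Direct merging: concatenate standardized samples over genes shared by all studies."""
    scaled = [standardize_per_gene(s) for s in studies]
    shared = set.intersection(*(set(s) for s in scaled))
    return {g: [v for s in scaled for v in s[g]] for g in sorted(shared)}


def fisher_combined_p(p_values):
    """Meta-analysis: combine per-study p-values for one gene with Fisher's method.

    The statistic X = -2 * sum(log p) follows a chi-square distribution with
    2k degrees of freedom under the null hypothesis; for even degrees of
    freedom the upper tail probability has the closed form used below.
    """
    k = len(p_values)
    half_x = -sum(math.log(p) for p in p_values)  # X / 2
    return math.exp(-half_x) * sum(half_x ** i / math.factorial(i) for i in range(k))


# Toy values, invented for illustration only (two platforms with different scales).
study_a = {"G1": [1.0, 2.0, 3.0], "G2": [10.0, 10.5, 11.0]}
study_b = {"G1": [100.0, 300.0], "G3": [5.0, 6.0]}
merged = merge_standardized([study_a, study_b])
combined_p = fisher_combined_p([0.05, 0.05])
```

In this sketch the direct approach yields one pooled expression matrix restricted to the shared genes, which can then be analyzed as a single data set, whereas the meta-analysis approach leaves each study on its own scale and pools only the per-study test results.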
