Effect of Genetic Correlations on the P Values from Randomization Test and Detection of Significant Gene Groups

  • Yi, Mi-Sung (Department of Biostatistics, Medical College, The Catholic University of Korea) ;
  • Song, Hae-Hiang (Department of Biostatistics, Medical College, The Catholic University of Korea)
  • Published : 2009.08.31

Abstract

At an early stage of genomic investigation, gene expression experiments use a small sample of microarrays to identify small subsets of candidate genes for more detailed follow-up study. Unlike the methods used for large samples of microarrays, an appropriate statistical method for identifying such subsets is a randomization test, which provides exact P values. For a small sample of microarrays, these exact P values are discrete. Whether differentially expressed genes are present in the full set of genes can be assessed by testing the null hypothesis that the P values follow a uniform distribution. Subsets with small P values are of primary interest for follow-up, and these outlier cells in the multinomial distribution of P values can be identified with the M test of Fuchs and Kenett (1980). Genome-wide gene expression levels in microarrays are correlated, yet most statistical methods for microarray analysis assume that genes are independent and ignore possible correlation among expression levels. Using simulation studies, we investigated the effect that correlated gene expression levels can have on the results of the randomization test and the M test, and found that the effects are often not negligible.
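The discreteness of exact P values in a small sample can be illustrated with a minimal sketch. It assumes a two-group comparison for a single gene with the absolute difference in group means as the test statistic; the function name `exact_randomization_p`, the statistic, and the expression values are illustrative assumptions, not taken from the paper.

```python
from itertools import combinations

import numpy as np


def exact_randomization_p(x, y):
    """Exact two-sided randomization P value for one gene.

    Assumed statistic: absolute difference in group means.
    Every way of splitting the pooled values into groups of the
    original sizes is enumerated, so the P value is exact.
    """
    pooled = np.concatenate([x, y])
    n, n1 = len(pooled), len(x)
    total = pooled.sum()
    observed = abs(np.mean(x) - np.mean(y))
    count = 0
    n_splits = 0
    for idx in combinations(range(n), n1):
        s1 = pooled[list(idx)].sum()
        stat = abs(s1 / n1 - (total - s1) / (n - n1))
        # Small tolerance guards against floating-point ties.
        if stat >= observed - 1e-12:
            count += 1
        n_splits += 1
    return count / n_splits


# Hypothetical expression values: 3 arrays per group.
x = np.array([5.1, 6.3, 5.8])
y = np.array([2.0, 2.9, 3.4])
p = exact_randomization_p(x, y)
print(p)  # 0.1, the smallest attainable value with 20 splits
```

With three arrays per group there are only 20 possible splits, and a two-sided statistic counts each split together with its complement, so the attainable P values are 0.1, 0.2, ..., 1.0. This is why a threshold such as 0.05 cannot be reached in this design however large the observed difference is.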

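In early-stage genomic studies, experiments are run on a relatively small number of microarray samples in order to identify gene subsets that merit in-depth study. The analytical methods used for this subset search differ greatly from those used for large samples. Applying a randomization test, which is well suited to very small samples, yields a discrete distribution of exact P values, and testing the null hypothesis of a uniform distribution indicates whether significant genes are present. Going one step further, the M test proposed by Fuchs and Kenett (1980) can identify outlier cells in the multinomial distribution of the discrete P values, and thereby select groups of genes that are candidates for significance. A problem with most microarray genomic studies is that thousands or tens of thousands of genes are analyzed under the assumption that they are mutually independent. In this paper, we used simulation studies to examine the effect of gene correlation on the analysis, using a permutation-based randomization test that preserves the correlations among genes together with the M test, and confirmed that the effect is by no means slight.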
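One way to keep gene correlations intact during permutation is to relabel whole arrays, so that every gene receives the same relabelling. The sketch below simulates correlated genes under the null, computes exact P values for all genes from the same 20 splits, and tabulates them into the ten attainable cells. The equicorrelated simulation model, the value of `rho`, and the standardized residuals at the end are assumptions made for illustration; the residuals are only a rough stand-in for an outlier-cell statistic, and the exact form of the M test should be taken from Fuchs and Kenett (1980).

```python
from itertools import combinations

import numpy as np


def exact_p_all_genes(data, n1):
    """Exact P values for every gene from joint relabelling of arrays.

    data: genes x arrays matrix; the first n1 columns form group 1.
    The same column split is applied to all genes at once, which
    preserves the correlation structure among genes.
    """
    n_genes, n = data.shape
    splits = list(combinations(range(n), n1))
    total = data.sum(axis=1)
    stats = np.empty((len(splits), n_genes))
    for k, idx in enumerate(splits):
        s1 = data[:, list(idx)].sum(axis=1)
        stats[k] = np.abs(s1 / n1 - (total - s1) / (n - n1))
    # combinations() yields (0, ..., n1-1) first: the observed labelling.
    observed = stats[0]
    return (stats >= observed - 1e-12).mean(axis=0)


rng = np.random.default_rng(0)
n_genes, n_arrays, n1, rho = 2000, 6, 3, 0.5  # hypothetical settings

# Assumed null model: one shared factor gives pairwise correlation rho.
shared = rng.normal(size=(1, n_arrays))
noise = rng.normal(size=(n_genes, n_arrays))
data = np.sqrt(rho) * shared + np.sqrt(1 - rho) * noise

p = exact_p_all_genes(data, n1)

# Attainable P values are 0.1, 0.2, ..., 1.0; count genes per cell.
cells = np.rint(p * 10).astype(int)
counts = np.bincount(cells, minlength=11)[1:]

# Under independence each cell expects n_genes / 10 genes.
expected = n_genes / 10
z = (counts - expected) / np.sqrt(expected * (1 - 0.1))
print(counts)
print(np.round(z, 2))
```

Rerunning with `rho = 0.0` and with larger values of `rho` gives a quick impression of how far the cell counts can drift from the uniform expectation when genes are correlated, even though no gene is differentially expressed in the simulation.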

References

  1. Bohrer, R., Chow, W., Faith, R., Joshi, V. and Wu, C. F. (1981). Multiple three-decision rules for factorial simple effects: Bonferroni wins again!, Journal of the American Statistical Association, 76, 119-124 https://doi.org/10.2307/2287056
  2. Dondrup, M., Huser, A. T., Mertens, D. and Goesmann, A. (2009). An evaluation framework for statistical tests on microarray data, Journal of Biotechnology, 140, 18-26 https://doi.org/10.1016/j.jbiotec.2009.01.009
  3. Fierro, A. C., Vandenbussche, F., Engelen, K., Van de Peer, Y. and Marchal, K. (2008). Meta analysis of gene expression data within and across species, Current Genomics, 9, 525-534 https://doi.org/10.2174/138920208786847935
  4. Fisher, R. A. (1935). The Design of Experiments, Oliver and Boyd, Edinburgh
  5. Fuchs, C. and Kenett, R. (1980). A test for detecting outlying cells in the multinomial distribution and two-way contingency tables, Journal of the American Statistical Association, 75, 395-398 https://doi.org/10.2307/2287465
  6. Gadbury, G. L., Page, G. P., Heo, M., Mountz, J. D. and Allison, D. B. (2003). Randomization tests for small samples: An application for genetic expression data, Journal of the Royal Statistical Society. Series C (Applied Statistics), 52, 365-376 https://doi.org/10.1111/1467-9876.00410
  7. Gibbons, J. D. and Pratt, J. W. (1975). P-values: Interpretation and methodology, The American Statistician, 29, 20-25 https://doi.org/10.2307/2683674
  8. Hu, J. and Wright, F. A. (2007). Assessing differential gene expression with small sample sizes in oligonucleotide arrays using a mean-variance model, Biometrics, 63, 41-49 https://doi.org/10.1111/j.1541-0420.2006.00675.x
  9. Lambert, D. (1985). Robust two-sample permutation tests, The Annals of Statistics, 13, 606-625 https://doi.org/10.1214/aos/1176349542
  10. Murie, C. and Nadon, R. (2008). A correction for estimating error when using the Local Pooled Error Statistical Test, Bioinformatics, 24, 1735-1736 https://doi.org/10.1093/bioinformatics/btn211
  11. Parmigiani, G., Garrett, E. S., Anbazhagan, R. and Gabrielson, E. (2002). A statistical framework for expression-based molecular classification in cancer, Journal of the Royal Statistical Society. Series B, 64, 717-736 https://doi.org/10.1111/1467-9868.00358
  12. Sidak, Z. (1968). On multivariate normal probabilities on rectangles: Their dependence on correlations, The Annals of Mathematical Statistics, 39, 1425-1434 https://doi.org/10.1214/aoms/1177698122
  13. Welch, W. J. (1990). Construction of permutation tests, Journal of the American Statistical Association, 85, 693-698 https://doi.org/10.2307/2290004

Cited by

  1. A Method for Gene Group Analysis and Its Application vol.25, pp.2, 2012, https://doi.org/10.5351/KJAS.2012.25.2.269