http://dx.doi.org/10.5351/CKSS.2012.19.6.899

Multiple Testing in Genomic Sequences Using Hamming Distance  

Kang, Moonsu (Department of Information Statistics, Gangneung-Wonju National University)
Publication Information
Communications for Statistical Applications and Methods, v.19, no.6, 2012, pp. 899-904
Abstract
High-dimensional categorical data models with small sample sizes have not been used extensively for genomic sequences, which involve count (discrete) or purely qualitative responses. A basic task is to identify differentially expressed genes (or positions) among a large number of genes. Because a multivariate analysis of variance is not feasible in this setting, the task requires an appropriate test statistic and a corresponding multiple testing procedure. Controlling the family-wise error rate (FWER) is not appropriate when thousands of genes are tested simultaneously; the false discovery rate (FDR) is better suited to such multiple testing problems. Data from the 2002-2003 SARS epidemic show that a conventional FDR procedure, combined with the proposed test statistic based on a pseudo-marginal approach with Hamming distance, performs better.
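The two ingredients named in the abstract, Hamming distance and a conventional FDR procedure, can be illustrated with a small Python sketch. It assumes the conventional procedure is the Benjamini-Hochberg (1995) step-up rule. The per-position statistic used here (between-group minus within-group mismatch rate at one aligned site, with a permutation p-value) and the helper names `position_statistic` and `permutation_pvalue` are hypothetical illustrative choices, not the paper's pseudo-marginal statistic, and the toy sequences are made up rather than taken from the SARS data.

```python
import itertools
import random


def hamming(a, b):
    """Number of positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b))


def _mean_mismatch(pairs, j):
    """Average 0/1 mismatch at site j over sequence pairs.
    Summing this indicator over all sites j recovers hamming(s, t)."""
    pairs = list(pairs)
    return sum(s[j] != t[j] for s, t in pairs) / len(pairs)


def position_statistic(group1, group2, j):
    """Hypothetical per-site contrast: between-group minus within-group
    mismatch rate at site j. Assumes at least one group has >= 2 sequences."""
    between = _mean_mismatch(itertools.product(group1, group2), j)
    within_pairs = list(itertools.combinations(group1, 2)) + list(
        itertools.combinations(group2, 2)
    )
    within = _mean_mismatch(within_pairs, j)
    return between - within


def permutation_pvalue(group1, group2, j, n_perm=999, seed=0):
    """One-sided permutation p-value for the contrast at site j,
    obtained by randomly reassigning sequences to the two groups."""
    rng = random.Random(seed)
    observed = position_statistic(group1, group2, j)
    pooled = list(group1) + list(group2)
    n1 = len(group1)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if position_statistic(pooled[:n1], pooled[n1:], j) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)


def benjamini_hochberg(pvalues, q=0.05):
    """Benjamini-Hochberg step-up procedure: returns booleans marking
    which hypotheses are rejected at FDR level q."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * q / m:
            k_max = rank  # keep the largest passing rank (step-up)
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # Toy aligned sequences (hypothetical data, not from the SARS study).
    group1 = ["ACGTA"] * 6 + ["ACGTT"] * 2
    group2 = ["ACCTA"] * 6 + ["ACCTT"] * 2
    n_sites = len(group1[0])
    pvals = [permutation_pvalue(group1, group2, j) for j in range(n_sites)]
    print(pvals)
    print(benjamini_hochberg(pvals, q=0.05))
```

Testing each aligned position separately and then passing the p-values to the step-up rule is what turns the problem into a multiple testing one: the step-up rule bounds the expected proportion of false rejections among all rejections, rather than the probability of making any false rejection, which is why it stays usable when thousands of positions are tested.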
Keywords
Pseudo-marginal approach; false discovery rate; Hamming distance; genomic sequence