[KSCI] Korea Science Citation Index Service

http://dx.doi.org/10.29220/CSAM.2020.27.1.141

Comparison of methods for the proportion of true null hypotheses in microarray studies

Kang, Joonsung (Department of Information Statistics, Gangneung-Wonju National University)

Publication Information

Communications for Statistical Applications and Methods / v.27, no.1, 2020, pp. 141-148

Abstract

We consider estimating the proportion of true null hypotheses in multiple testing problems. The traditional multiple testing error rate, the family-wise error rate, is too conservative for controlling type I error in multiple testing setups; the false discovery rate (FDR), by contrast, has received significant attention in many research areas, such as GWAS data, fMRI data, and signal processing. Identifying differentially expressed genes in microarray studies involves estimating the proportion of true null hypotheses in FDR procedures. Because the genuine dependence structure among genes in microarray data is unknown, any estimate of this proportion must account for that unknown dependence. We compare various estimation procedures on simulated data and real microarray data. For the simulations, we use a hidden Markov model (HMM) to generate data with dependency. The Jin and Cai (2007) procedure and the sliding linear model (SLIM) procedure of Wang et al. (2011) have relatively small bias and standard errors, which makes them more suitable for estimating the proportion of true null hypotheses in simulated data under various setups. The real data analysis shows that five of the nine estimation procedures give nearly identical estimates of the proportion of true null hypotheses in the microarray data.
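To make the estimation target concrete, here is a minimal sketch of a threshold-based estimator in the style of Storey (2002), which appears in the reference list. It is an illustration only: the abstract does not say which nine procedures were compared, and the function name `storey_pi0`, the threshold `lam = 0.5`, and the simulated data (independent tests, 80% true nulls, alternative mean 3) are all assumptions made for this example rather than settings taken from the paper.

```python
import numpy as np
from math import erfc, sqrt


def storey_pi0(pvalues, lam=0.5):
    """Threshold-based estimate of the proportion of true nulls.

    P-values above `lam` are assumed to come mostly from true nulls,
    which are uniform on [0, 1], so their share is rescaled by 1 - lam.
    """
    p = np.asarray(pvalues, dtype=float)
    if not 0.0 <= lam < 1.0:
        raise ValueError("lam must lie in [0, 1)")
    estimate = np.mean(p > lam) / (1.0 - lam)
    return float(min(1.0, estimate))  # a proportion cannot exceed 1


# Simulated example; every setting below is an assumption for illustration.
rng = np.random.default_rng(0)
m, true_pi0 = 10_000, 0.8
m0 = int(m * true_pi0)

# Independent z-statistics: N(0, 1) under the null, N(3, 1) otherwise.
z = np.concatenate([rng.normal(0.0, 1.0, m0), rng.normal(3.0, 1.0, m - m0)])

# One-sided p-values: P(Z > z) = 0.5 * erfc(z / sqrt(2)).
pvals = np.array([0.5 * erfc(v / sqrt(2.0)) for v in z])

pi0_hat = storey_pi0(pvals, lam=0.5)
print(f"estimated proportion of true nulls: {pi0_hat:.3f}")
```

This sketch uses independent test statistics, so it does not reproduce the dependent, HMM-generated setting that the abstract describes; under dependence the bias and variability of such an estimator can differ noticeably.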

Keywords

proportion of true null hypotheses; HMM; microarray

References

1	Baldi P and Hatfield W (2002). DNA Microarrays and Gene Expression, Cambridge University Press, Cambridge.
2	Benjamini Y and Hochberg Y (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing, Journal of the Royal Statistical Society: Series B, 57, 289-300.
3	Benjamini Y and Hochberg Y (2000). On the adaptive control of the false discovery rate in multiple testing with independent statistics, Journal of Educational and Behavioral Statistics, 25, 60-83.
4	Jiang H and Doerge RW (2008). Estimating the proportion of true null hypotheses for multiple comparisons, Cancer Informatics, 6, 25-32.
5	Benjamini Y and Yekutieli D (2001). The control of the false discovery rate in multiple testing under dependency, Annals of Statistics, 29, 1165-1188.
6	Churchill G (1992). Hidden Markov chains and the analysis of genome structure, Computers and Chemistry, 16, 107-115.
7	Ephraim Y and Merhav N (2002). Hidden Markov processes, IEEE Transactions on Information Theory, 48, 1518-1569.
8	Jin J and Cai TT (2007). Estimating the null and the proportion of non-null effects in large-scale multiple comparisons, Journal of the American Statistical Association, 102, 495-506.
9	Krogh A, Brown M, Mian I, Sjolander K, and Haussler D (1994). Hidden Markov models in computational biology: applications to protein modeling, Journal of Molecular Biology, 235, 1501-1531.
10	Langaas M, Lindqvist BH, and Ferkingstad E (2005). Estimating the proportion of true null hypotheses, with application to DNA microarray data, Journal of the Royal Statistical Society: Series B, 67, 555-572.
11	Nettleton D, Hwang JTG, Caldo RA, and Wise RP (2006). Estimating the number of true null hypotheses from a histogram of p values, Journal of Agricultural, Biological, and Environmental Statistics, 11, 337-356.
12	Pounds S and Cheng C (2006). Robust estimation of the false discovery rate, Bioinformatics, 22, 1979-1987.
13	Sun W and Cai TT (2009). Large-scale multiple testing under dependence, Journal of the Royal Statistical Society: Series B, 71, 393-424.
14	Rabiner L (1989). A tutorial on hidden Markov models and selected applications in speech recognition, Proceedings of the IEEE, 77, 257-286.
15	Speed T (2003). Statistical Analysis of Gene Expression Microarray Data, Chapman and Hall/CRC, New York.
16	Storey JD (2002). A direct approach to false discovery rates, Journal of the Royal Statistical Society: Series B, 64, 479-498.
17	Storey JD, Taylor JE, and Siegmund D (2004). Strong control, conservative point estimation and simultaneous conservative consistency of false discovery rates: a unified approach, Journal of the Royal Statistical Society: Series B, 66, 187-205.
18	Storey JD and Tibshirani R (2003). Statistical significance for genomewide studies, Proceedings of the National Academy of Sciences, 100, 9440-9445.
19	Van't Wout AB, Lehrman GK, Mikheeva SA, O'Keeffe GC, Katze MG, Bumgarner RE, Geiss GK, and Mullins JI (2003). Cellular gene expression upon human immunodeficiency virus type 1 infection of CD4(+)-T-cell lines, Journal of Virology, 77, 1392-1402.
20	Wang HQ, Tuominen LK, and Tsai CJ (2011). SLIM: a sliding linear model for estimating the proportion of true null hypotheses in datasets with dependence structures, Bioinformatics, 27, 225-231.