http://dx.doi.org/10.29220/CSAM.2017.24.6.627

Identifying differentially expressed genes using the Polya urn scheme  

Saraiva, Erlandson Ferreira (Institute of Mathematics, Federal University of Mato Grosso do Sul)
Suzuki, Adriano Kamimura (Department of Applied Mathematics and Statistics, University of Sao Paulo)
Milan, Luis Aparecido (Department of Statistics, Federal University of Sao Carlos)
Publication Information
Communications for Statistical Applications and Methods / v.24, no.6, 2017, pp. 627-640
Abstract
A common goal in gene expression data analysis is to identify genes whose expression levels change significantly across biological experimental conditions. In this paper, we develop a Bayesian approach for gene-by-gene comparison when there is a control and more than one treatment experimental condition. The proposed approach uses a Dirichlet process prior. The comparison is based on a model selection procedure that exploits the discreteness of the Dirichlet process and its representation via the Polya urn scheme. The posterior probabilities of the models considered are calculated using a Gibbs sampling algorithm. A numerical simulation study compares the performance of the proposed method with that of the usual methods based on analysis of variance (ANOVA) followed by a Tukey test. The methods are compared in terms of true positive rate and false discovery rate. We find that the proposed method outperforms the methods based on ANOVA followed by a Tukey test. We also apply the methodologies to a publicly available data set on Plasmodium falciparum protein.
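As an illustration of the Polya urn representation mentioned above, the following minimal Python sketch draws a random partition of n items from the urn scheme. It is not the authors' algorithm: the function names, the concentration parameter `alpha`, and the reading of the three items as a control mean and two treatment means are assumptions made here for illustration. Under that reading, items sharing a label correspond to tied means, so each partition plays the role of one candidate model.

```python
import random


def polya_urn_partition(n, alpha, rng):
    """Draw cluster labels for n items from a Polya urn with concentration alpha.

    Item i (0-based) joins existing cluster k with probability
    counts[k] / (i + alpha) and opens a new cluster with probability
    alpha / (i + alpha).
    """
    labels = []
    counts = []  # counts[k] = number of items currently in cluster k
    for _ in range(n):
        weights = counts + [alpha]  # last slot stands for "new cluster"
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)
        else:
            counts[k] += 1
        labels.append(k)
    return labels


def prior_prob_all_tied(n, alpha, draws, rng):
    """Monte Carlo estimate of the prior probability that all n items share one cluster."""
    hits = sum(
        1 for _ in range(draws)
        if len(set(polya_urn_partition(n, alpha, rng))) == 1
    )
    return hits / draws


rng = random.Random(0)
labels = polya_urn_partition(3, alpha=1.0, rng=rng)   # hypothetical: control + 2 treatments
estimate = prior_prob_all_tied(3, alpha=1.0, draws=20000, rng=rng)
```

With three items and `alpha = 1`, the urn assigns prior probability (1/2)(2/3) = 1/3 to the partition in which all items are tied, so `estimate` should be close to that value. In a full analysis these prior weights would be combined with the data likelihood, for example inside a Gibbs sampler, to obtain posterior model probabilities; that step is not shown here.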
Keywords
gene expression; Bayesian approach; Dirichlet process prior; Polya urn scheme; Gibbs sampling