Identifying differentially expressed genes using the Polya urn scheme

  • Received : 2017.06.30
  • Accepted : 2017.09.13
  • Published : 2017.11.30

Abstract

A common goal in gene expression data analysis is to identify genes whose expression levels change significantly across biological experimental conditions. In this paper, we develop a Bayesian approach for gene-by-gene comparison when there is a control and more than one treatment condition. The proposed approach works within a Bayesian framework with a Dirichlet process prior. The comparison is based on a model selection procedure that exploits the discreteness of the Dirichlet process and its representation via the Polya urn scheme. The posterior probabilities of the models considered are calculated using a Gibbs sampling algorithm. A numerical simulation study is conducted to compare the performance of the proposed method with the usual methods based on analysis of variance (ANOVA) followed by a Tukey test. The methods are compared in terms of true positive rate and false discovery rate. We find that the proposed method outperforms the methods based on ANOVA followed by a Tukey test. We also apply the methodologies to a publicly available data set on Plasmodium falciparum protein.
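To illustrate why the discreteness of the Dirichlet process lends itself to model selection, the following minimal sketch simulates the Polya urn scheme for three condition means (one control and two treatments). It is not the paper's implementation: the standard normal base measure, the concentration value `alpha = 1`, and the helper names `polya_urn_draws` and `partition_of` are assumptions chosen for illustration. Because the urn copies earlier draws with positive probability, the sampled means tie, and each pattern of ties corresponds to one candidate model (for example, "all means equal" or "only the second treatment differs").

```python
import random
from collections import Counter


def polya_urn_draws(n, alpha, base_sampler, rng):
    """Draw n values sequentially from a Polya urn with concentration alpha."""
    draws = []
    for i in range(n):
        # With probability alpha / (alpha + i), draw a fresh value from the
        # base measure; otherwise copy one of the i earlier draws uniformly.
        if rng.random() < alpha / (alpha + i):
            draws.append(base_sampler(rng))
        else:
            draws.append(rng.choice(draws))
    return draws


def partition_of(draws):
    """Encode the tie pattern, e.g. (0, 0, 1) means draws 1 and 2 are equal."""
    labels = {}
    return tuple(labels.setdefault(v, len(labels)) for v in draws)


def prior_model_frequencies(n_conditions, alpha, n_sims, seed=0):
    """Estimate the prior probability of each tie pattern by simulation."""
    rng = random.Random(seed)
    # Assumed base measure: standard normal (illustrative only).
    base = lambda r: r.gauss(0.0, 1.0)
    counts = Counter(
        partition_of(polya_urn_draws(n_conditions, alpha, base, rng))
        for _ in range(n_sims)
    )
    return {pattern: c / n_sims for pattern, c in counts.items()}


if __name__ == "__main__":
    freqs = prior_model_frequencies(n_conditions=3, alpha=1.0, n_sims=20000)
    for pattern, freq in sorted(freqs.items()):
        print(pattern, round(freq, 3))
```

With three conditions there are five possible tie patterns. Under the assumed `alpha = 1`, the urn probabilities give a prior of 1/3 for "all means equal" and 1/6 for "all means different", which the simulated frequencies should approximate. In the approach described above, a Gibbs sampler would update such a prior with the expression data to obtain posterior model probabilities; that step is not shown here.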
