Beyond gene expression level: How are Bayesian methods doing a great job in quantification of isoform diversity and allelic imbalance?

  • Oh, Sunghee (Department of Computer Science and Statistics, Jeju National University)
  • Kim, Chul Soo (Department of Computer Science and Statistics, Jeju National University)
  • Received : 2015.09.14
  • Accepted : 2015.11.29
  • Published : 2016.01.31

Abstract

Thanks to recent advances in next-generation sequencing, RNA-seq offers an unprecedented opportunity to identify transcript variants that reflect isoform diversity and allelic imbalance (AI) arising from differences in transcriptional rates (Anders et al., 2012). These features are now known to be potentially associated with aberrant patterns in complex diseases, such as tissue-specific differential expression at the isoform level or tissue-specific allelic imbalance in dysfunctional disease processes (Anders and Huber, 2010; Anders et al., 2012; Nariai et al., 2014). Nevertheless, post-transcriptional modification and AI have remained poorly understood on traditional transcriptomic and genomic platforms because of technological limitations and insufficient resolution. Here we stress the potential of isoform variability and allele-specific expression, both of which are relevant to abnormal disease mechanisms in transcriptional genetic regulatory networks. In addition, we systematically review how robust Bayesian approaches for RNA-seq have been developed and used in this area.

References

  1. Anders, S. and Huber, W. (2010). Differential expression analysis for sequence count data. Genome Biology, 11, R106. https://doi.org/10.1186/gb-2010-11-10-r106
  2. Anders, S., McCarthy, D. J., Chen, Y., Okoniewski, M., Smyth, G. K., Huber, W. and Robinson, M. D. (2013). Count-based differential expression analysis of RNA sequencing data using R and Bioconductor. Nature Protocols, 8, 1765-1786. https://doi.org/10.1038/nprot.2013.099
  3. Anders, S., Reyes, A. and Huber, W. (2012). Detecting differential usage of exons from RNA-seq data. Genome Research, 22, 2008-2017. https://doi.org/10.1101/gr.133744.111
  4. Aryee, M. J., Gutierrez-Pabello, J. A., Kramnik, I., Maiti, T. and Quackenbush, J. (2009). An improved empirical bayes approach to estimating differential gene expression in microarray time-course data: BETR (Bayesian Estimation of Temporal Regulation). BMC Bioinformatics, 10, 409. https://doi.org/10.1186/1471-2105-10-409
  5. Bar-Joseph, Z., Gitter, A. and Simon, I. (2012). Studying and modelling dynamic biological processes using time-series gene expression data. Nature Reviews. Genetics, 13, 552-564.
  6. Beretta, S., Bonizzoni, P., Vedova, G. D., Pirola, Y. and Rizzi, R. (2014). Modeling alternative splicing variants from RNA-Seq data with isoform graphs. Journal of Computational Biology, 21, 16-40. https://doi.org/10.1089/cmb.2013.0112
  7. Bernard, E., Jacob, L., Mairal, J. and Vert, J. P. (2014). Efficient RNA isoform identification and quantification from RNA-Seq data with network flows. Bioinformatics, 30, 2447-2455. https://doi.org/10.1093/bioinformatics/btu317
  8. Bi, Y. and Davuluri, R. V. (2013). NPEBseq: nonparametric empirical bayesian-based procedure for differential expression analysis of RNA-seq data. BMC Bioinformatics, 14, 262. https://doi.org/10.1186/1471-2105-14-262
  9. Bullard, J. H., Purdom, E., Hansen, K. D. and Dudoit, S. (2010). Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments. BMC Bioinformatics, 11, 94. https://doi.org/10.1186/1471-2105-11-94
  10. Chan, S. L., Pedersen, W. A., Zhu, H. and Mattson, M. P. (2002). Numb modifies neuronal vulnerability to amyloid beta-peptide in an isoform-specific manner by a mechanism involving altered calcium homeostasis: Implications for neuronal death in Alzheimer's disease. Neuromolecular Medicine, 1, 55-67. https://doi.org/10.1385/NMM:1:1:55
  11. Cumbie, J. S., Kimbrel, J. A., Di, Y., Schafer, D. W., Wilhelm, L. J., Fox, S. E., Sullivan, C. M., Curzon, A. D., Carrington, J. C., Mockler, T. C., et al. (2011). GENE-counter: A computational pipeline for the analysis of RNA-Seq data for gene expression differences. PloS One, 6, e25279. https://doi.org/10.1371/journal.pone.0025279
  12. Deng, N., Puetter, A., Zhang, K., Johnson, K., Zhao, Z., Taylor, C., Flemington, E. K. and Zhu, D. (2011). Isoform-level microRNA-155 target prediction using RNA-seq. Nucleic Acids Research, 39, e61. https://doi.org/10.1093/nar/gkr042
  13. Gao, X. and Song, P. X. (2005). Nonparametric tests for differential gene expression and interaction effects in multi-factorial microarray experiments. BMC Bioinformatics, 6, 186. https://doi.org/10.1186/1471-2105-6-186
  14. Gerns Storey, H. L., Richardson, B. A., Singa, B., Naulikha, J., Prindle, V. C., Diaz-Ochoa, V. E., Felgner, P. L., Camerini, D., Horton, H., John-Stewart, G., et al. (2014). Use of principal components analysis and protein microarray to explore the association of HIV-1-specific IgG responses with disease progression. AIDS Research and Human Retroviruses, 30, 37-44. https://doi.org/10.1089/aid.2013.0088
  15. Ginsberg, S. D., Alldred, M. J., Counts, S. E., Cataldo, A. M., Neve, R. L., Jiang, Y., Wuu, J., Chao, M. V., Mufson, E. J., Nixon, R. A., et al. (2010). Microarray analysis of hippocampal CA1 neurons implicates early endosomal dysfunction during Alzheimer's disease progression. Biological Psychiatry, 68, 885-893. https://doi.org/10.1016/j.biopsych.2010.05.030
  16. Han, H. and Jiang, X. (2014). Disease Biomarker Query from RNA-Seq Data. Cancer Informatics, 13, 81-94.
  17. Hardcastle, T. J. and Kelly, K. A. (2010). baySeq: empirical Bayesian methods for identifying differential expression in sequence count data. BMC Bioinformatics, 11, 422. https://doi.org/10.1186/1471-2105-11-422
  18. Hiller, D., Jiang, H., Xu, W. and Wong, W. H. (2009). Identifiability of isoform deconvolution from junction arrays and RNA-Seq. Bioinformatics, 25, 3056-3059. https://doi.org/10.1093/bioinformatics/btp544
  19. Hiller, D. and Wong, W. H. (2013). Simultaneous isoform discovery and quantification from RNA-seq. Statistics in Biosciences, 5, 100-118. https://doi.org/10.1007/s12561-012-9069-2
  20. Howard, B. E. and Heber, S. (2010). Towards reliable isoform quantification using RNA-SEQ data. BMC Bioinformatics, 11, S6.
  21. Hu, Y., Liu, Y., Mao, X., Jia, C., Ferguson, J. F., Xue, C., Reilly, M. P., Li, H. and Li, M. (2014). PennSeq: Accurate isoform-specific gene expression quantification in RNA-Seq by modeling non-uniform read distribution. Nucleic Acids Research, 42, e20. https://doi.org/10.1093/nar/gkt1304
  22. Jiang, H. and Wong, W. H. (2009). Statistical inferences for isoform expression in RNA-Seq. Bioinformatics, 25, 1026-1032. https://doi.org/10.1093/bioinformatics/btp113
  23. Katz, Y., Wang, E. T., Airoldi, E. M. and Burge, C. B. (2010). Analysis and design of RNA sequencing experiments for identifying isoform regulation. Nature Methods, 7, 1009-1015. https://doi.org/10.1038/nmeth.1528
  24. Kaur, H., Mao, S., Li, Q., Sameni, M., Krawetz, S. A., Sloane, B. F. and Mattingly, R. R. (2012). RNA-Seq of human breast ductal carcinoma in situ models reveals aldehyde dehydrogenase isoform 5A1 as a novel potential target. PloS One, 7, e50249. https://doi.org/10.1371/journal.pone.0050249
  25. Kim, K. H., Moon, M., Yu, S. B., Mook-Jung, I. and Kim, J. I. (2012). RNA-Seq analysis of frontal cortex and cerebellum from 5XFAD mice at early stage of disease pathology. Journal of Alzheimer's Disease, 29, 793-808. https://doi.org/10.3233/JAD-2012-111793
  26. Kimes, P. K., Cabanski, C. R., Wilkerson, M. D., Zhao, N., Johnson, A. R., Perou, C. M., Makowski, L., Maher, C. A., Liu, Y., Marron, J. S., et al. (2014). SigFuge: Single gene clustering of RNA-seq reveals differential isoform usage among cancer samples. Nucleic Acids Research, 42, e113. https://doi.org/10.1093/nar/gku521
  27. Kumar, R., Lawrence, M. L., Watt, J., Cooksey, A. M., Burgess, S. C. and Nanduri, B. (2012). RNA-seq based transcriptional map of bovine respiratory disease pathogen Histophilus somni 2336. PloS One, 7, e29435. https://doi.org/10.1371/journal.pone.0029435
  28. Lee, J., Ji, Y., Liang, S., Cai, G. and Muller, P. (2011). On differential gene expression using RNA-Seq data. Cancer Informatics, 10, 205-215.
  29. Leon-Novelo, L. G., McIntyre, L. M., Fear, J. M. and Graze, R. M. (2014). A flexible Bayesian method for detecting allelic imbalance in RNA-seq data. BMC Genomics, 15, 920. https://doi.org/10.1186/1471-2164-15-920
  30. Lerch, J. K., Kuo, F., Motti, D., Morris, R., Bixby, J. L. and Lemmon, V. P. (2012). Isoform diversity and regulation in peripheral and central neurons revealed through RNA-Seq. PloS One, 7, e30417. https://doi.org/10.1371/journal.pone.0030417
  31. Li, B. and Dewey, C. N. (2011). RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics, 12, 323. https://doi.org/10.1186/1471-2105-12-323
  32. Li, B., Ruotti, V., Stewart, R. M., Thomson, J. A. and Dewey, C. N. (2010). RNA-Seq gene expression estimation with read mapping uncertainty. Bioinformatics, 26, 493-500. https://doi.org/10.1093/bioinformatics/btp692
  33. Li, B., Tsoi, L. C., Swindell, W. R., Gudjonsson, J. E., Tejasvi, T., Johnston, A., Ding, J., Stuart, P. E., Xing, X., Kochkodan, J. J., et al. (2014). Transcriptome analysis of psoriasis in a large case-control sample: RNA-seq provides insights into disease mechanisms. The Journal of Investigative Dermatology, 134, 1828-1838. https://doi.org/10.1038/jid.2014.28
  34. Li, J. J., Jiang, C. R., Brown, J. B., Huang, H. and Bickel, P. J. (2011). Sparse linear modeling of next-generation mRNA sequencing (RNA-Seq) data for isoform discovery and abundance estimation. Proceedings of the National Academy of Sciences of the United States of America, 108, 19867-19872.
  35. Li, W. and Jiang, T. (2012). Transcriptome assembly and isoform expression level estimation from biased RNA-Seq reads. Bioinformatics, 28, 2914-2921. https://doi.org/10.1093/bioinformatics/bts559
  36. Li, Y. M. and Dickson, D. W. (1997). Enhanced binding of advanced glycation endproducts (AGE) by the ApoE4 isoform links the mechanism of plaque deposition in Alzheimer's disease. Neuroscience Letters, 226, 155-158. https://doi.org/10.1016/S0304-3940(97)00266-8
  37. Lin, Y., Reynolds, P. and Feingold, E. (2003). An empirical bayesian method for differential expression studies using one-channel microarray data. Statistical Applications in Genetics and Molecular Biology, 2, Article 8.
  38. Ma, X. and Zhang, X. (2013). NURD: an implementation of a new method to estimate isoform expression from non-uniform RNA-seq data. BMC Bioinformatics, 14, 220. https://doi.org/10.1186/1471-2105-14-220
  39. Marioni, J. C., Mason, C. E., Mane, S. M., Stephens, M. and Gilad, Y. (2008). RNA-seq: an assessment of technical reproducibility and comparison with gene expression arrays. Genome Research, 18, 1509-1517. https://doi.org/10.1101/gr.079558.108
  40. Mezlini, A. M., Smith, E. J., Fiume, M., Buske, O., Savich, G. L., Shah, S., Aparicio, S., Chiang, D. Y., Goldenberg, A. and Brudno, M. (2013). iReckon: simultaneous isoform discovery and abundance estimation from RNA-seq data. Genome Research, 23, 519-529. https://doi.org/10.1101/gr.142232.112
  41. Mills, J. D., Nalpathamkalam, T., Jacobs, H. I., Janitz, C., Merico, D., Hu, P. and Janitz, M. (2013). RNA-Seq analysis of the parietal cortex in Alzheimer's disease reveals alternatively spliced isoforms related to lipid metabolism. Neuroscience Letters, 536, 90-95. https://doi.org/10.1016/j.neulet.2012.12.042
  42. Nariai, N., Hirose, O., Kojima, K. and Nagasaki, M. (2013). TIGAR: transcript isoform abundance estimation method with gapped alignment of RNA-Seq data by variational Bayesian inference. Bioinformatics, 29, 2292-2299. https://doi.org/10.1093/bioinformatics/btt381
  43. Nariai, N., Kojima, K., Mimori, T., Sato, Y., Kawai, Y., Yamaguchi-Kabata, Y. and Nagasaki, M. (2014). TIGAR2: sensitive and accurate estimation of transcript isoform expression with longer RNA-Seq reads. BMC Genomics, 15, Suppl 10, S5.
  44. Ng, D. W., Shi, X., Nah, G. and Chen, Z. J. (2014). High-throughput RNA-seq for allelic or locus-specific expression analysis in Arabidopsis-related species, hybrids and allotetraploids. Methods in Molecular Biology, 1112, 33-48. https://doi.org/10.1007/978-1-62703-773-0_3
  45. Nicolae, M., Mangul, S., Mandoiu, I. I. and Zelikovsky, A. (2011). Estimation of alternative splicing isoform frequencies from RNA-Seq data. Algorithms for Molecular Biology, 6, 9. https://doi.org/10.1186/1748-7188-6-9
  46. Nishiu, M., Yanagawa, R., Nakatsuka, S., Yao, M., Tsunoda, T., Nakamura, Y. and Aozasa, K. (2002). Microarray analysis of gene-expression profiles in diffuse large B-cell lymphoma: Identification of genes related to disease progression. Japanese Journal of Cancer Research : Gann, 93, 894-901. https://doi.org/10.1111/j.1349-7006.2002.tb01335.x
  47. Niu, L., Huang, W., Umbach, D. M. and Li, L. (2014). IUTA: A tool for effectively detecting differential isoform usage from RNA-Seq data. BMC Genomics, 15, 862. https://doi.org/10.1186/1471-2164-15-862
  48. Oh, S., Song, S., Grabowski, G., Zhao, H. and Noonan, J. P. (2013). Time series expression analyses using RNA-seq: A statistical approach. BioMed Research International, 2013, 203681.
  49. Oshlack, A., Robinson, M. D. and Young, M. D. (2010). From RNA-seq reads to differential expression results. Genome Biology, 11, 220. https://doi.org/10.1186/gb-2010-11-12-220
  50. Pandey, R. V., Franssen, S. U., Futschik, A. and Schlotterer, C. (2013). Allelic imbalance metre (Allim), a new tool for measuring allele-specific gene expression with RNA-seq data. Molecular Ecology Resources, 13, 740-745. https://doi.org/10.1111/1755-0998.12110
  51. Patro, R., Mount, S. M. and Kingsford, C. (2014). Sailfish enables alignment-free isoform quantification from RNA-seq reads using lightweight algorithms. Nature Biotechnology, 32, 462-464. https://doi.org/10.1038/nbt.2862
  52. Pollier, J., Rombauts, S. and Goossens, A. (2013). Analysis of RNA-Seq data with TopHat and Cufflinks for genome-wide expression analysis of jasmonate-treated plants and plant cultures. Methods in Molecular Biology, 1011, 305-315. https://doi.org/10.1007/978-1-62703-414-2_24
  53. Rehrauer, H., Opitz, L., Tan, G., Sieverling, L. and Schlapbach, R. (2013). Blind spots of quantitative RNA-seq: The limits for assessing abundance, differential expression and isoform switching. BMC Bioinformatics, 14, 370. https://doi.org/10.1186/1471-2105-14-370
  54. Robakis, N. K. and Georgakopoulos, A. (2014). Allelic interference: a mechanism for trans-dominant transmission of loss of function in the neurodegeneration of familial Alzheimer's disease. Neurodegenerative Diseases, 13, 126-130.
  55. Roberts, A., Trapnell, C., Donaghey, J., Rinn, J. L. and Pachter, L. (2011). Improving RNA-Seq expression estimates by correcting for fragment bias. Genome Biology, 12, R22. https://doi.org/10.1186/gb-2011-12-3-r22
  56. Robinson, M. D., McCarthy, D. J. and Smyth, G. K. (2010). edgeR: A Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics, 26, 139-140. https://doi.org/10.1093/bioinformatics/btp616
  57. Robinson, M. D. and Oshlack, A. (2010). A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biology, 11, R25. https://doi.org/10.1186/gb-2010-11-3-r25
  58. Safikhani, Z., Sadeghi, M., Pezeshk, H. and Eslahchi, C. (2013). SSP: An interval integer linear programming for de novo transcriptome assembly and isoform discovery of RNA-seq reads. Genomics, 102, 507-514. https://doi.org/10.1016/j.ygeno.2013.10.003
  59. Satoh, J., Yamamoto, Y., Asahina, N., Kitano, S. and Kino, Y. (2014). RNA-Seq data mining: Downregulation of NeuroD6 serves as a possible biomarker for Alzheimer's disease brains. Disease Markers, 2014, 123165.
  60. Shen, S., Park, J. W., Huang, J., Dittmar, K. A., Lu, Z. X., Zhou, Q., Carstens, R. P. and Xing, Y. (2012). MATS: a Bayesian framework for flexible detection of differential alternative splicing from RNA-Seq data. Nucleic Acids Research, 40, e61. https://doi.org/10.1093/nar/gkr1291
  61. Shi, Y. and Jiang, H. (2013). rSeqDiff: Detecting differential isoform expression from RNA-Seq data using hierarchical likelihood ratio test. PloS One, 8, e79448. https://doi.org/10.1371/journal.pone.0079448
  62. Skelly, D. A., Johansson, M., Madeoy, J., Wakefield, J. and Akey, J. M. (2011). A powerful and flexible statistical framework for testing hypotheses of allele-specific gene expression from RNA-seq data. Genome Research, 21, 1728-1737. https://doi.org/10.1101/gr.119784.110
  63. Stegle, O., Denby, K. J., Cooke, E. J., Wild, D. L., Ghahramani, Z. and Borgwardt, K. M. (2010). A robust Bayesian two-sample test for detecting intervals of differential gene expression in microarray time series. Journal of Computational Biology, 17, 355-367. https://doi.org/10.1089/cmb.2009.0175
  64. Suo, C., Calza, S., Salim, A. and Pawitan, Y. (2014). Joint estimation of isoform expression and isoform-specific read distribution using multisample RNA-Seq data. Bioinformatics, 30, 506-513. https://doi.org/10.1093/bioinformatics/btt704
  65. Tarazona, S., Garcia-Alcalde, F., Dopazo, J., Ferrer, A. and Conesa, A. (2011). Differential expression in RNA-seq: A matter of depth. Genome Research, 21, 2213-2223. https://doi.org/10.1101/gr.124321.111
  66. Trapnell, C., Pachter, L. and Salzberg, S. L. (2009). TopHat: Discovering splice junctions with RNA-Seq. Bioinformatics, 25, 1105-1111. https://doi.org/10.1093/bioinformatics/btp120
  67. Trapnell, C., Roberts, A., Goff, L., Pertea, G., Kim, D., Kelley, D. R., Pimentel, H., Salzberg, S. L., Rinn, J. L. and Pachter, L. (2012). Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nature Protocols, 7, 562-578. https://doi.org/10.1038/nprot.2012.016
  68. Trapnell, C., Williams, B. A., Pertea, G., Mortazavi, A., Kwan, G., van Baren, M. J., Salzberg, S. L., Wold, B. J. and Pachter, L. (2010). Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nature Biotechnology, 28, 511-515. https://doi.org/10.1038/nbt.1621
  69. Vardhanabhuti, S., Li, M. and Li, H. (2013). A Hierarchical Bayesian Model for Estimating and Inferring Differential Isoform Expression for Multi-Sample RNA-Seq Data. Statistics in Biosciences, 5, 119-137. https://doi.org/10.1007/s12561-011-9052-3
  70. Wang, R., Sun, L., Bao, L., Zhang, J., Jiang, Y., Yao, J., Song, L., Feng, J., Liu, S. and Liu, Z. (2013). Bulk segregant RNA-seq reveals expression and positional candidate genes and allele-specific expression for disease resistance against enteric septicemia of catfish. BMC Genomics, 14, 929. https://doi.org/10.1186/1471-2164-14-929
  71. Wang, X., Wu, Z. and Zhang, X. (2010). Isoform abundance inference provides a more accurate estimation of gene expression levels in RNA-seq. Journal of Bioinformatics and Computational Biology, 8, 177-192. https://doi.org/10.1142/S0219720010005178
  72. Wang, Y., Lupiani, B., Reddy, S. M., Lamont, S. J. and Zhou, H. (2014). RNA-seq analysis revealed novel genes and signaling pathway associated with disease resistance to avian influenza virus infection in chickens. Poultry Science, 93, 485-493. https://doi.org/10.3382/ps.2013-03557
  73. Wu, J., Akerman, M., Sun, S., McCombie, W. R., Krainer, A. R. and Zhang, M. Q. (2011a). SpliceTrap: A method to quantify alternative splicing under single cellular conditions. Bioinformatics, 27, 3010-3016. https://doi.org/10.1093/bioinformatics/btr508
  74. Wu, Z., Wang, X. and Zhang, X. (2011b). Using non-uniform read distribution models to improve isoform expression inference in RNA-Seq. Bioinformatics, 27, 502-508. https://doi.org/10.1093/bioinformatics/btq696
  75. Yalamanchili, H. K., Li, Z., Wang, P., Wong, M. P., Yao, J. and Wang, J. (2014). SpliceNet: recovering splicing isoform-specific differential gene networks from RNA-Seq data of normal and diseased samples. Nucleic Acids Research, 42, e121. https://doi.org/10.1093/nar/gku577
  76. Zhang, J., Kuo, C. C. and Chen, L. (2014). WemIQ: An accurate and robust isoform quantification method for RNA-seq data. Bioinformatics. https://doi.org/10.1093/bioinformatics/btu757
  77. Zhao, H., Chan, K. L., Cheng, L. M. and Yan, H. (2008). Multivariate hierarchical Bayesian model for differential gene expression analysis in microarray experiments. BMC Bioinformatics, 9, S9.
  78. Zheng, S. and Chen, L. (2009). A hierarchical Bayesian model for comparing transcriptomes at the individual transcript isoform level. Nucleic Acids Research, 37, e75. https://doi.org/10.1093/nar/gkp282