Analysis of Whole Transcriptome Sequencing Data: Workflow and Software

  • Yang, In Seok (Severance Biomedical Science Institute, Yonsei University College of Medicine)
  • Kim, Sangwoo (Severance Biomedical Science Institute, Yonsei University College of Medicine)
  • Received : 2015.10.13
  • Accepted : 2015.12.12
  • Published : 2015.12.31

Abstract

RNA is a polymeric molecule implicated in various biological processes, such as the coding, decoding, regulation, and expression of genes. Numerous studies have examined RNA features using whole transcriptome sequencing (RNA-seq) approaches. RNA-seq is a powerful technique for characterizing and quantifying the transcriptome, and its adoption has accelerated the development of bioinformatics software. In this review, we introduce a routine RNA-seq workflow together with related software, focusing particularly on transcriptome reconstruction and expression quantification.
