A ChIP-Seq Data Analysis Pipeline Based on Bioconductor Packages

  • Park, Seung-Jin (Personalized Genomic Medicine Research Center, Korea Research Institute of Bioscience and Biotechnology (KRIBB)) ;
  • Kim, Jong-Hwan (Personalized Genomic Medicine Research Center, Korea Research Institute of Bioscience and Biotechnology (KRIBB)) ;
  • Yoon, Byung-Ha (Personalized Genomic Medicine Research Center, Korea Research Institute of Bioscience and Biotechnology (KRIBB)) ;
  • Kim, Seon-Young (Personalized Genomic Medicine Research Center, Korea Research Institute of Bioscience and Biotechnology (KRIBB))
  • Received : 2017.01.31
  • Accepted : 2017.03.06
  • Published : 2017.03.31

Abstract

Large volumes of chromatin immunoprecipitation-sequencing (ChIP-Seq) data are now being generated to expand our knowledge of DNA-protein interactions in the cell, and many tools have accordingly been developed for ChIP-Seq analysis. Here, we present a streamlined ChIP-Seq data analysis workflow built from only four Bioconductor packages: dada2, QuasR, mosaics, and ChIPseeker. 'dada2' trims the high-throughput sequencing reads. 'QuasR' performs quality control and maps the reads to the reference genome, and 'mosaics' performs peak calling. Finally, 'ChIPseeker' annotates and visualizes the called peaks. The workflow runs independently of the operating system (e.g., Windows, Mac, or Linux) and processes the input FASTQ files into various analysis results in a single run. The R code is available on GitHub: https://github.com/ddhb/Workflow_of_Chipseq.git.
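The four stages of this kind of workflow (trim, map, call peaks, annotate) can be illustrated with a minimal, self-contained Python sketch. It is not the authors' R code and does not reproduce any package's algorithm. Read mapping (the QuasR step) is omitted, so read start positions are assumed to be already aligned. Trimming is only loosely modeled on the truncQ/minLen idea in dada2's filterAndTrim. The MOSAiCS statistical model is replaced by a plain Poisson test of binned ChIP counts against library-size-scaled input counts. All function names, the 200-bp bin width, the alpha threshold, and the symmetric 3-kb promoter window (chosen to resemble ChIPseeker's default tssRegion of -3000 to 3000) are illustrative assumptions.

```python
import math
from bisect import bisect_left

# Hypothetical helper: loosely mimics a "truncate at first low-quality base,
# then drop short reads" rule; the defaults are assumptions, not dada2's code.
def truncate_read(seq, quals, trunc_q=2, min_len=20):
    """Cut the read at the first base with quality <= trunc_q;
    return None if the remaining read is shorter than min_len."""
    for i, q in enumerate(quals):
        if q <= trunc_q:
            seq = seq[:i]
            break
    return seq if len(seq) >= min_len else None

def bin_counts(positions, bin_size, n_bins):
    """Count aligned read start positions in fixed-width, non-overlapping bins."""
    counts = [0] * n_bins
    for p in positions:
        b = p // bin_size
        if 0 <= b < n_bins:
            counts[b] += 1
    return counts

def poisson_sf(k, lam):
    """Upper tail P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)
    cdf = term
    for i in range(1, k):
        term *= lam / i
        cdf += term
    return max(0.0, 1.0 - cdf)

# Simple Poisson enrichment test swapped in for illustration;
# this is NOT the MOSAiCS mixture model used by the mosaics package.
def call_peaks(chip, inp, alpha=1e-5):
    """Return indices of bins whose ChIP count is improbably high
    given the input count scaled to the ChIP library size."""
    scale = sum(chip) / max(sum(inp), 1)
    peaks = []
    for i, (c, x) in enumerate(zip(chip, inp)):
        lam = max(x * scale, 1.0)  # floor the rate so empty input bins do not give lam = 0
        if poisson_sf(c, lam) < alpha:
            peaks.append(i)
    return peaks

def annotate(peak_centers, tss_positions, tss_region=3000):
    """Label each peak 'Promoter' if its nearest TSS lies within tss_region bp,
    otherwise 'Distal' (strand and up/downstream asymmetry are ignored)."""
    tss = sorted(tss_positions)
    labels = []
    for p in peak_centers:
        j = bisect_left(tss, p)
        # Only the TSSs immediately left and right of p can be nearest.
        near = [tss[k] for k in (j - 1, j) if 0 <= k < len(tss)]
        d = min((abs(p - t) for t in near), default=math.inf)
        labels.append("Promoter" if d <= tss_region else "Distal")
    return labels

if __name__ == "__main__":
    # Toy data: 10 bins of 200 bp, one ChIP read per bin plus 40 extra reads in bin 5.
    chip = bin_counts([i * 200 + 50 for i in range(10)] + list(range(1000, 1040)), 200, 10)
    inp = bin_counts([i * 200 + d for i in range(10) for d in (10, 20)], 200, 10)
    peaks = call_peaks(chip, inp)
    centers = [b * 200 + 100 for b in peaks]
    print(peaks, annotate(centers, [1500, 9000]))
```

On this toy data, bin 5 holds 41 ChIP reads against an expected rate of 5, so call_peaks flags only that bin, and its center (position 1100) is 400 bp from the TSS at 1500 and is labeled "Promoter". A real analysis would instead rely on the Bioconductor packages named above, which handle alignment, bin construction, background modeling, and strand-aware annotation.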
