http://dx.doi.org/10.5352/JLS.2021.31.4.410

ChIP-seq Library Preparation and NGS Data Analysis Using the Galaxy Platform  

Kang, Yujin (Department of Molecular Biology, College of Natural Sciences, Pusan National University)
Kang, Jin (Department of Molecular Biology, College of Natural Sciences, Pusan National University)
Kim, Yea Woon (Department of Molecular Biology, College of Natural Sciences, Pusan National University)
Kim, AeRi (Department of Molecular Biology, College of Natural Sciences, Pusan National University)
Publication Information
Journal of Life Science / v.31, no.4, 2021, pp. 410-417
Abstract
Next-generation sequencing (NGS) is a high-throughput technique for sequencing large numbers of DNA fragments prepared from a genome. It has been used to determine the whole-genome sequences of living organisms and to analyze complementary DNA (cDNA) or chromatin-immunoprecipitated DNA (ChIPed DNA) at the genome level. After sequencing, the data must be processed and analyzed with appropriate tools and reasonable parameters. However, handling large-scale sequencing data and programming for data analysis can be difficult. The Galaxy platform, a public web service, provides many tools for NGS data analysis and allows researchers to analyze their data in a web browser without deep knowledge of bioinformatics or programming. In this study, we explain the procedure for preparing chromatin immunoprecipitation sequencing (ChIP-seq) libraries and the steps for analyzing ChIP-seq data on the Galaxy platform. The data analysis steps comprise uploading NGS data to Galaxy, quality checking of the data, pre-mapping processing, read mapping, post-mapping processing, peak calling, and visualization by window views, heatmaps, average profiles, and correlation analysis. Analysis of our histone H3K4me1 ChIP-seq data from K562 cells shows that it correlates with public data. Thus, NGS data analysis on the Galaxy platform can provide an accessible entry point to bioinformatics.
Keywords
Bioinformatics; ChIP-seq; Galaxy; NGS
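The quality check and pre-mapping steps named in the abstract can be illustrated with a minimal Python sketch. It is not the Galaxy workflow or any Galaxy tool. It only shows what a read-level quality filter does conceptually. The function names, the Phred+33 quality encoding, and the mean-quality threshold of 20 are all assumptions made for illustration and are not taken from the article.

```python
from io import StringIO
from typing import Iterable, Iterator, List, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    quality: str


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream that uses four lines per read."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of input
            return
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or len(sequence) != len(quality):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        yield FastqRecord(header[1:], sequence, quality)


def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string (Phred+33 encoding is assumed)."""
    if not quality:
        return 0.0
    return sum(ord(char) - offset for char in quality) / len(quality)


def filter_reads(
    records: Iterable[FastqRecord], min_mean_quality: float = 20.0
) -> List[FastqRecord]:
    """Keep reads whose mean Phred score reaches the (hypothetical) threshold."""
    return [r for r in records if mean_phred(r.quality) >= min_mean_quality]


# Two toy reads: 'I' encodes Phred 40 and '!' encodes Phred 0 under Phred+33.
EXAMPLE = "@read1\nACGT\n+\nIIII\n@read2\nACGT\n+\n!!!!\n"

records = list(read_fastq(StringIO(EXAMPLE)))
kept = filter_reads(records)
print([r.name for r in kept])
```

With the toy input, only `read1` passes the filter, because every base in `read2` has a Phred score of 0. A real analysis would typically use dedicated quality-control and trimming tools rather than a hand-written filter.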