[KSCI] Korea Science Citation Index Service

http://dx.doi.org/10.5392/IJoC.2018.14.1.034

Comparison of Distributed and Parallel NGS Data Analysis Methods based on Cloud Computing

Kang, Hyungil (Dept. of Semiconductor Electronics Engineering, Chungbuk Health & Science University)
Kim, Sangsoo (Course-based Qualification Exam Team 2, Human Resources Development Service of Korea)

Publication Information

International Journal of Contents / v.14, no.1, 2018, pp. 34-38

Abstract

With the rapid growth of genomic data, new requirements have emerged that are difficult to meet with existing big data storage and analysis techniques. Regardless of its size, an organization performing genomic data analysis finds it increasingly difficult to build its own computing environment for storing and analyzing genomic data. Recently, cloud computing has emerged as a computing environment that meets these new requirements. In this paper, we analyze and compare existing distributed and parallel NGS (Next Generation Sequencing) data analysis methods based on cloud computing environments, as a basis for future research.

Keywords

DNA; Analysis; NGS; Cloud
