http://dx.doi.org/10.5392/IJoC.2018.14.1.034

Comparison of Distributed and Parallel NGS Data Analysis Methods based on Cloud Computing  

Kang, Hyungil (Dept. of Semiconductor Electronics Engineering, Chungbuk Health & Science University)
Kim, Sangsoo (Dept. of Course-based Qualification Exam Team 2, Human Resources Development Service of Korea)
Abstract
With the rapid growth of genomic data, new requirements have emerged that are difficult to handle with big data storage and analysis techniques. Regardless of an organization's size, it is becoming increasingly difficult for a single institution to build its own computing environment for storing and analyzing genomic data. Recently, cloud computing has emerged as a computing environment that meets these new requirements. In this paper, we analyze and compare existing distributed and parallel NGS (Next Generation Sequencing) analysis methods based on cloud computing environments, as a basis for future research.
Keywords
DNA; Analysis; NGS; Cloud