http://dx.doi.org/10.5352/JLS.2015.25.3.357

Next Generation Sequencing and Bioinformatics  

Kim, Ki-Bong (Department of Biomedical Technology, Sangmyung University)
Publication Information
Journal of Life Science / v.25, no.3, 2015, pp. 357-367
Abstract
With next-generation sequencing (NGS) platforms and bioinformatics tools advancing at an unprecedented pace, the long-standing goal of sequencing a human genome for less than $1,000 may become feasible in the near future. The rapid technological advances in NGS have increased the demand for statistical methods and bioinformatics tools for the analysis and management of NGS data. Even in the early stages of the commercial availability of NGS platforms, a large number of applications and tools already existed for analyzing, interpreting, and visualizing NGS data. However, the sheer volume of NGS data presents a significant challenge for storage, analysis, and data management. The analysis of NGS data typically includes alignment of sequence reads to a reference, base-calling, polymorphism detection, de novo assembly from paired or unpaired reads, structural variant detection, and genome browsing. While NGS technologies have enabled a massive increase in available raw sequence data, a number of new informatics challenges must be addressed to fulfill the promise of genome research. This review provides an overview of the major NGS technologies and of bioinformatics tools for NGS data analysis.
Keywords
Base-calling; bioinformatics tools; de novo assembly; next generation sequencing; polymorphism detection