• Title/Summary/Keyword: GATK

Accelerating next generation sequencing data analysis: an evaluation of optimized best practices for Genome Analysis Toolkit algorithms

  • Franke, Karl R.; Crowgey, Erin L.
    • Genomics & Informatics / v.18 no.1 / pp.10.1-10.9 / 2020
  • Advancements in next generation sequencing (NGS) technologies have significantly increased the translational use of genomics data in the medical field, as well as the demand for computational infrastructure capable of processing that data. To improve the current understanding of the software and hardware used to process large-scale human genomic (NGS) datasets, the performance and accuracy of optimized versions of GATK algorithms, including Parabricks and Sentieon, were compared with those of the original application (GATK v4.1.0 on Intel x86 CPUs). Parabricks processed a 50× whole-genome sequencing library in under 3 h and Sentieon in under 8 h, whereas GATK v4.1.0 needed nearly 24 h. These results were achieved while maintaining greater than 99% accuracy and precision relative to stock GATK. Sentieon's somatic pipeline achieved similar results (greater than 99%). Additionally, the IBM POWER9 CPU performed well on bioinformatics workloads when tested with 10 different alignment/mapping tools.
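The abstract reports accuracy and precision of the accelerated pipelines relative to stock GATK but does not say how the call sets were compared. As a minimal sketch, assuming the GATK v4.1.0 calls serve as the reference set and that two variants match only when chromosome, position, reference allele, and alternate allele are all identical, precision and recall reduce to set arithmetic. The `Variant` type and `precision_recall` function are hypothetical, and the toy calls are invented for illustration.

```python
from typing import Iterable, NamedTuple


class Variant(NamedTuple):
    """Hypothetical variant key: exact position plus reference/alternate alleles."""
    chrom: str
    pos: int
    ref: str
    alt: str


def precision_recall(test: Iterable[Variant], truth: Iterable[Variant]) -> dict:
    """Score `test` calls against `truth` calls by exact key matching (an assumption)."""
    test_set, truth_set = set(test), set(truth)
    tp = len(test_set & truth_set)   # calls made by both pipelines
    fp = len(test_set - truth_set)   # calls made only by the test pipeline
    fn = len(truth_set - test_set)   # reference calls the test pipeline missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"precision": precision, "recall": recall}


# Toy call sets: 3 shared calls, 1 extra test call, 1 missed reference call.
truth = [Variant("chr1", 100, "A", "G"), Variant("chr1", 200, "C", "T"),
         Variant("chr2", 50, "G", "A"), Variant("chr2", 75, "T", "C")]
test = [Variant("chr1", 100, "A", "G"), Variant("chr1", 200, "C", "T"),
        Variant("chr2", 50, "G", "A"), Variant("chr3", 10, "A", "C")]
print(precision_recall(test, truth))  # {'precision': 0.75, 'recall': 0.75}
```

Exact-key matching ignores the fact that one indel or complex variant can be written in several equivalent ways; dedicated comparison tools such as hap.py or RTG Tools' vcfeval reconcile such representations before matching.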

NGSOne: Cloud-based NGS data analysis tool

  • Kwon, Chang-hyuk; Kim, Jason; Jang, Jeong-hwa; Ahn, Jae-gyoon
    • Journal of Platform Technology / v.6 no.4 / pp.87-95 / 2018
  • With the decreasing cost of sequencing, many national projects analyzing 100,000 to 1 million people are now in progress. However, a large portion of the budget of these projects is spent on building cluster systems or purchasing servers, owing to the lack of programs or systems that can handle large amounts of data simultaneously. In this study, we developed NGSOne, a client program that is easy to use even for biologists and that performs SNP analysis on hundreds or more whole-genome and whole-exome datasets without requiring users to build their own server or cluster environment. DRAGEN, BWA/GATK, and Isaac/Strelka2, which are representative SNP analysis tools, were evaluated, and DRAGEN showed the best performance in terms of execution time and number of errors. NGSOne can also be extended to support various analysis tools beyond SNP analysis.

Efficient Determination of Genomic Variants from Sorghum Genetic Resources by HPC

  • Tae-Ho Lee; Myung-Eun Park; Yun-Ho Oh; Da-Hye Jeon
    • Proceedings of the Korean Society of Crop Science Conference / 2022.10a / p.241 / 2022
  • In the digital age, much agricultural R&D is data-driven. However, genetic resources remain essential for basic research and agricultural development, and many countries are therefore making great efforts to secure diverse genetic resources. In Korea, the National Agrobiodiversity Center (NAC) has secured more than 270,000 plant genetic resources to date as part of these efforts. To use these resources efficiently in agricultural R&D, it is essential to determine their genotypes, which requires a system for mass genotyping. Sorghum was selected as a model crop for this purpose, considering its genome size, the availability of a high-quality reference genome, and the number of resources. To determine genotype data efficiently from many genetic resources, we developed a GATK pipeline that runs efficiently on HPC. The pipeline rapidly determined 769 genotypes of 410 genetic resources. Going forward, our team will continue working to determine the genotypes of over a thousand sorghum resources, and the data will be released through the National Agricultural Biotechnology Information Center (NABIC) for use in agricultural R&D.

Massive Parallel Sequencing for Diagnostic Genetic Testing of BRCA Genes - a Single Center Experience

  • Ermolenko, Natalya A; Boyarskikh, Uljana A; Kechin, Andrey A; Mazitova, Alexandra M; Khrapov, Evgeny A; Petrova, Valentina D; Lazarev, Alexandr F; Kushlinskii, Nikolay E; Filipenko, Maxim L
    • Asian Pacific Journal of Cancer Prevention / v.16 no.17 / pp.7935-7941 / 2015
  • The aim of this study was to implement massive parallel sequencing (MPS) technology in clinical genetic testing. We developed and tested an amplicon-based method for resequencing the BRCA1 and BRCA2 genes on an Illumina MiSeq to identify disease-causing mutations in patients with hereditary breast or ovarian cancer (HBOC). The coding regions of BRCA1 and BRCA2 were resequenced in 96 HBOC patient DNA samples obtained from different sample types: peripheral blood leukocytes, whole-blood drops dried on paper, and buccal wash epithelia. A total of 16 randomly selected DNA samples were characterized by standard Sanger sequencing and used to optimize the variant-calling process and evaluate the accuracy of the MPS method. The best bioinformatics workflow filtered variants using GATK with the following cut-offs: variant frequency >14%, coverage >25×, and presence in both forward and reverse reads. The MPS method had 100% sensitivity and 94.4% specificity. Similar accuracy levels were achieved for DNA obtained from the different sample types. The workflow presented here requires only a small amount of DNA (170 ng) and is cost-effective because it eliminates the DNA and PCR-product normalization steps.
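The abstract gives the three filter cut-offs but not how GATK was configured to apply them. The Python sketch below only illustrates the decision logic, assuming each call already carries its total depth and its variant-supporting read counts split by strand; the `VariantCall` type and `passes_filters` function are hypothetical and are not part of GATK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariantCall:
    """Hypothetical per-site read counts (not a GATK data structure)."""
    depth: int        # total reads covering the site
    alt_forward: int  # variant-supporting reads on the forward strand
    alt_reverse: int  # variant-supporting reads on the reverse strand


def passes_filters(call: VariantCall, vaf_cutoff: float = 0.14, depth_cutoff: int = 25) -> bool:
    """Keep a call only if coverage > 25x, variant frequency > 14%,
    and the variant is seen on both forward and reverse reads."""
    if call.depth <= depth_cutoff:
        return False  # coverage cut-off; also guarantees depth > 0 below
    vaf = (call.alt_forward + call.alt_reverse) / call.depth
    return vaf > vaf_cutoff and call.alt_forward > 0 and call.alt_reverse > 0


print(passes_filters(VariantCall(depth=100, alt_forward=25, alt_reverse=20)))  # True
print(passes_filters(VariantCall(depth=100, alt_forward=45, alt_reverse=0)))   # False: one strand only
print(passes_filters(VariantCall(depth=20, alt_forward=5, alt_reverse=5)))     # False: coverage too low
```

Both thresholds are strict inequalities, matching the ">14%" and ">25×" wording; a site with exactly 25 reads is rejected.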

Identification of Causal and/or Rare Genetic Variants for Complex Traits by Targeted Resequencing in Population-based Cohorts

  • Kim, Yun-Kyoung; Hong, Chang-Bum; Cho, Yoon-Shin
    • Genomics & Informatics / v.8 no.3 / pp.131-137 / 2010
  • Genome-wide association studies (GWASs) have greatly contributed to the identification of common variants responsible for numerous complex traits. However, because this approach depends on LD-based tagging SNP microarray chips, it has unavoidable limitations in detecting causal and/or rare variants. In an effort to detect potential causal and/or rare variants for complex traits such as type 2 diabetes (T2D) and triglycerides (TGs), we conducted targeted resequencing of loci identified by the Korea Association REsource (KARE) GWAS. The target regions comprised whole exons, exon-intron boundaries, and regulatory regions of genes located within 1 Mb of the GWA signal boundary. For 124 individuals selected from population-based cohorts, a total of 0.7 Mb of target regions was captured using the NimbleGen sequence capture 385K array. Subsequent sequencing on the Roche 454 Genome Sequencer FLX generated about 110,000 sequence reads per individual. Sequence reads were mapped to the human reference genome using the SSAHA2 program. On average, 62.2% of total reads mapped to targets, with an average coverage of 22×. A total of 5,983 SNPs (an average of 846 SNPs per individual) were called and annotated by GATK software, with 96.5% accuracy as estimated by comparison with Affymetrix 5.0 genotype data for the same individuals. About 51% of all SNPs were singletons, which can be considered possible rare variants in the population. Among the SNPs in exons, which accounted for about 20% of all SNPs, 304 nonsynonymous singletons were tested with PolyPhen to predict protein damage caused by the mutation. In total, we detected 9 and 6 potentially functional rare SNPs for T2D and triglycerides, respectively, motivating replication genotyping in independent populations to establish their bona fide relevance to the traits.
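Two of the summary statistics above, the fraction of singleton SNPs and the accuracy against array genotypes, can be sketched as follows. The abstract defines neither, so this assumes that a "singleton" is a site whose alternate allele appears on exactly one chromosome across the cohort, and that "accuracy" means per-genotype agreement with the Affymetrix calls. The function names, the genotype encoding, and the toy data are hypothetical.

```python
from typing import Sequence


def singleton_fraction(genotypes: Sequence[Sequence[int]]) -> float:
    """Fraction of SNP sites whose alternate allele occurs exactly once in the cohort.

    genotypes[i][j] is the alternate-allele count (0, 1 or 2) of SNP i in
    individual j; assumes diploid, biallelic sites with no missing calls.
    """
    if not genotypes:
        return 0.0
    singletons = sum(1 for site in genotypes if sum(site) == 1)
    return singletons / len(genotypes)


def genotype_concordance(seq_calls: Sequence[int], array_calls: Sequence[int]) -> float:
    """Share of paired genotypes on which sequencing and array calls agree."""
    if len(seq_calls) != len(array_calls):
        raise ValueError("sequencing and array calls must be paired one-to-one")
    if not seq_calls:
        return 0.0
    matches = sum(s == a for s, a in zip(seq_calls, array_calls))
    return matches / len(seq_calls)


# Toy data: 4 SNPs x 4 individuals (invented for illustration).
cohort = [[0, 1, 0, 0], [1, 1, 0, 2], [0, 0, 0, 1], [2, 0, 0, 0]]
print(singleton_fraction(cohort))                        # 0.5
print(genotype_concordance([0, 1, 2, 1], [0, 1, 1, 1]))  # 0.75
```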