• Title/Abstract/Keywords: GATK

5 search results (processing time: 0.019 s)

Accelerating next generation sequencing data analysis: an evaluation of optimized best practices for Genome Analysis Toolkit algorithms

  • Franke, Karl R.;Crowgey, Erin L.
    • Genomics & Informatics
    • /
    • Vol. 18, No. 1
    • /
    • pp.10.1-10.9
    • /
    • 2020
  • Advancements in next generation sequencing (NGS) technologies have significantly increased the translational use of genomics data in the medical field as well as the demand for computational infrastructure capable of processing that data. To enhance the current understanding of software and hardware used to compute large-scale human genomic datasets (NGS), the performance and accuracy of optimized versions of GATK algorithms, including Parabricks and Sentieon, were compared to the results of the original application (GATK v4.1.0, Intel x86 CPUs). Parabricks was able to process a 50× whole-genome sequencing library in under 3 h and Sentieon finished in under 8 h, whereas GATK v4.1.0 needed nearly 24 h. These results were achieved while maintaining greater than 99% accuracy and precision compared to stock GATK. Sentieon's somatic pipeline achieved similarly high results (greater than 99%). Additionally, the IBM POWER9 CPU performed well on bioinformatic workloads when tested with 10 different tools for alignment/mapping.
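
The runtimes quoted in this abstract can be turned into rough speedup factors. The sketch below is illustrative arithmetic only: it treats the stated bounds ("under 3 h", "under 8 h", "nearly 24 h") as if they were exact values, which is an assumption, so the resulting ratios are approximate. The names `runtimes_h` and `speedup` are hypothetical and do not come from the paper.

```python
# Wall-clock times in hours, taken from the abstract's stated bounds.
# Assumption: the bounds are used as if they were exact measurements.
runtimes_h = {
    "GATK v4.1.0": 24.0,  # "nearly 24 h"
    "Sentieon": 8.0,      # "under 8 h"
    "Parabricks": 3.0,    # "under 3 h"
}


def speedup(baseline_h: float, optimized_h: float) -> float:
    """Return how many times faster the optimized run is than the baseline."""
    if optimized_h <= 0:
        raise ValueError("runtime must be positive")
    return baseline_h / optimized_h


baseline = runtimes_h["GATK v4.1.0"]
speedups = {name: speedup(baseline, hours) for name, hours in runtimes_h.items()}

for name, factor in speedups.items():
    print(f"{name}: about {factor:.1f}x relative to stock GATK")
```

Under these assumptions Parabricks comes out at roughly 8 times and Sentieon at roughly 3 times the speed of stock GATK.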

NGSOne: Cloud-based NGS data analysis tool

  • 권창혁;김원호;장정화;안재균
    • Journal of Platform Technology
    • /
    • Vol. 6, No. 4
    • /
    • pp.87-95
    • /
    • 2018
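  • As the cost of personal whole-genome analysis has fallen, many countries are carrying out large-scale whole-genome and exome sequencing of 100,000 to 1,000,000 individuals. In many large projects, however, the lack of programs or systems able to process such volumes of data means that much of the budget is spent on building clusters and purchasing systems. In this study, we developed NGSOne, a client program that can run single nucleotide polymorphism (SNP) analysis on hundreds or more whole genomes and exomes simultaneously without an in-house server or cluster environment, and that biologists can easily install and operate. Users can choose among DRAGEN, BWA/GATK, and Isaac/Strelka2, which are representative SNP analysis tools; of the three, DRAGEN showed the best performance in terms of execution time and number of errors. In addition to SNP analysis, NGSOne can be extended to run a variety of other analysis tools automatically.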

Efficient Determination of Genomic Variants from Sorghum Genetic Resources by HPC

  • Tae-Ho Lee;Myung-Eun Park;Yun-Ho Oh;Da-Hye Jeon
    • Korean Society of Crop Science: Conference Proceedings
    • /
    • Korean Society of Crop Science 2022 Autumn Conference
    • /
    • pp.241-241
    • /
    • 2022
  • In the digital age, a lot of agricultural R&D is based on data. However, genetic resources are still essential for basic research and agricultural development. Accordingly, many countries are making great efforts to secure various genetic resources. In Korea, the National Agrobiodiversity Center (NAC) has collected more than 270,000 plant genetic resources so far as part of these efforts. To use the resources efficiently for agricultural R&D, it is essential to determine their genotypes, which requires a system for mass genotyping. Sorghum was selected as a model crop for this purpose, considering its genome size, its high-quality reference genome, and the number of resources available. To efficiently determine the genotype data from many genetic resources, we developed a GATK pipeline that works efficiently on HPC. The pipeline efficiently and rapidly determined 769 genotypes of 410 genetic resources. Going forward, our team will continue to work to determine genotypes of over a thousand sorghum resources, and the data will be released at the National Agricultural Biotechnology Information Center (NABIC) in order to be used in agricultural R&D.

Massive Parallel Sequencing for Diagnostic Genetic Testing of BRCA Genes - a Single Center Experience

  • Ermolenko, Natalya A;Boyarskikh, Uljana A;Kechin, Andrey A;Mazitova, Alexandra M;Khrapov, Evgeny A;Petrova, Valentina D;Lazarev, Alexandr F;Kushlinskii, Nikolay E;Filipenko, Maxim L
    • Asian Pacific Journal of Cancer Prevention
    • /
    • Vol. 16, No. 17
    • /
    • pp.7935-7941
    • /
    • 2015
  • The aim of this study was to implement massive parallel sequencing (MPS) technology in clinical genetics testing. We developed and tested an amplicon-based method for resequencing the BRCA1 and BRCA2 genes on an Illumina MiSeq to identify disease-causing mutations in patients with hereditary breast or ovarian cancer (HBOC). The coding regions of BRCA1 and BRCA2 were resequenced in 96 HBOC patient DNA samples obtained from different sample types: peripheral blood leukocytes, whole blood drops dried on paper, and buccal wash epithelia. A total of 16 random DNA samples were characterized using standard Sanger sequencing and used to optimize the variant calling process and evaluate the accuracy of the MPS method. The best bioinformatics workflow included the filtration of variants using GATK with the following cut-offs: variant frequency >14%, coverage >25×, and presence in both the forward and reverse reads. The MPS method had 100% sensitivity and 94.4% specificity. Similar accuracy levels were achieved for DNA obtained from the different sample types. The workflow presented herein requires low amounts of DNA samples (170 ng) and is cost-effective due to the elimination of DNA and PCR product normalization steps.
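
The three cut-offs named in this abstract can be illustrated with a small filter. This is a minimal sketch in plain Python, not the authors' workflow and not a GATK API: the `VariantCall` record, its field names, and the way variant frequency is computed (alternate-allele reads divided by total depth) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class VariantCall:
    """Hypothetical per-variant read counts (field names are illustrative)."""
    depth: int        # total reads covering the site
    alt_forward: int  # alternate-allele reads on the forward strand
    alt_reverse: int  # alternate-allele reads on the reverse strand

    @property
    def alt_fraction(self) -> float:
        """Fraction of reads supporting the variant (assumed definition)."""
        if self.depth == 0:
            return 0.0
        return (self.alt_forward + self.alt_reverse) / self.depth


def passes_filters(call: VariantCall,
                   min_fraction: float = 0.14,
                   min_depth: int = 25) -> bool:
    """Apply the abstract's cut-offs: frequency >14%, coverage >25x,
    and support from both forward and reverse reads."""
    return (
        call.alt_fraction > min_fraction
        and call.depth > min_depth
        and call.alt_forward > 0
        and call.alt_reverse > 0
    )


calls = [
    VariantCall(depth=100, alt_forward=10, alt_reverse=10),  # meets all three
    VariantCall(depth=100, alt_forward=20, alt_reverse=0),   # one strand only
    VariantCall(depth=20, alt_forward=5, alt_reverse=5),     # coverage too low
]
kept = [c for c in calls if passes_filters(c)]
```

Only the first record survives here. Keeping the thresholds as keyword arguments makes it easy to try other cut-offs against Sanger-validated samples, as the study describes doing during optimization.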

Identification of Causal and/or Rare Genetic Variants for Complex Traits by Targeted Resequencing in Population-based Cohorts

  • Kim, Yun-Kyoung;Hong, Chang-Bum;Cho, Yoon-Shin
    • Genomics & Informatics
    • /
    • Vol. 8, No. 3
    • /
    • pp.131-137
    • /
    • 2010
  • Genome-wide association studies (GWASs) have greatly contributed to the identification of common variants responsible for numerous complex traits. There are, however, unavoidable limitations in detecting causal and/or rare variants for traits in this approach, which depends on an LD-based tagging SNP microarray chip. In an effort to detect potential causal and/or rare variants for complex traits, such as type 2 diabetes (T2D) and triglycerides (TGs), we conducted a targeted resequencing of loci identified by the Korea Association REsource (KARE) GWAS. The target regions for resequencing comprised whole exons, exon-intron boundaries, and regulatory regions of genes that appeared within 1 Mb of the GWA signal boundary. From 124 individuals selected in population-based cohorts, a total of 0.7 Mb of target regions was captured by the NimbleGen sequence capture 385K array. Subsequent sequencing, carried out by the Roche 454 Genome Sequencer FLX, generated about 110,000 sequence reads per individual. Mapping of sequence reads to the human reference genome was performed using the SSAHA2 program. An average of 62.2% of total reads was mapped to targets with an average 22-fold coverage. A total of 5,983 SNPs (average 846 SNPs per individual) were called and annotated by GATK software, with 96.5% accuracy that was estimated by comparison with Affymetrix 5.0 genotyped data in identical individuals. About 51% of total SNPs were singletons that can be considered possible rare variants in the population. Among SNPs that appeared in exons, which account for about 20% of total SNPs, 304 nonsynonymous singletons were tested with PolyPhen to predict the protein damage caused by mutation. In total, we were able to detect 9 and 6 potentially functional rare SNPs for T2D and triglycerides, respectively, calling for a further step of replication genotyping in independent populations to prove their bona fide relevance to traits.
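
The abstract reports accuracy estimated by comparing sequencing-based calls with array genotypes from the same individuals, but does not say how that figure was computed. One common definition is genotype concordance over the sites present in both datasets, sketched below as an assumption. The function name, the site identifiers, and the toy data are hypothetical.

```python
from typing import Dict, Tuple

Genotype = Tuple[str, str]  # unordered pair of alleles, e.g. ("A", "G")


def genotype_concordance(called: Dict[str, Genotype],
                         array: Dict[str, Genotype]) -> float:
    """Fraction of shared sites where the two genotypes agree.

    Allele order is ignored, so ("A", "G") matches ("G", "A").
    Sites present in only one dataset are excluded from the comparison.
    """
    shared = called.keys() & array.keys()
    if not shared:
        raise ValueError("no overlapping sites to compare")
    matches = sum(
        1 for site in shared if sorted(called[site]) == sorted(array[site])
    )
    return matches / len(shared)


# Toy example for one individual (made-up sites and genotypes).
sequencing_calls = {"site1": ("A", "G"), "site2": ("C", "C"), "site3": ("T", "T")}
array_genotypes = {"site1": ("G", "A"), "site2": ("C", "T"), "site4": ("A", "A")}

concordance = genotype_concordance(sequencing_calls, array_genotypes)
```

In this toy case two sites overlap and one of them agrees, giving a concordance of 0.5.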