NGSOne: A Cloud-Based NGS Data Analysis Tool

  • Received : 2018.11.04
  • Accepted : 2018.12.26
  • Published : 2018.12.31

Abstract

As sequencing costs have fallen, many national projects that sequence and analyze the whole genomes or exomes of 100,000 to 1 million people are now under way. However, because programs and systems that can process such large volumes of data are lacking, a large share of these projects' budgets goes to building cluster systems or purchasing servers. In this study, we developed NGSOne, a client program that even biologists can easily install and operate, and that performs single nucleotide polymorphism (SNP) analysis on hundreds or more whole genomes and whole exomes simultaneously, without requiring users to build their own server or cluster environment. Users can choose among three representative SNP analysis tools: DRAGEN, BWA/GATK, and Isaac/Strelka2. Of the three, DRAGEN showed the best performance in terms of execution time and number of errors. NGSOne can also be extended to run other analysis tools automatically, beyond SNP analysis.
