Evaluation of Alignment Methods for Genomic Analysis in HPC Environment

Performance Evaluation of Sequence Alignment for Large-Scale Genome Analysis in an HPC Environment

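  • 임명은 (Bio-Medical IT Convergence Research Department, Electronics and Telecommunications Research Institute (ETRI))
  • 정호열 (Bio-Medical IT Convergence Research Department, Electronics and Telecommunications Research Institute (ETRI))
  • 김민호 (Bio-Medical IT Convergence Research Department, Electronics and Telecommunications Research Institute (ETRI))
  • 최재훈 (Bio-Medical IT Convergence Research Department, Electronics and Telecommunications Research Institute (ETRI))
  • 박수준 (Bio-Medical IT Convergence Research Department, Electronics and Telecommunications Research Institute (ETRI))
  • 최완 (Cloud Computing Research Department, Electronics and Telecommunications Research Institute (ETRI))
  • 이규철 (Department of Computer Engineering, Chungnam National University)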
  • Received : 2013.01.08
  • Accepted : 2013.01.24
  • Published : 2013.02.28

Abstract

With advances in next-generation sequencing (NGS) technologies, the volume of genome data has grown explosively in recent years. Analyzing such data effectively requires high-performance computing (HPC) techniques. In this paper, we built a genome analysis pipeline that calls single-nucleotide polymorphisms (SNPs) from NGS data. To organize the pipeline efficiently in an HPC environment, we analyzed the CPU utilization pattern of each pipeline step. We found that sequence alignment is compute-intensive and well suited to parallelization. We also evaluated the performance of open-source parallel alignment tools and found that alignment methods that exploit many-core processors can improve the performance of the genome analysis pipeline.

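Since the completion of the human genome map, advances in NGS technology have increased the demand for large-scale genome data analysis. Because NGS data consist of large volumes of short reads, effective analysis requires the support of high-performance computing technology. In this study, we built a genome analysis pipeline that detects SNPs from NGS data in an HPC environment. By analyzing the CPU utilization of each analysis step, we confirmed that the sequence alignment step has the highest proportion of computational work. By analyzing the performance of publicly available parallel sequence alignment tools, we also confirmed the feasibility of using many-core processors for genome analysis.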
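To illustrate why the alignment step lends itself to parallelization, the following is a minimal sketch, not the pipeline or the tools evaluated in the paper. It assumes a toy Smith-Waterman local alignment score with hypothetical scoring parameters (`match`, `mismatch`, `gap`), and the function names `smith_waterman_score` and `align_reads` are introduced here for illustration only. Because each read is scored against the reference independently of every other read, the work can be split across processor cores without coordination between workers.

```python
from concurrent.futures import ProcessPoolExecutor
from functools import partial


def smith_waterman_score(read, reference, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score of `read` against `reference`.

    Toy scoring scheme (an assumption): linear gap penalty, no traceback.
    Only two rows of the dynamic-programming matrix are kept in memory.
    """
    prev = [0] * (len(reference) + 1)
    best = 0
    for r in read:
        curr = [0]
        for j, c in enumerate(reference, start=1):
            diag = prev[j - 1] + (match if r == c else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def align_reads(reads, reference, workers=1):
    """Score every read against the reference, optionally in parallel.

    Reads are independent of one another, so they can be distributed
    over `workers` processes; results come back in input order.
    """
    score = partial(smith_waterman_score, reference=reference)
    if workers <= 1:
        return list(map(score, reads))
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(score, reads, chunksize=64))
```

Calling `align_reads(reads, reference, workers=8)` would distribute the reads over eight worker processes. On platforms that start workers by spawning a fresh interpreter, that call should sit under an `if __name__ == "__main__":` guard. Production aligners use index-based search rather than full dynamic programming, so this sketch shows only the independence of reads, not realistic performance.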

References

  1. Human Genome Project, http://www.ornl.gov/sci/tech resources/Human_Genome/home.shtml
  2. C. Angermuller, A. Biegert, J. Soding, "Discriminative modelling of context-specific amino acid substitution probabilities," Bioinformatics, Vol.28, pp.3240-3247, 2012. https://doi.org/10.1093/bioinformatics/bts622
  3. P. Klus, S. Lam, D. Lyberg, MS Cheung, G. Pullan, I. McFarlane, GSH Yeo, BY Lam, "BarraCUDA - a fast short read sequence aligner using graphics processing units," BMC Research Notes, Vol.5, 27, 2012. https://doi.org/10.1186/1756-0500-5-27
  4. C. Liu, T. Wong, E. Wul, R. Luo, S. Yiu, Y. Li, B. Wang, C. Yu, X. Chu, K. Zhao, R. Li, T. Lam,, "SOAP3: Ultra-fast GPU-based parallel alignment tool for short reads," Bioinformatics, Vol.28, pp.878-879, 2012. https://doi.org/10.1093/bioinformatics/bts061
  5. A. Goetz, M. Williamson, D. Xu, D. Poole, S. Grand, R. Walker, "Routine microsecond molecular dynamics simulations with AMBER - Part I: Generalized Born," Chemistry Theory Computrmatics Journal, Vol.8, pp.1542-1555, 2012. https://doi.org/10.1021/ct200909j
  6. B. Needleman, D. Wunsch, "A general method applicable to the search for similarities in the amino acid sequence of two proteins," Journal of Molecular Biology, Vol.48, pp.443-453, 1970. https://doi.org/10.1016/0022-2836(70)90057-4
  7. F. Smith, S. Waterman, "Identification of Common Molecular Subsequences," Journal of Molecular Biology, Vol.147, pp.195-197, 1981. https://doi.org/10.1016/0022-2836(81)90087-5
  8. S. Altschul, W. Gish, W. Miller, E. Myers, D. Lipman, "Basic local alignment search tool," Journal of Molecular Biology, Vol.215, pp.403-410, 1990. https://doi.org/10.1016/S0022-2836(05)80360-2
  9. H. Li, J. Ruan, R. Durbin, "Mapping short DNA sequencing reads and calling variants using mapping quality scores," Genome Researchl, Vol.18, pp.1851-1858, 2008. https://doi.org/10.1101/gr.078212.108
  10. N. Homer, B. Merriman, SF. Nelson, "BFAST: an alignment tool for large scale genome resequencing," PLoS One, Vol.4, e7767, 2009. https://doi.org/10.1371/journal.pone.0007767
  11. R. Li, Y. Li Y, K. Kristiansen, J. Wang, "SOAP: short oligonucleotide alignment program," Bioinformatics, Vol.24, pp.713-714, 2008. https://doi.org/10.1093/bioinformatics/btn025
  12. Li H. and Durbin R, "Fast and accurate short read alignment with Burrows-Wheeler Transform," Bioinformatics, Vol.25, pp.1754-1760, 2009. https://doi.org/10.1093/bioinformatics/btp324
  13. B. Langmead , C. Trapnell, M. Pop, SL. Salzberg, "Ultrafast and memory-efficient alignment of short DNA sequences to the human genome," Genome Biology, Vol.10, R25, 2009. https://doi.org/10.1186/gb-2009-10-3-r25
  14. R. Li, C. Yu, Y. Li, TW. Lam, SM. Yiu, K. Kristiansen, J. Wang, "SOAP2: an improved ultrafast tool for short read alignment," Bioinformatics, Vol.25, pp.1966-1967, 2009. https://doi.org/10.1093/bioinformatics/btp336
  15. M. Farrar, "Striped Smith-Waterman speeds database searches six times over other SIMD implementations," Bioinformatics, Vol.23, pp.156-161, 2007. https://doi.org/10.1093/bioinformatics/btl582
  16. B. Langmead, S. Salzberg, "Fast gapped-read alignment with Bowtie 2," Nature Methods, Vol.9, pp357-359, 2012. https://doi.org/10.1038/nmeth.1923
  17. H. Li, B. Handsaker, A. Wysoker, T. Fennell, J. Ruan, N. Homer, G. Marth, G. Abecasis, R. Durbin, "The Sequence Alignment/Map format and SAMtools," Bioinformatics Journal, Vol.25, pp.2078-2079, 2009. https://doi.org/10.1093/bioinformatics/btp352