http://dx.doi.org/10.3745/KTSDE.2013.2.2.097

Workflow-based Bio Data Analysis System for HPC  

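Ahn, Shinyoung (Cloud Computing Research Department, Electronics and Telecommunications Research Institute)
Kim, ByoungSeob (Cloud Computing Research Department, Electronics and Telecommunications Research Institute)
Choi, Hyun-Hwa (Cloud Computing Research Department, Electronics and Telecommunications Research Institute)
Jeon, Seunghyub (Cloud Computing Research Department, Electronics and Telecommunications Research Institute)
Bae, Seungjo (Cloud Computing Research Department, Electronics and Telecommunications Research Institute)
Choi, Wan (Cloud Computing Research Department, Electronics and Telecommunications Research Institute)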
Publication Information
KIPS Transactions on Software and Data Engineering / v.2, no.2, 2013, pp. 97-106
Abstract
Since the Human Genome Project was completed, the cost of human genome analysis has fallen rapidly, and the volume of human genome data awaiting analysis has risen sharply as a result. As the need for fast analysis of very large bio data such as the human genome grows, non-IT researchers such as biologists should be able to run many kinds of bio applications, each with different characteristics, quickly and effectively in an HPC environment. To this end, a biologist needs to be able to define a sequence of bio applications as a workflow easily, because bio applications generally have to be combined and executed in a particular order. The resulting bio workflow should then run as distributed, parallel jobs, with computing resources allocated efficiently on an HPC cluster system. Executing workflows in this way can be expected to improve performance and shorten response time for very large bio data analysis. This paper proposes a workflow-based data analysis system specialized for bio applications. With this system, non-IT scientists and researchers can easily analyze very large bio data in an HPC environment.
Keywords
HPC; Genome Analysis; Bio-Informatics; WMS; RMS; Supercomputer; MAHA Supercomputer