http://dx.doi.org/10.3745/JIPS.2012.8.4.555

An Adaptive Workflow Scheduling Scheme Based on an Estimated Data Processing Rate for Next Generation Sequencing in Cloud Computing  

Kim, Byungsang (Dept. of Information and Communications Engineering, KAIST)
Youn, Chan-Hyun (Dept. of Electrical Engineering, KAIST)
Park, Yong-Sung (Dept. of Electrical Engineering, KAIST)
Lee, Yonggyu (Dept. of Electrical Engineering, KAIST)
Choi, Wan (Electronics and Telecommunications Research Institute)
Publication Information
Journal of Information Processing Systems / v.8, no.4, 2012, pp. 555-566
Abstract
The cloud environment makes it possible to analyze large data sets on a scalable computing infrastructure. In bioinformatics, applications are composed of complex workflow tasks that require huge data storage as well as computing-intensive parallel workloads. Many distributed solutions have been introduced, but they focus on static resource provisioning with a batch-processing scheme on a local computing farm and local data storage. For a large-scale workflow system, outsourcing all or part of its tasks to public clouds is both inevitable and valuable as a way to reduce resource costs. Two problems arise, however: the transfer time for huge datasets, and unbalanced completion times across problems of different sizes. In this paper, we propose an adaptive resource-provisioning scheme that includes run-time data distribution and collection services to hide the data transfer time. The scheme optimizes the allocation ratio of computing elements to the different datasets in order to minimize the total makespan under resource constraints. We conducted experiments with a well-known sequence alignment algorithm, and the results showed that the proposed scheme is efficient in the cloud environment.
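The allocation idea in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a simple linear model in which every computing element processes data at the same estimated rate, so a dataset of size s assigned n elements finishes in s / (n * rate). The function name `allocate_elements`, its parameters, and the example sizes are hypothetical. Under that assumption, handing each spare element to the dataset with the longest estimated completion time minimizes the makespan.

```python
import heapq


def allocate_elements(sizes_gb, total_elements, rate_gb_per_hour=1.0):
    """Split computing elements across datasets to minimize the makespan.

    Assumes (hypothetically) that every element processes data at the same
    estimated rate and that a dataset's work divides evenly among its elements.
    """
    if total_elements < len(sizes_gb):
        raise ValueError("need at least one computing element per dataset")

    # Every dataset starts with one element.
    alloc = [1] * len(sizes_gb)

    # Max-heap on estimated completion time (negated, since heapq is a min-heap).
    heap = [(-size / rate_gb_per_hour, i) for i, size in enumerate(sizes_gb)]
    heapq.heapify(heap)

    # Give each remaining element to the dataset that currently finishes last.
    for _ in range(total_elements - len(sizes_gb)):
        _, i = heapq.heappop(heap)
        alloc[i] += 1
        new_time = sizes_gb[i] / (alloc[i] * rate_gb_per_hour)
        heapq.heappush(heap, (-new_time, i))

    makespan = max(
        size / (n * rate_gb_per_hour) for size, n in zip(sizes_gb, alloc)
    )
    return alloc, makespan


if __name__ == "__main__":
    # Three datasets of different sizes sharing six computing elements.
    alloc, makespan = allocate_elements([30, 10, 20], total_elements=6)
    print(alloc, makespan)
```

In this sketch the larger datasets receive proportionally more elements, so all three finish at about the same time. A real scheme would also need to estimate the processing rate at run time and account for data transfer, which this example leaves out.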
Keywords
Resource-Provisioning; Bio-Workflow Broker; Next-Generation Sequencing