http://dx.doi.org/10.5392/IJoC.2020.16.4.016

A Workflow Execution System for Analyzing Large-scale Astronomy Data on Virtualized Computing Environments  

Yu, Jung-Lok (Korea Institute of Science and Technology Information (KISTI))
Jin, Du-Seok (Korea Institute of Science and Technology Information (KISTI))
Yeo, Il-Yeon (Korea Institute of Science and Technology Information (KISTI))
Yoon, Hee-Jun (Korea Institute of Science and Technology Information (KISTI))
Abstract
The volume of observational data in astronomy has been increasing exponentially with the advent of wide-field optical telescopes, necessitating changes in how large-scale astronomy data are analyzed. However, the complexity of analysis tools and the lack of extensibility of computing environments make handling such huge observational data difficult and inefficient. To address this problem, this paper proposes a workflow execution system for efficiently analyzing large-scale astronomy data. The proposed system consists of two parts: 1) a workflow execution manager and its RESTful endpoints, which automate and control data analysis tasks based on workflow templates, and 2) an elastic resource manager as an underlying mechanism, which dynamically adds/removes virtualized computing resources (i.e., virtual machines) according to analysis requests. We implement the proposed system on a testbed using the OpenStack IaaS (Infrastructure as a Service) toolkit and the HTCondor workload manager, and we exhaustively perform a broad range of experiments with different resource allocation patterns, system loads, etc. to show its effectiveness. The results show that the resource allocation mechanism works properly according to the number of queued and running tasks, improving resource utilization, and that the workflow execution manager can handle more than 1,000 concurrent requests within a second with reasonable average response times. Finally, we describe a data reduction system as a case study application of our workflow execution system.
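The elastic resource manager described above sizes the virtual machine pool from the number of queued and running tasks. A minimal sketch of such a scaling decision is shown below; the function name, the slots-per-VM figure, and the pool bounds are illustrative assumptions, not the paper's actual implementation, which drives OpenStack and HTCondor directly.

```python
# Hypothetical sketch of an elastic VM-scaling decision: grow the pool
# when queued tasks exceed available slots, shrink it when tasks drain.
# All names and thresholds here are assumptions for illustration only.

def scaling_decision(queued, running, current_vms,
                     slots_per_vm=4, min_vms=1, max_vms=32):
    """Return the target VM pool size for the given workload."""
    demand = queued + running            # tasks that need execution slots
    needed = -(-demand // slots_per_vm)  # ceiling division without math.ceil
    # Clamp to the allowed pool size; the caller would then add or
    # remove (target - current_vms) virtual machines via the IaaS API.
    return max(min_vms, min(max_vms, needed))
```

In a real deployment the queued/running counts would come from the HTCondor scheduler and the add/remove actions would be issued to OpenStack, with hysteresis to avoid thrashing on short load spikes.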
Keywords
workflow; astronomy; data analysis; dynamic resource allocation; cloud; virtualization;