http://dx.doi.org/10.7472/jksii.2018.19.1.77

Design and Implementation of an Execution-Provenance Based Simulation Data Management Framework for Computational Science Engineering Simulation Platform  

Ma, Jin (Dept. of Scientific Platform Development, Korea Institute of Science and Technology Information (KISTI))
Lee, Sik (Dept. of Scientific Platform Development, Korea Institute of Science and Technology Information (KISTI))
Cho, Kum-won (Dept. of Scientific Platform Development, Korea Institute of Science and Technology Information (KISTI))
Suh, Young-kyoon (School of Computer Science & Engineering, Kyungpook National University)
Publication Information
Journal of Internet Computing and Services / v.19, no.1, 2018, pp. 77-86
Abstract
For the past few years, KISTI has operated an online simulation execution platform called EDISON, which allows users to run simulations with scientific applications contributed by diverse computational science and engineering disciplines. These simulations typically involve large-scale computation and therefore produce a huge volume of output data. One critical issue with running such simulations on an online platform is that many users simultaneously submit simulation requests (or jobs) with the same (or nearly unchanged) input parameters or files, which places a significant burden on the platform. In other words, identical computing jobs consume computing and storage resources redundantly and at an undesirably fast pace. To curb the excessive resource usage caused by such identical simulation requests, this paper introduces a novel framework, called IceSheet, that efficiently manages simulation data based on execution metadata, that is, provenance. The IceSheet framework captures and stores the provenance associated with each conducted simulation. The collected provenance records are used not only to detect duplicate simulation requests but also to search existing simulation results via an open-source search engine, ElasticSearch. In particular, this paper elaborates on the core components of the IceSheet framework that support searching and reusing the stored simulation results. We implemented a prototype of the proposed framework using the search engine in conjunction with the online simulation execution platform. We evaluated the framework on real simulation execution-provenance records collected on the platform.
Once the prototyped IceSheet framework is fully integrated with the platform, users will be able to quickly search for parameter values previously entered into a given simulation software and, if results for the same input parameter values already exist, receive them directly. We therefore expect the proposed framework to help eliminate duplicate resource consumption and to significantly reduce execution time for requests identical to previously executed simulations.
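The duplicate-request check described in the abstract can be illustrated with a minimal sketch. This is not the IceSheet implementation: the names `provenance_key`, `ProvenanceStore`, and `submit` are hypothetical, the record fields are assumed, and an in-memory dictionary stands in for the ElasticSearch index. The sketch only shows the idea of keying each execution by its software name and input parameters so that an identical request can reuse a stored result instead of running again.

```python
import hashlib
import json


def provenance_key(software, params):
    """Derive a deterministic key from the software name and input parameters."""
    # sort_keys makes the key independent of the order in which parameters were given.
    canonical = json.dumps({"software": software, "params": params}, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class ProvenanceStore:
    """Hypothetical in-memory stand-in for a provenance index."""

    def __init__(self):
        self._records = {}

    def lookup(self, software, params):
        # Returns the stored record for an identical request, or None.
        return self._records.get(provenance_key(software, params))

    def record(self, software, params, result_path):
        rec = {"software": software, "params": params, "result_path": result_path}
        self._records[provenance_key(software, params)] = rec
        return rec


def submit(store, software, params, run_simulation):
    """Return (result_path, reused); run the simulation only for unseen requests."""
    existing = store.lookup(software, params)
    if existing is not None:
        return existing["result_path"], True
    result_path = run_simulation(software, params)
    store.record(software, params, result_path)
    return result_path, False
```

Note that this sketch matches only exactly identical parameter sets. Requests with nearly unchanged inputs would need a similarity search over the stored records, which is where a search engine becomes useful.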
Keywords
Computational Science Engineering Platform; EDISON Platform; Simulation; Data; Search Engine; Open Science; Provenance;
Citations & Related Records
Times Cited By KSCI: 3