A Data Placement Scheme for the Characteristics of Data Intensive Scientific Workflow Applications

  • Ahn, Julim (Sookmyung Women's University Department of Computer Science)
  • Kim, Yoonhee (Sookmyung Women's University Department of Computer Science)
  • Received : 2018.12.10
  • Accepted : 2018.12.21
  • Published : 2018.12.31

Abstract

In data-intensive scientific workflow experiments run in a cloud computing environment, large volumes of data may be distributed across multiple data centers, and the intermediate data generated during execution may be transferred between data centers. Because an application consumes the intermediate data it generates as execution proceeds, its execution results depend on where that data is located. However, existing data placement strategies do not consider the characteristics of scientific applications. In this paper, we define data-intensive stages of a workflow and propose runtime data placement within those intervals. Using the proposed scheme, we analyze scenarios that account for the number of occurrences of the data-intensive stages defined in this study and derive the results. We also compare performance by analyzing the number of runtime data placements and the overhead that runtime data placement incurs.
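The abstract does not spell out how stages are classified or how placement decisions are made, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes a hypothetical model in which each task runs at a fixed data center, a stage counts as data-intensive when the total volume its tasks read reaches a threshold, and a dataset is moved at runtime only if one migration plus the remaining cross-data-center reads costs less than leaving it in place. The names `Dataset`, `Task`, `is_data_intensive`, `transfer_cost`, `place_at_runtime`, and `threshold_gb` are illustrative, and overhead is approximated as gigabytes migrated rather than time.

```python
from dataclasses import dataclass

# Hypothetical model for illustration only; not taken from the paper.

@dataclass
class Dataset:
    name: str
    size_gb: float
    location: str  # data center currently holding the dataset

@dataclass
class Task:
    name: str
    data_center: str   # where the task executes (assumed fixed)
    inputs: list[str]  # names of the datasets the task reads

def is_data_intensive(stage: list[Task], datasets: dict[str, Dataset], threshold_gb: float) -> bool:
    """Assumed criterion: the stage reads at least threshold_gb in total."""
    return sum(datasets[d].size_gb for t in stage for d in t.inputs) >= threshold_gb

def transfer_cost(name: str, dc: str, stage: list[Task], datasets: dict[str, Dataset]) -> float:
    """GB crossing data centers if dataset `name` is served from `dc` during the stage."""
    size = datasets[name].size_gb
    return sum(size for t in stage if name in t.inputs and t.data_center != dc)

def place_at_runtime(stage: list[Task], datasets: dict[str, Dataset], threshold_gb: float) -> tuple[int, float]:
    """Relocate datasets for a data-intensive stage; return (placements, overhead_gb)."""
    if not is_data_intensive(stage, datasets, threshold_gb):
        return 0, 0.0  # no runtime placement outside data-intensive stages
    placements, overhead_gb = 0, 0.0
    used = {d for t in stage for d in t.inputs}
    candidates = {t.data_center for t in stage}
    for name in sorted(used):
        ds = datasets[name]
        # Cost of leaving the dataset where it is.
        best_dc, best_cost = ds.location, transfer_cost(name, ds.location, stage, datasets)
        for dc in sorted(candidates):
            # Moving costs one full copy plus the reads that still cross data centers.
            cost = ds.size_gb + transfer_cost(name, dc, stage, datasets)
            if cost < best_cost:
                best_dc, best_cost = dc, cost
        if best_dc != ds.location:
            ds.location = best_dc
            placements += 1
            overhead_gb += ds.size_gb  # migrated volume as a proxy for overhead
    return placements, overhead_gb
```

For example, in a stage where two tasks at `dc2` each read a 40 GB dataset stored at `dc1`, the sketch performs one placement (moving the dataset to `dc2`, 40 GB of overhead), while a 2 GB dataset that is mostly read where it already resides stays put. Returning the placement count and migrated volume mirrors, in simplified form, the two quantities the abstract says are compared.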

클라우드 컴퓨팅 환경을 활용한 데이터 집약적인 과학 워크플로우 응용 실험의 경우 클라우드의 여러 데이터 센터에 대량의 데이터가 분산될 수 있고, 생성되는 중간 데이터는 서로 다른 데이터 센터 간의 접근을 통해 전송될 수 있다. 또한 응용의 실행이 진행될 때, 생성된 중간 데이터를 이용하며 진행되므로 데이터의 위치에 따라 실행 결과가 달라진다. 그러나 기존의 데이터 배치 기법은 과학 응용의 특성을 고려하지 않는다. 본 논문에서는 데이터 집약적 단계를 정의하여 그 구간에서의 런타임 데이터 배치를 제안한다. 제안하는 데이터 배치 기법을 통해 본 연구에서 정의한 데이터 집약적 단계에서의 횟수를 고려한 시나리오를 분석하여 결과를 도출한다. 또한 런타임 데이터 배치 횟수와 런타임 데이터 배치 시 오버헤드를 분석하여 성능을 비교했다.

Acknowledgement

Supported by: National Research Foundation of Korea