Research of Performance Interference Control Technique for Heterogeneous Services in Bigdata Platform

Kisung Jin, Sangmin Lee, Youngkyun Kim

doi:10.5626/KTCP.2016.22.6.284

KIISE Transactions on Computing Practices (정보과학회 컴퓨팅의 실제 논문지)

Volume 22, Issue 6, pp. 284-289, 2016

ISSN 2383-6318 (print), 2383-6326 (online)

Korean Institute of Information Scientists and Engineers (한국정보과학회)

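Kisung Jin (Storage System Research Section, ETRI);
Sangmin Lee (Storage System Research Section, ETRI);
Youngkyun Kim (High Performance Computing Research Department, ETRI)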

Received: 2016.03.14
Accepted: 2016.04.20
Published: 2016.06.15

Abstract

In the Hadoop-based Big Data analysis model, moving data between the legacy (application) systems that produce the raw data and the analysis system that processes it is unavoidable. To address this, unified Big Data file system technology has been introduced so that a single platform can support both the legacy service and the analysis service. Despite the advantages of running a single platform, such as cost and resource efficiency, the main remaining challenge at the current level of technology is the performance degradation caused by mutual interference between the two services. As a first step toward solving this problem, we ran a real-service-level simulation of both services, observed it from three perspectives (system resource utilization, workload characteristics, and I/O imbalance), and identified the root cause of the performance interference. We then propose two solutions. The first is a structural, system-level solution that separates the I/O path of the data server so that the legacy service and the analysis service each have an independent I/O layer. The second is a technical solution: an aggressive prefetch method that maximizes the benefit of the sequential-read I/O pattern of the analysis service. To verify the proposed methods, we ran the same test as in the simulation on both the previous system and the proposed system, and the proposed system showed better performance than the previous system.
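
The structural solution, giving each service its own I/O path on the data server, can be illustrated with a minimal sketch. It assumes a hypothetical data server in which every request carries a service tag ("legacy" or "analysis"); the names IORequest, SeparatedIOServer, and record are illustrative and do not come from the paper, and the handle callback stands in for real storage I/O. Each service gets its own queue and worker thread, so legacy requests never wait behind queued analysis scans.

```python
import queue
import threading
from dataclasses import dataclass
from typing import Callable


@dataclass
class IORequest:
    service: str  # hypothetical service tag: "legacy" or "analysis"
    path: str
    offset: int
    length: int


class SeparatedIOServer:
    """Hypothetical data server with an independent I/O path per service class."""

    SERVICES = ("legacy", "analysis")

    def __init__(self, handle: Callable[[str, IORequest], None]):
        # handle(path_name, request) stands in for the real storage I/O.
        self._handle = handle
        self._queues = {name: queue.Queue() for name in self.SERVICES}
        self._workers = [
            threading.Thread(target=self._run, args=(name, q), daemon=True)
            for name, q in self._queues.items()
        ]
        for t in self._workers:
            t.start()

    def _run(self, name: str, q: queue.Queue) -> None:
        # Each path drains only its own queue, in FIFO order.
        while True:
            req = q.get()
            try:
                if req is None:  # shutdown sentinel
                    return
                self._handle(name, req)
            finally:
                q.task_done()

    def submit(self, req: IORequest) -> None:
        # Route strictly by service class; an unknown tag raises KeyError.
        self._queues[req.service].put(req)

    def shutdown(self) -> None:
        # Sentinels are queued after pending requests, so those are served first.
        for q in self._queues.values():
            q.put(None)
        for t in self._workers:
            t.join()


# Usage: record which I/O path served each request.
log = []
log_lock = threading.Lock()


def record(path_name: str, req: IORequest) -> None:
    with log_lock:
        log.append((path_name, req.service))


server = SeparatedIOServer(record)
server.submit(IORequest("legacy", "/db/orders", 0, 4096))
server.submit(IORequest("analysis", "/logs/2016-03", 0, 1 << 20))
server.submit(IORequest("legacy", "/db/orders", 8192, 4096))
server.shutdown()
```

A production data server would more likely use a worker pool and possibly separate devices per path; one thread per path only keeps the sketch short.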

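The technical solution, aggressive prefetching for the sequential reads of the analysis service, is sketched here with a substitute technique rather than the paper's exact policy: a Linux-style read-ahead window that opens at initial_window on the first sequential read, doubles on each further sequential read up to max_window, and resets on a non-sequential access. The class AggressivePrefetcher and its parameters are hypothetical; the method only computes which byte range to fetch ahead and leaves issuing the actual reads to the caller.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class _StreamState:
    next_offset: int           # offset at which a sequential read would continue
    window: int = 0            # current read-ahead window in bytes (0 = not yet sequential)
    prefetched_until: int = 0  # end offset of data already requested ahead


class AggressivePrefetcher:
    """Hypothetical per-file read-ahead planner for sequential analysis reads."""

    def __init__(self, initial_window: int = 128 * 1024, max_window: int = 8 * 1024 * 1024):
        self.initial_window = initial_window
        self.max_window = max_window
        self._streams: Dict[str, _StreamState] = {}

    def on_read(self, path: str, offset: int, length: int) -> Optional[Tuple[int, int]]:
        """Record a read and return the (start, end) byte range to prefetch, or None."""
        end = offset + length
        st = self._streams.get(path)
        if st is None or offset != st.next_offset:
            # First or non-sequential access: (re)start tracking with no read-ahead.
            self._streams[path] = _StreamState(next_offset=end, prefetched_until=end)
            return None
        # Sequential access: open the window, or double it up to max_window.
        if st.window == 0:
            st.window = self.initial_window
        else:
            st.window = min(st.window * 2, self.max_window)
        st.next_offset = end
        target = end + st.window
        # Only request bytes that have not already been prefetched.
        start = max(end, st.prefetched_until)
        if start >= target:
            return None
        st.prefetched_until = target
        return (start, target)
```

With initial_window=4 and max_window=16, four-byte sequential reads from offset 0 produce no prefetch on the first read, then the ranges (8, 12), (12, 20), (20, 32), and (32, 36); a jump to offset 100 resets the stream. Growing the window geometrically lets long analysis scans reach a large read-ahead quickly while random legacy accesses never trigger it.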

Acknowledgement

Grant: Development of core technology for a Big Data platform providing dual-mode batch-query analysis

Supported by: ETRI