Spatial Computation on Spark Using GPGPU

Son, Chanseung; Kim, Daehee; Park, Neungsoo

doi:10.3745/KTCCS.2016.5.8.181

KIPS Transactions on Computer and Communication Systems (정보처리학회논문지:컴퓨터 및 통신 시스템)

Volume 5, Issue 8 / Pages 181-188 / 2016 / 2287-5891 (pISSN) / 2734-049X (eISSN)

Korea Information Processing Society (한국정보처리학회)

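Chanseung Son (Department of Computer Engineering, Konkuk University);
Daehee Kim (Department of Computer Engineering, Konkuk University);
Neungsoo Park (Department of Computer Engineering, Konkuk University)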

Received : 2016.07.25
Accepted : 2016.08.03
Published : 2016.08.31

https://doi.org/10.3745/KTCCS.2016.5.8.181

Abstract

As the volume of spatial data grows, interest in spatial information processing has increased. Spatial database systems, which extend traditional relational database systems, struggle with large data sets because of limited scalability. SpatialHadoop, which extends Hadoop, performs poorly because its spatial computations write many intermediate results to disk. This paper proposes Spatial Computation Spark (SC-Spark), an extension of the in-memory distributed processing framework Spark that efficiently performs spatial operations on large-scale data. In addition, a GPGPU-based SC-Spark is developed to further improve performance. SC-Spark exploits Spark's ability to hold intermediate results in memory, and GPGPU-based SC-Spark performs spatial operations in parallel using the many processing elements of a GPU. To evaluate the proposed work, SC-Spark and GPGPU-based SC-Spark were run on a single AMD system for the Point-in-Polygon and spatial join operations. The experimental results showed that SC-Spark and GPGPU-based SC-Spark were up to 8 times faster than SpatialHadoop.

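Much research is under way to process rapidly growing spatial data efficiently. Spatial database systems that extend conventional relational database systems have scalability problems, and SpatialHadoop, which extends the Hadoop distributed processing platform, suffers degraded performance from file I/O overhead because it writes intermediate results to disk. This paper proposes Spatial Computation Spark, which extends Spark, an in-memory distributed processing framework. A model combined with GPGPU was also developed to improve the performance of Spatial Computation Spark. Spatial Computation Spark retains Spark's characteristic of keeping intermediate results in memory, and GPGPU-based Spatial Computation Spark performs spatial operations efficiently because it processes them in parallel using many PEs. Both Spatial Computation Spark and GPGPU-based Spatial Computation Spark were implemented on a single AMD system. To evaluate their performance, the Point-in-Polygon and Spatial Join operations were run, and a performance improvement of up to 8 times over SpatialHadoop was confirmed.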
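The two operations named in the abstract can be illustrated with a minimal, single-machine sketch. The abstract does not say which algorithms SC-Spark uses, so the ray-casting test and the nested-loop join below are assumptions chosen for clarity. The names `point_in_polygon` and `spatial_join` are hypothetical, and the code uses neither Spark nor a GPU.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Polygon = List[Point]  # vertices in order; the last vertex connects back to the first


def point_in_polygon(p: Point, poly: Polygon) -> bool:
    """Ray-casting test (an assumed algorithm, not taken from the paper).

    Casts a horizontal ray from p toward +x and toggles `inside`
    each time the ray crosses a polygon edge.
    """
    x, y = p
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Only edges that straddle the horizontal line through p can be crossed.
        if (y1 > y) != (y2 > y):
            # x-coordinate at which the edge meets that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def spatial_join(points: List[Point], polygons: List[Polygon]) -> List[Tuple[int, int]]:
    """Naive nested-loop join: (point index, polygon index) for every containment."""
    return [
        (i, j)
        for i, p in enumerate(points)
        for j, poly in enumerate(polygons)
        if point_in_polygon(p, poly)
    ]


if __name__ == "__main__":
    square_a = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
    square_b = [(5.0, 5.0), (7.0, 5.0), (7.0, 7.0), (5.0, 7.0)]
    pts = [(2.0, 2.0), (5.0, 2.0), (6.0, 6.0)]
    print(spatial_join(pts, [square_a, square_b]))
```

Each point-polygon test in this sketch is independent of the others, which is the property that lets such work be spread across many processing elements or cluster partitions.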
