[KSCI] Korea Science Citation Index Service

http://dx.doi.org/10.7319/kogsis.2017.25.1.029

Performance Comparison of Spatial Split Algorithms for Spatial Data Analysis on Spark

Yang, Pyoung Woo (School of Computer Information and Communication Engineering, Kunsan National University)
Yoo, Ki Hyun (School of Computer Information and Communication Engineering, Kunsan National University)
Nam, Kwang Woo (School of Computer Information and Communication Engineering, Kunsan National University)

Publication Information

Journal of Korean Society for Geospatial Information Science / v.25, no.1, 2017, pp. 29-36

Abstract

In this paper, we implement a spatial big data analysis prototype based on Spark, an in-memory system, and use it to compare the performance of spatial split algorithms. In cluster computing environments, big data is divided into blocks of a certain size in order to balance the computing load. Existing research has shown that, for Hadoop-based spatial big data systems, splitting data spatially is more effective than the general sequential split method. A Hadoop-based spatial data system stores raw data as it is in spatially divided blocks. In the proposed Spark-based spatial analysis system, by contrast, spatial data is converted into an in-memory data structure and stored in spatial blocks for search efficiency. Therefore, we propose an in-memory spatial big data prototype and a storage method for spatially split blocks, compare the performance of existing spatial split algorithms in the proposed prototype, and present an appropriate spatial split strategy for Spark-based big data systems. In the experiment, we compared the query execution time of each spatial split algorithm and confirmed that the BSP algorithm shows the best performance.
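
The contrast the abstract draws between sequential and spatial splitting can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes that "BSP" stands for binary space partitioning, realized here as a recursive median split along the wider axis, and every name in it (`sequential_split`, `bsp_split`, `blocks_touched`, `block_size`) is hypothetical. It uses plain Python lists rather than Spark so that it stays self-contained.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def sequential_split(points: List[Point], block_size: int) -> List[List[Point]]:
    """Cut the input into blocks in arrival order, ignoring location."""
    if block_size < 1:
        raise ValueError("block_size must be at least 1")
    return [points[i:i + block_size] for i in range(0, len(points), block_size)]


def bsp_split(points: List[Point], block_size: int) -> List[List[Point]]:
    """Assumed BSP variant: halve the point set at the median of its wider
    axis until every block holds at most block_size points."""
    if block_size < 1:
        raise ValueError("block_size must be at least 1")
    if len(points) <= block_size:
        return [points]
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    # Split along whichever axis has the larger extent.
    axis = 0 if (max(xs) - min(xs)) >= (max(ys) - min(ys)) else 1
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return bsp_split(ordered[:mid], block_size) + bsp_split(ordered[mid:], block_size)


def bounding_box(block: List[Point]) -> Box:
    """Smallest axis-aligned rectangle containing every point of a block."""
    xs = [p[0] for p in block]
    ys = [p[1] for p in block]
    return (min(xs), min(ys), max(xs), max(ys))


def blocks_touched(blocks: List[List[Point]], query: Box) -> int:
    """Count the blocks whose bounding box intersects the query rectangle."""
    qx0, qy0, qx1, qy1 = query
    count = 0
    for block in blocks:
        x0, y0, x1, y1 = bounding_box(block)
        if x0 <= qx1 and qx0 <= x1 and y0 <= qy1 and qy0 <= y1:
            count += 1
    return count
```

The number of blocks a range query has to open is used here only as a rough stand-in for the query execution time measured in the paper. For example, on an 8 x 8 grid of points presented in scrambled order with a block size of 16, a query over the lower-left corner intersects one block under this BSP sketch but all four blocks under the sequential split.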

Keywords

Cluster Computing Environment; Spatial Big Data; Spark; Spatial Split Algorithm

