Distributed Processing Method of Hotspot Spatial Analysis Based on Hadoop and Spark

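  • Kim Changsoo (Forcewave Corporate Research Institute) ;
  • Lee Joosub (Forcewave Corporate Research Institute) ;
  • Hwang Kyumoon (Forcewave Corporate Research Institute) ;
  • Sung Hyojin (Forcewave Corporate Research Institute)
  • Received : 2017.07.28
  • Reviewed : 2017.11.22
  • Published : 2018.02.15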

Abstract

Hotspot analysis, one of the spatial statistical analysis techniques, makes it easy to identify the spatial patterns of spatial attributes or events. It rests on the principle that "adjacent things are more related than distant things." Because spatial adjacency must be taken into account, however, the analysis is not easy to process in a distributed manner. This paper describes a distributed processing method for hotspot analysis and evaluates its performance on Hadoop and on in-memory Spark. Compared with a single (standalone) system, Hadoop-based processing showed a performance improvement of 625.89% and Spark-based processing 870.14%. In the comparison between Hadoop and Spark, the performance improvement rate of Spark grew as the data set became larger.
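To make the adjacency constraint concrete, the following is a minimal single-machine sketch of a hotspot score. It assumes the Getis-Ord Gi* statistic with binary weights over a 3x3 grid neighborhood; the abstract does not state which statistic, weighting scheme, or data layout the authors use, and the function name `gi_star` and the grid representation are hypothetical.

```python
import math


def gi_star(grid):
    """Return Getis-Ord Gi* z-scores for a 2D grid of numeric values.

    Assumptions (not taken from the paper): binary weights, and the
    neighborhood of a cell is the cell itself plus its up-to-8 adjacent cells.
    The grid must contain at least two distinct values and more cells than
    any single neighborhood.
    """
    rows, cols = len(grid), len(grid[0])
    values = [v for row in grid for v in row]
    n = len(values)

    # Global statistics: every cell's score depends on these.
    mean = sum(values) / n
    std = math.sqrt(sum(v * v for v in values) / n - mean * mean)

    scores = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Local part: values of the cell and its adjacent cells.
            neigh = [
                grid[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
            ]
            w_sum = len(neigh)  # binary weights: sum(w) == sum(w^2)
            numerator = sum(neigh) - mean * w_sum
            denominator = std * math.sqrt((n * w_sum - w_sum ** 2) / (n - 1))
            scores[r][c] = numerator / denominator
    return scores
```

Each score needs both global statistics (mean and standard deviation over all cells) and the values of adjacent cells. A data partition therefore cannot be scored in isolation: cells on a partition boundary need neighbor values held by another partition, which is one reason such an analysis is harder to distribute than a purely record-wise computation.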

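Keywords

Project Information

Research project : Development of Big Data Management, Analysis and Service Platform Technology for National Land Spatial Information

Managing institution : Korea Agency for Infrastructure Technology Advancement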
