Parallel Processing of K-means Clustering Algorithm for Unsupervised Classification of Large Satellite Imagery

대용량 위성영상의 무감독 분류를 위한 K-means 군집화 알고리즘의 병렬처리

  • Han, Soohee (Dept. of Geoinformatics Engineering, Kyungil University)
  • Received : 2017.05.29
  • Accepted : 2017.06.20
  • Published : 2017.06.30

Abstract

This study introduces a method for parallelizing the k-means clustering algorithm for fast unsupervised classification of large satellite imagery. K-means clustering, a representative unsupervised classification algorithm, is usually applied as a preprocessing step before supervised classification, but it benefits clearly from parallel processing because it is computationally intensive and needs little human intervention. The parallel code was implemented with OpenMP-based multi-threading. Experiments were run on a single PC with an 8-core CPU, using a 7-band, 30 m resolution image from LANDSAT 8 OLI and an 8-band, 10 m resolution image from Sentinel-2A. With 10 classes, parallel processing was about 6 times faster than sequential processing. To check the consistency of the parallel and sequential results, the class centers, the number of pixels assigned to each class, and the classified images were compared, and all were identical. The study is meaningful in that it demonstrates that the processing speed of large satellite imagery can be improved significantly through parallel processing. It also shows that OpenMP-based multi-threading makes parallel processing relatively easy to implement, but that the code must be designed carefully to suppress false sharing.

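This study introduces a method for parallelizing the k-means clustering algorithm for rapid unsupervised classification of large satellite imagery. As a representative unsupervised classification algorithm, k-means clustering is mainly used as a preprocessing step for supervised classification, but because it is computation-intensive and requires little user intervention, it can clearly demonstrate the effect of parallel processing. The parallel processing code was implemented using OpenMP-based multi-threading. The experiments were carried out on a single PC whose CPU integrates 8 cores. The test images were a 30 m resolution LANDSAT 8 OLI image composed of 7 bands and a 10 m resolution Sentinel-2A image composed of 8 bands. When sequential and parallel processing were each run with 10 clusters, parallel processing was about 6 times as fast as sequential processing. To evaluate the consistency of the sequential and parallel results, the center values of each cluster and the numbers of classified pixels were compared and the classified result images were differenced; all information matched. This study is considered meaningful in that it demonstrates that parallel processing can considerably improve the processing speed of large satellite imagery. It was also confirmed that, although parallel processing can be implemented relatively easily with OpenMP-based multi-threading, care must be taken to design the code so that false sharing is suppressed.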
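The abstract does not show the parallel code itself, so the following is only a minimal sketch of how one k-means iteration might be parallelized with OpenMP while limiting false sharing. The function name `kmeans_step`, the `StepResult` struct, and the pixel-interleaved memory layout are illustrative assumptions, not the author's implementation. The idea shown is that each thread accumulates class sums and counts in its own private buffers and merges them into the shared result only once, so threads do not repeatedly write to neighboring elements of one shared array.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Result of one k-means iteration (hypothetical helper type).
struct StepResult {
    std::vector<int> labels;        // class index assigned to each pixel
    std::vector<double> sums;       // k * bands per-class band sums
    std::vector<long long> counts;  // number of pixels in each class
};

// Assumed layout: pixels[i * bands + b] is band b of pixel i, and
// centers[c * bands + b] is band b of the center of class c.
StepResult kmeans_step(const std::vector<float>& pixels, int bands,
                       const std::vector<double>& centers, int k) {
    assert(bands > 0 && k > 0);
    const long long n = static_cast<long long>(pixels.size()) / bands;
    const std::size_t acc_size = static_cast<std::size_t>(k) * bands;

    StepResult r;
    r.labels.assign(static_cast<std::size_t>(n), 0);
    r.sums.assign(acc_size, 0.0);
    r.counts.assign(static_cast<std::size_t>(k), 0);

    #pragma omp parallel
    {
        // Thread-private accumulators: updating these in the hot loop
        // avoids contended writes to the shared sums and counts.
        std::vector<double> local_sums(acc_size, 0.0);
        std::vector<long long> local_counts(static_cast<std::size_t>(k), 0);

        // Static scheduling gives each thread one contiguous pixel range,
        // so writes to labels only meet at the range boundaries.
        #pragma omp for schedule(static)
        for (long long i = 0; i < n; ++i) {
            const float* p = &pixels[static_cast<std::size_t>(i) * bands];
            int best = 0;
            double best_d = std::numeric_limits<double>::max();
            for (int c = 0; c < k; ++c) {
                double d = 0.0;  // squared Euclidean distance to center c
                for (int b = 0; b < bands; ++b) {
                    const double diff = p[b] - centers[c * bands + b];
                    d += diff * diff;
                }
                if (d < best_d) { best_d = d; best = c; }
            }
            r.labels[static_cast<std::size_t>(i)] = best;
            ++local_counts[best];
            for (int b = 0; b < bands; ++b) {
                local_sums[best * bands + b] += p[b];
            }
        }

        // Merge once per thread; the critical section serializes the merge.
        #pragma omp critical
        {
            for (int c = 0; c < k; ++c) r.counts[c] += local_counts[c];
            for (std::size_t j = 0; j < acc_size; ++j) r.sums[j] += local_sums[j];
        }
    }
    return r;
}
```

The new center of class c would then be `sums[c * bands + b] / counts[c]` for each band, and the step is repeated until the centers stop changing. With GCC or Clang the pragmas take effect when compiling with `-fopenmp`; without that flag they are ignored and the same function runs sequentially, which gives a baseline for comparing centers, counts, and labels in the way the abstract describes. Because the merge order of floating-point sums depends on thread timing, this sketch by itself does not guarantee bit-identical centers for arbitrary data.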
