Comparison of the wall clock time for extracting remote sensing data in Hierarchical Data Format using Geospatial Data Abstraction Library by operating system and compiler

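  • Byoung Hyun Yoo (Department of Plant Science, Seoul National University) ;
  • Kwang Soo Kim (Department of Plant Science, Seoul National University) ;
  • Jihye Lee (National Center for AgroMeteorology)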
  • Received : 2019.02.08
  • Accepted : 2019.03.13
  • Published : 2019.03.30

Abstract

The MODIS (Moderate Resolution Imaging Spectroradiometer) data in Hierarchical Data Format (HDF) have been processed using the Geospatial Data Abstraction Library (GDAL). Because the data are relatively large, it is preferable to build and install the data analysis tool so that it achieves greater computing performance, which can differ by operating system and by the form of distribution, e.g., source code or binary package. The objective of this study was to examine the performance of the GDAL for processing HDF files, which would guide the construction of a computer system for remote sensing data analysis. Execution time was compared among the environments under which the GDAL was installed. The wall clock time was measured after extracting data for each variable in the MODIS data file using a tool built by linking against the GDAL under combinations of operating systems (Ubuntu and openSUSE), compilers (GNU and Intel), and distribution forms. The MOD07 product, which contains atmosphere data, was processed for eight 2-D variables and two 3-D variables. The GDAL compiled with the Intel compiler under Ubuntu had the shortest computation time. For openSUSE, the GDAL compiled with the GNU and Intel compilers performed better for 2-D and 3-D variables, respectively. The wall clock time was considerably longer for the GDAL compiled with the "--with-hdf4=no" configuration option or installed through the RPM package manager under openSUSE. These results indicate that the choice of the environment under which the GDAL is installed, e.g., operating system or compiler, has a considerable impact on the performance of a system for processing remote sensing data. Parallel computing approaches would likely improve the performance of data processing for HDF files, which merits further evaluation of these computational methods.

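MODIS remote sensing data provided in HDF have been used to monitor agricultural ecosystems at regional and global scales. Because a large number of images usually has to be processed, it is advantageous to improve the processing performance for these data. This study aimed to support the construction of remote sensing data processing systems by identifying differences in the processing speed of a library that can handle HDF files, such as the GDAL, depending on the operating system and the form of distribution. To this end, the time required to process MODIS image data was measured and compared under the main conditions in which the GDAL is installed on a system. The performance of the GDAL was compared across combinations of operating system (Ubuntu and openSUSE), compiler (GNU and Intel), installation options, and binary packages. Using the GDAL installed under each condition, a total of 10 variables, comprising 2-D and 3-D variables of the MODIS atmospheric product (MOD07), were extracted. The wall clock time required for data processing was measured immediately after the values of each variable had been stored in system memory. The best-performing condition was the GDAL compiled with the Intel compiler under Ubuntu. Under openSUSE, the GNU and Intel compilers were effective for processing 2-D and 3-D data, respectively. In contrast, the GDAL compiled with the "--with-hdf4=no" option and the GDAL installed through the RPM package manager showed considerably lower performance than the other conditions. These results suggest that the speed of remote sensing data processing tools can be improved by adjusting the operating system, compiler, and installation options. In particular, because remote sensing data are distributed in various formats, follow-up studies should explore the conditions under which the libraries that process them perform best, and share the results.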
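The measurement described in the abstract (stopping the clock right after each variable has been read into memory) can be sketched as follows. This is a minimal illustration in Python, not the R script or the readGDAL_HDF tool used in the study. The names `time_variable_reads`, `summarize`, and `fake_reader`, and the variable labels `var_2d` and `var_3d`, are hypothetical and introduced only for this example.

```python
import time
from statistics import mean
from typing import Callable, Dict, Iterable, List


def time_variable_reads(
    read_variable: Callable[[str], object],
    variable_names: Iterable[str],
    repeats: int = 3,
) -> Dict[str, List[float]]:
    """Return the wall clock seconds of every read, grouped by variable."""
    timings: Dict[str, List[float]] = {}
    for name in variable_names:
        samples: List[float] = []
        for _ in range(repeats):
            start = time.perf_counter()
            data = read_variable(name)  # values are held in memory here
            # Stop the clock immediately after the read has returned.
            samples.append(time.perf_counter() - start)
            del data  # release the values before the next repeat
        timings[name] = samples
    return timings


def summarize(timings: Dict[str, List[float]]) -> Dict[str, float]:
    """Average the repeated measurements for each variable."""
    return {name: mean(samples) for name, samples in timings.items()}


def fake_reader(name: str) -> List[float]:
    """Hypothetical stand-in for an HDF reader: it only allocates a list."""
    size = 200_000 if name.endswith("_3d") else 20_000
    return [0.0] * size


if __name__ == "__main__":
    results = time_variable_reads(fake_reader, ["var_2d", "var_3d"])
    for name, seconds in summarize(results).items():
        print(f"{name}: {seconds:.6f} s")
```

With the GDAL Python bindings installed, `fake_reader` could be swapped for a function that opens a subdataset with `gdal.Open` and calls `ReadAsArray()` on it. Running the same script against each GDAL build would then give timings comparable across environments, provided the same file and hardware are used.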

Fig. 1. The first layer of the Retrieved Temperature Profile variable in the MYD07L2 data. This MODIS image contains the observation made at 05:55 on July 19, 2017.

Fig. 2. The flow chart of the readGDAL_HDF function.

Fig. 3. The flow chart of the R script implemented to measure the execution time for the readGDAL_HDF function.

Fig. 4. The wall clock time to read 2-D variables contained in a MOD07 product data file in HDF using the readGDAL_HDF tool. The tool was built by linking against the GDAL compiled under different operating systems, distribution packages, compilers, and configurations. ubuntu and suse indicate the Ubuntu and openSUSE operating systems, respectively. gcc and icc represent the GNU and Intel compilers, respectively. deb and rpm represent the GDAL packages distributed by Ubuntu and openSUSE, respectively. nohdf4 denotes the --with-hdf4=no configuration option.

Fig. 5. The wall clock time to read 3-D variables contained in a MOD07 product data file in HDF using the readGDAL_HDF tool. The tool was built by linking against the GDAL compiled under different operating systems, distribution packages, compilers, and configurations. ubuntu and suse indicate the Ubuntu and openSUSE operating systems, respectively. gcc and icc represent the GNU and Intel compilers, respectively. deb and rpm represent the GDAL packages distributed by Ubuntu and openSUSE, respectively. nohdf4 denotes the --with-hdf4=no configuration option.

Table 1. The metadata of variables contained in the MOD07 product for measurement of the processing time using the GDAL

Table 2. The environment and configuration under which the GDAL was built or installed to process a remote sensing data file in HDF
