Geospatial Data Pipeline to Study the Health Effects of Environments -Limitations and Solutions-

  • Won Kyung Kim (Department of Cancer AI & Digital Health, National Cancer Center Graduate School of Cancer Science and Policy) ;
  • Goeun Jung (Department of Cancer AI & Digital Health, National Cancer Center Graduate School of Cancer Science and Policy) ;
  • Dongook Son (Bulkrum Co., Ltd.) ;
  • Sun-Young Kim (Department of Cancer AI & Digital Health, National Cancer Center Graduate School of Cancer Science and Policy)
  • Received : 2024.08.08
  • Accepted : 2024.09.04
  • Published : 2024.09.30

Abstract

Research on the health effects of environmental factors requires examining multiple interacting factors, including environmental, socio-demographic, economic, and traffic-related aspects. Despite a dramatic increase in data availability and major advances in data storage and processing technology, significant challenges remain in constructing databases that link these contributing factors and support an integrated approach to environmental health research. This study emphasizes the need for a geospatial data pipeline to analyze the impact of environmental factors on health, and describes the difficulties encountered in constructing and using a geospatial database together with their solutions. Key challenges include diverse data sources and formats, differing spatio-temporal data structures, and coordinate systems that change over time within the same geospatial dataset. To address these issues, a data pipeline with pre-processing and post-processing steps was constructed, producing refined datasets that can be used to calculate geographic variables. In addition, an AWS-based relational database and shared platform were established to provide an efficient environment for data storage and analysis. Finally, guidelines for each step of the process, including data management and analysis, were developed so that future researchers can use the data pipeline effectively.
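
As a rough illustration of the pre-processing and geographic-variable steps described above, the following minimal Python sketch first separates records that are already in a common coordinate system from those that still need reprojection, and then counts features within circular buffers around a location. It is not the authors' implementation: the record layout, the function names `split_by_crs` and `count_within_buffers`, the choice of EPSG:5179 as the common projected system, and the sample coordinates are all assumptions made for this example.

```python
import math

# Assumption: all layers are harmonized to one projected CRS with metre units.
# EPSG:5179 is used here only as an illustrative choice.
TARGET_CRS = "EPSG:5179"


def split_by_crs(records, target_crs=TARGET_CRS):
    """Pre-processing: separate records already in the target CRS
    from records that must be reprojected before any distance is computed."""
    ready = [r for r in records if r["crs"] == target_crs]
    needs_reprojection = [r for r in records if r["crs"] != target_crs]
    return ready, needs_reprojection


def count_within_buffers(site_xy, records, radii_m):
    """Geographic variable: number of features inside each circular buffer.

    site_xy and each record's (x, y) are assumed to be in the same
    projected CRS, so Euclidean distance is in metres.
    """
    sx, sy = site_xy
    distances = [math.hypot(r["x"] - sx, r["y"] - sy) for r in records]
    return {radius: sum(d <= radius for d in distances) for radius in radii_m}


# Hypothetical records from two sources with inconsistent coordinate systems.
bus_stops = [
    {"id": "a", "crs": "EPSG:5179", "x": 1050.0, "y": 1000.0},
    {"id": "b", "crs": "EPSG:5179", "x": 1000.0, "y": 1400.0},
    {"id": "c", "crs": "EPSG:5179", "x": 2500.0, "y": 1000.0},
    {"id": "d", "crs": "EPSG:4326", "x": 127.0, "y": 37.5},  # lon/lat, not metres
]

ready, pending = split_by_crs(bus_stops)
home = (1000.0, 1000.0)  # hypothetical residence location in the target CRS
counts = count_within_buffers(home, ready, [100, 500, 1000])
print(counts)
print([r["id"] for r in pending])
```

Keeping the coordinate-system check as a separate step means a record in a different system is flagged for reprojection rather than silently mixed into a distance calculation.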

Acknowledgement

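This research was supported by the 2023 National Cancer Center Public Interest Cancer Research Project (NCC-2310220) and by the Individual Basic Research Program of the National Research Foundation of Korea, funded by the Ministry of Science and ICT (No. 2022R1A2C2009971).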
