Predicting Forest Fires Using Machine Learning Considering Human Factors

  • 장진명 (Graduate School of Logistics, Inha University)
  • 김주찬 (Data Solution Group, CJ Logistics)
  • 김화중 (Asia Pacific School of Logistics, Inha University)
  • 김광태 (Analytics & Optimization Consulting Team, LG CNS)
  • Received : 2023.09.09
  • Accepted : 2023.10.09
  • Published : 2023.10.30

Abstract

Early detection is essential to preventing large-scale forest fires, and forest fire prediction has been widely studied as one approach to it. However, many previous studies considered only climatic and geographic factors and overlooked human factors, which are a major cause of ignition. This study develops forest fire prediction models that take human, meteorological, and geographic factors into account. Using forest fire data from Gangwon-do for 2003 to 2020, it compares four machine learning models with a logistic regression model. XGBoost performed best (AUC = 0.925), closely followed by Random Forest (AUC = 0.920). Finally, to derive operational insights, the study analyzed the relative importance of the factors using permutation feature importance. Meteorological factors had a greater impact than human factors, but several human factors were also found to be significant.
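The two evaluation tools named in the abstract, AUC and permutation feature importance, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the synthetic data, the feature names (`humidity`, `visitors`, `noise`), and the fixed scoring rule `predict` are hypothetical stand-ins, not the paper's Gangwon-do data or its fitted models.

```python
import random


def auc(labels, scores):
    """Area under the ROC curve: the share of (fire, no-fire) pairs
    in which the fire case gets the higher score (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def permutation_importance(predict, rows, labels, n_features,
                           n_repeats=5, seed=0):
    """Mean drop in AUC when a single feature column is shuffled."""
    rng = random.Random(seed)
    baseline = auc(labels, [predict(r) for r in rows])
    importances = []
    for j in range(n_features):
        drops = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)  # break the link between feature j and the label
            shuffled = [r[:j] + (v,) + r[j + 1:] for r, v in zip(rows, column)]
            drops.append(baseline - auc(labels, [predict(r) for r in shuffled]))
        importances.append(sum(drops) / n_repeats)
    return baseline, importances


# Hypothetical synthetic data: one weather factor, one human factor,
# and one column that is unrelated to fire occurrence.
data_rng = random.Random(42)
rows, labels = [], []
for _ in range(600):
    humidity = data_rng.random()   # 0 = dry, 1 = humid
    visitors = data_rng.random()   # stand-in for human activity
    noise = data_rng.random()      # irrelevant feature
    risk = 0.7 * (1 - humidity) + 0.3 * visitors
    labels.append(1 if data_rng.random() < risk else 0)
    rows.append((humidity, visitors, noise))


def predict(row):
    """Hypothetical stand-in for a fitted model: a fixed scoring rule."""
    humidity, visitors, _ = row
    return 0.7 * (1 - humidity) + 0.3 * visitors


baseline, importances = permutation_importance(predict, rows, labels, n_features=3)
print("baseline AUC:", round(baseline, 3))
for name, value in zip(("humidity", "visitors", "noise"), importances):
    print(name, round(value, 3))
```

In this toy setup the weather-like feature carries most of the signal, so shuffling it costs the most AUC, while the column the scoring rule ignores has an importance of exactly zero. In practice the same procedure would be applied to a trained model on held-out data.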

Acknowledgement

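This research was supported in 2020 by the Ministry of Education of the Republic of Korea and the National Research Foundation of Korea through the Mid-career Researcher Support Program in the humanities and social sciences (NRF-2020S1A5A2A01045577). The authors thank undergraduate student 이성민 for drawing the maps with QGIS.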
