Classifying the severity of pedestrian accidents using ensemble machine learning algorithms: A case study of Daejeon City

Classifying pedestrian traffic accident severity using ensemble learning techniques: The case of Daejeon City

  • Kang, Heungsik (Department of Mechatronics Engineering, Chungnam National University) ;
  • Noh, Myounggyu (Department of Mechatronics Engineering, Chungnam National University)
  • Received : 2022.03.25
  • Accepted : 2022.05.20
  • Published : 2022.05.28

Abstract

As the link between traffic accidents and social and economic losses has been confirmed, there is growing interest in developing safety policies based on crash data, and a need for countermeasures that reduce severe crash outcomes such as serious injuries and fatalities. In this study, we select Daejeon City, where the relative proportion of fatal crashes is high, as the case study region and focus on the severity of pedestrian crashes. After a series of data preprocessing steps, we run machine learning algorithms to select the optimal model and identify the key variables. Of the nine algorithms applied, AdaBoost and Random Forest, both ensemble-based, outperform the others on the performance metrics. Based on these results, we identify the major factors influencing pedestrian crash severity in Daejeon (pedestrians in their 70s or 20s, and crashes that occur while crossing) and suggest them as factors to consider in measures for reducing severe outcomes.

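As the link between traffic accidents and social and economic losses has been confirmed, the need has been raised for safety policies grounded in crash data and for ways to reduce high-severity crashes, such as those causing serious injury or death. In this study, we set Daejeon City, which has a high ratio of traffic fatalities relative to its population, as the study area, collected pedestrian traffic accident data, and used machine learning to derive the optimal algorithm and the main factors for severity classification. According to the results, of the nine algorithms applied, the ensemble-based learning methods AdaBoost (Adaptive Boosting) and RF (Random Forest) showed the best performance. On that basis, the main factors in the severity of pedestrian traffic accidents in Daejeon were found to be a pedestrian age in the 70s or 20s and a crossing-type accident, and these are proposed as factors to consider in measures to reduce pedestrian accidents in Daejeon.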
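The abstract compares algorithms "in terms of performance metrics" without naming them here. As a minimal sketch, assuming a binary severity label (1 = severe, 0 = not severe) and assuming accuracy, precision, recall, and F1 as the metrics, the comparison step could look like the following. The function names, the toy labels, and the two placeholder models are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple


def confusion_counts(y_true: List[int], y_pred: List[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) for binary labels where 1 = severe crash."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def severity_metrics(y_true: List[int], y_pred: List[int]) -> Dict[str, float]:
    """Compute accuracy, precision, recall, and F1 (assumed metrics, not the paper's stated set)."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def best_model(y_true: List[int], predictions: Dict[str, List[int]], key: str = "f1") -> str:
    """Pick the model name whose predictions score highest on the chosen metric."""
    return max(predictions, key=lambda name: severity_metrics(y_true, predictions[name])[key])


# Toy held-out labels and predictions from two hypothetical models.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
predictions = {
    "model_a": [1, 0, 1, 0, 0, 0, 1, 0],
    "model_b": [1, 1, 0, 0, 0, 1, 1, 0],
}

for name, y_pred in predictions.items():
    print(name, severity_metrics(y_true, y_pred))
print("selected:", best_model(y_true, predictions))
```

Ranking on F1 rather than accuracy is a design choice made for this sketch: severe crashes are usually the minority class, so accuracy alone can favor a model that rarely predicts them.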
