A Study on Injury Severity Prediction for Car-to-Car Traffic Accidents

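  • 고창완 (Department of Industrial Engineering, Chonnam National University)
  • 김현민 (Department of Industrial Engineering, Chonnam National University)
  • 정영선 (Department of Industrial Engineering, Chonnam National University)
  • 김재희 (Department of Business Administration, Jeonbuk National University)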
  • Received : 2020.04.21
  • Accepted : 2020.07.27
  • Published : 2020.08.31

Abstract

Automobiles have long been an essential part of daily life, but the social costs of car traffic accidents exceed 9% of Korea's national budget, so a national system for preventing and responding to these accidents is much needed. To present a model that can classify and predict injury severity in car-to-car traffic accidents, we applied big data analysis techniques: K-nearest neighbors, logistic regression, a naive Bayes classifier, decision trees, and ensemble algorithms. The classification performance of these models was compared using nationwide traffic accident data from the past three years. In particular, because the number of records differs across injury severity levels, we down-sampled the groups with large numbers of samples to improve classification accuracy, and then verified the statistical significance of the models using analysis of variance (ANOVA).
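The down-sampling step described in the abstract can be sketched as follows. This is a minimal sketch, not the authors' procedure: it assumes random under-sampling without replacement, with every injury-severity class cut down to the size of the rarest class. The record format, the `severity` field, the class labels, and the `downsample` helper are all hypothetical.

```python
import random
from collections import Counter, defaultdict


def downsample(records, label_key="severity", seed=0):
    """Randomly under-sample each class to the size of the smallest class.

    Assumption: records are dicts and `label_key` (hypothetical) names the
    injury-severity label. Sampling is uniform and without replacement.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for rec in records:
        by_class[rec[label_key]].append(rec)

    # Size of the rarest severity level; larger groups are cut down to it.
    target = min(len(group) for group in by_class.values())

    balanced = []
    for label in sorted(by_class):
        balanced.extend(rng.sample(by_class[label], target))
    rng.shuffle(balanced)  # avoid ordering the output by class
    return balanced


if __name__ == "__main__":
    # Made-up toy data, not from the study: 900 minor, 90 serious, 10 fatal.
    data = (
        [{"id": i, "severity": "minor"} for i in range(900)]
        + [{"id": 900 + i, "severity": "serious"} for i in range(90)]
        + [{"id": 990 + i, "severity": "fatal"} for i in range(10)]
    )
    balanced = downsample(data)
    print(sorted(Counter(r["severity"] for r in balanced).items()))
    # → [('fatal', 10), ('minor', 10), ('serious', 10)]
```

Reducing every class to the rarest one discards most majority-class records; the abstract says only that the large groups were down-sampled, so the study may have used a different target size per class.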

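The ANOVA significance check mentioned in the abstract can be illustrated with a one-way ANOVA F statistic, F = MS_between / MS_within, where MS_between = SS_between / (k - 1) and MS_within = SS_within / (n - k) for k groups and n observations. The abstract does not say exactly which quantities were compared, so this sketch assumes each model yields several accuracy scores (for example, from repeated down-sampled train/test runs) and tests whether the mean accuracies differ. The `one_way_anova` helper is hypothetical and returns only the statistic and its degrees of freedom, not a p-value.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    `groups` holds one list of scores per model, e.g. hypothetical accuracies
    from repeated train/test runs. Requires at least two groups and n > k.
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = mean([x for g in groups for x in g])
    group_means = [mean(g) for g in groups]

    # Between-group sum of squares: how far each model's mean is from the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group sum of squares: run-to-run spread around each model's own mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)

    df_between = k - 1
    df_within = n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


if __name__ == "__main__":
    # Toy numbers chosen for easy arithmetic, not results from the study.
    f, df1, df2 = one_way_anova([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
    print(f, df1, df2)  # → 27.0 2 6
```

To turn F into a p-value, compare it with the F distribution with (k - 1, n - k) degrees of freedom; if SciPy is available, `scipy.stats.f_oneway(*groups)` returns the same statistic together with its p-value.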
