A Study on the Prediction Models of Used Car Prices Using Ensemble Model And SHAP Value: Focus on Feature of the Vehicle Type

A Study on Used Car Price Prediction Models for the Korean Domestic Market Using Ensemble Models and SHAP Value: Focusing on Vehicle Type Characteristics

  • Seungjun Yim (Department of Business Administration, Hongik University)
  • Joungho Lee (School of Business, Konkuk University)
  • Choonho Ryu (Department of Business Administration, Hongik University)
  • Received : 2024.01.09
  • Accepted : 2024.03.07
  • Published : 2024.03.31

Abstract

The market share of online platform services in the used car market continues to grow. These platforms disclose vehicle specifications, accident history, inspection records, detailed options, and prices to their users. As of 2023, SUVs account for more than 50% of new car sales in the Korean domestic market, and new-car sales of hybrid vehicles have more than doubled compared with the previous year; both vehicle types are also gaining popularity in the used car market. Prior research has proposed used car price prediction models by running machine learning models on all vehicles or on vehicles grouped by brand. Although the popularity of SUVs and hybrids in the domestic market rises every year, it is difficult to find a study that proposes a used car price prediction model for these vehicle types. This study selects the best used car price prediction model for each vehicle type, using a total of 72 features covering vehicle specifications and options, for sedans, SUVs, and hybrid vehicles produced by domestic brands. Features were first selected with a Lasso regression model, and ensemble models were then run sequentially on the same sample to select the best model for each vehicle type. The CBR model was selected as the best model for every vehicle type, and Tree SHAP values were visualized for each best model to confirm the contribution and direction of the features. By proposing a price prediction model for each vehicle type to the parties who trade through online platform services, and by confirming the contribution and direction of the features, this study is expected to help resolve problems caused by information asymmetry between those parties.
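The abstract reports that SHAP values were used to confirm the contribution and direction of each feature. As a minimal sketch of what such values express, the Python example below computes exact Shapley values by brute force for a toy price function. Everything in it is an illustrative assumption and not taken from the study: the function `predict_price`, its coefficients, the feature names, and the baseline car are all hypothetical. Tree SHAP is an efficient algorithm for tree ensembles, whereas this sketch simply enumerates every feature ordering, which is feasible only for a handful of features.

```python
from itertools import permutations
from math import factorial


def predict_price(car):
    """Hypothetical toy price model (units: 10,000 KRW); not the study's model."""
    return (
        3000.0
        - 0.005 * car["mileage_km"]
        - 120.0 * car["age_years"]
        + 80.0 * car["sunroof"]
        + 200.0 * car["is_suv"]
        + 50.0 * car["sunroof"] * car["is_suv"]  # assumed interaction term
    )


def shapley_values(model, instance, baseline):
    """Exact Shapley values of `instance` relative to `baseline`.

    For every ordering of the features, switch them one at a time from the
    baseline value to the instance value and record the change in the
    prediction. The average change per feature is its Shapley value.
    """
    features = list(instance)
    totals = {name: 0.0 for name in features}
    for order in permutations(features):
        current = dict(baseline)
        previous = model(current)
        for name in order:
            current[name] = instance[name]
            value = model(current)
            totals[name] += value - previous
            previous = value
    n_orders = factorial(len(features))
    return {name: total / n_orders for name, total in totals.items()}


# Assumed reference car and the car being explained (illustrative values).
baseline = {"mileage_km": 60000, "age_years": 5, "sunroof": 0, "is_suv": 0}
car = {"mileage_km": 100000, "age_years": 3, "sunroof": 1, "is_suv": 1}

contributions = shapley_values(predict_price, car, baseline)
for name, phi in sorted(contributions.items(), key=lambda kv: -abs(kv[1])):
    direction = "raises" if phi > 0 else "lowers"
    print(f"{name:>10}: {phi:+8.1f} ({direction} the predicted price)")
```

The sign of each value gives the direction of a feature's effect and its magnitude gives the size of the contribution. The values sum to the difference between the prediction for the car and the prediction for the baseline, which is the property that makes a SHAP plot readable as a decomposition of a single predicted price.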
