Detection and Prediction of Subway Failure using Machine Learning

  • Kuk-Kyung Sung (Department of Broadcasting and Emerging Media, Korea University of Media Arts)
  • Received : 2023.11.14
  • Accepted : 2023.12.20
  • Published : 2023.12.30


The subway is a form of public transportation that plays an important role in the transportation systems of modern cities. However, sudden breakdowns and system outages often cause congestion and inconvenience passengers. In this paper, we therefore study failure prediction and prevention using machine learning, with the aim of operating the subway system more efficiently. Using the MetroPT-3 dataset from UC Irvine, we built a subway failure prediction model based on logistic regression. The model predicts the non-failure state with a high accuracy of 0.991. However, its precision and recall are relatively low, which suggests that its failure predictions are prone to error. The ROC AUC is 0.901, well above the 0.5 expected from random guessing, so the model does separate failure from non-failure states. The model is useful for the stable operation of the subway system, but further research is needed to improve its performance. With more training data and more thorough data cleaning, failures could be prevented by carrying out inspections in advance on the basis of the model's predictions.

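The gap the abstract reports between high accuracy and low precision and recall is typical when failures are rare. The sketch below illustrates this on a purely synthetic toy set (20 failures in 1,000 samples, with made-up scores). It is not the MetroPT-3 data and not the paper's model; the function names `classification_metrics` and `roc_auc` are hypothetical helpers written for this illustration.

```python
from typing import Dict, List


def classification_metrics(y_true: List[int], y_pred: List[int]) -> Dict[str, float]:
    """Accuracy, precision and recall for binary labels (1 = failure)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


def roc_auc(y_true: List[int], scores: List[float]) -> float:
    """Probability that a random failure outscores a random non-failure
    (ties count as half), which equals the area under the ROC curve."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Synthetic toy data (an assumption for illustration only):
# 20 failures and 980 normal samples, with hand-picked scores.
y_true = [1] * 20 + [0] * 980
scores = [0.9] * 8 + [0.3] * 12 + [0.7] * 6 + [0.1] * 974
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # 0.5 decision threshold

model = classification_metrics(y_true, y_pred)
baseline = classification_metrics(y_true, [0] * len(y_true))  # never predicts failure
auc = roc_auc(y_true, scores)

print("model:   ", model)
print("baseline:", baseline)
print("ROC AUC: ", round(auc, 3))
```

On this toy set the thresholded predictions reach an accuracy of 0.982, only slightly above the 0.98 of a baseline that never predicts a failure, while precision is about 0.57 and recall is 0.4. The ROC AUC is still about 0.996, because it measures how well the scores rank failures above non-failures rather than how well a fixed threshold performs.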


