An Intelligence Support System Research on KTX Rolling Stock Failure Using Case-based Reasoning and Text Mining

  • Lee, Hyung Il (Department of Business Informatics, Graduate School, Hanyang University) ;
  • Kim, Jong Woo (School of Business, Hanyang University)
  • Received : 2019.11.14
  • Reviewed : 2020.03.14
  • Published : 2020.03.31

Abstract

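A KTX trainset is a single system made up of numerous mechanical and electrical devices and components, and maintaining it requires considerable expertise and hands-on experience from maintenance workers. When a failure occurs, the time needed to resolve it and the quality of the work vary with the maintainer's knowledge and experience, and the availability of the rolling stock varies accordingly. Problem solving is generally based on failure manuals, but experienced and skilled experts also draw on their personal know-how to diagnose faults and take action quickly. Because this knowledge exists as tacit knowledge, it is difficult to pass on in full to successors, and earlier studies have therefore tried to convert it into data-based knowledge by developing case-based expert systems for railway rolling stock. However, research on KTX rolling stock, which is the most heavily deployed on main lines, and on systems that retrieve similar cases by extracting features from text remains scarce. This study therefore proposes an intelligent action support system that provides an action guide for newly occurring failures by using the history of diagnoses and actions performed with the know-how of maintenance experts as problem-solving cases. To this end, a case base was built from rolling stock failure data generated from 2015 to 2017. Failure features were extracted with the dimensionality reduction techniques non-negative matrix factorization (NMF), latent semantic analysis (LSA), and Doc2Vec, similar cases were retrieved by measuring the cosine distance between vectors, and the performance of the actions proposed by these algorithms was compared. The analysis showed that when failure descriptions contain few keywords, similar-case retrieval and action proposal perform well even when cosine similarity is applied directly, and that among the dimensionality reduction methods that preserve contextual meaning, Doc2Vec performs best. Text mining is being studied for use in many fields, but studies using text data in environments like the one targeted here, where specialized terms abound and access to data is limited, are still lacking. In this respect, the significance of this study is that it presents an intelligent diagnostic system that complements keyword-based case retrieval by applying text mining techniques to extract failure features, retrieve cases, and propose actions. The results are expected to offer implications as a basis for the stepwise development of a diagnostic system that can be used directly in the field.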

A KTX trainset is a system made up of numerous mechanical and electrical devices and components, and its maintenance requires considerable expertise and experience from maintenance workers. When a rolling stock failure occurs, the time needed to solve the problem and the quality of the work depend on the maintainer's knowledge and experience, and the resulting availability of the vehicle varies accordingly. Although problem solving is generally based on failure manuals, experienced and skilled professionals can diagnose faults and take action quickly by applying personal know-how. Because this knowledge exists in tacit form, it is difficult to pass on completely to a successor, and previous studies have developed case-based rolling stock expert systems to turn it into data-driven knowledge. Nonetheless, research on KTX rolling stock, the type most commonly used on main lines, and on systems that extract the meaning of text to search for similar cases is still lacking. This study therefore proposes an intelligent support system that provides an action guide for newly occurring failures by using the know-how of rolling stock maintenance experts as problem-solving cases. For this purpose, a case base was constructed from rolling stock failure data generated from 2015 to 2017, and an integrated dictionary was built separately from the case base to cover the essential terminology and failure codes specific to the railway rolling stock sector. Using the constructed case base, each new failure was matched against past cases, and the three most similar failure cases were extracted so that the actions actually taken in those cases could be proposed as a diagnostic guide.
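The retrieval step described above, matching a new failure against past cases and returning the three most similar ones, might look roughly like the following sketch. It is not the authors' implementation: the case records, the field names `failure` and `action`, and the whitespace tokenizer are assumptions made for illustration, and the study's integrated dictionary and Korean-language preprocessing are not reproduced here.

```python
import math
from collections import Counter


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def tokenize(text: str) -> Counter:
    # Assumption: plain whitespace tokenization stands in for the
    # dictionary-based term extraction mentioned in the abstract.
    return Counter(text.lower().split())


def top_k_cases(query: str, case_base: list, k: int = 3) -> list:
    """Return the k past cases whose failure text is most similar to the query."""
    query_vec = tokenize(query)
    scored = [(cosine_similarity(query_vec, tokenize(case["failure"])), case)
              for case in case_base]
    # Sort by score only; ties keep their original case-base order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [case for _, case in scored[:k]]


# Hypothetical toy case base, invented purely for illustration.
CASE_BASE = [
    {"failure": "traction motor overheating alarm", "action": "inspect cooling fan"},
    {"failure": "door sensor fault on car 3", "action": "replace door sensor"},
    {"failure": "traction motor vibration", "action": "check motor bearing"},
    {"failure": "brake pressure low warning", "action": "check brake pipe for leaks"},
]

for case in top_k_cases("traction motor alarm", CASE_BASE):
    print(case["failure"], "->", case["action"])
```

This sketch corresponds to the baseline that applies cosine similarity directly to words; a dimensionality reduction method would replace `tokenize` with a mapping to a dense vector while leaving the ranking logic unchanged.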
To compensate for the limitations of keyword-matching case retrieval in earlier case-based rolling stock failure expert systems, this study applied several dimensionality reduction techniques that calculate similarity by taking the semantic relationships among failure details into account, and verified their usefulness through experiments. Three algorithms, Non-negative Matrix Factorization (NMF), Latent Semantic Analysis (LSA), and Doc2Vec, were applied to extract the characteristics of each failure, and similar cases were retrieved by measuring the cosine distance between the resulting vectors. Precision, recall, and F-measure were used to assess the performance of the proposed actions. To compare the dimensionality reduction techniques, they were evaluated alongside two baselines: an algorithm that randomly extracts failure cases with identical failure codes and an algorithm that applies cosine similarity directly to words. Analysis of variance confirmed that the performance differences among the five algorithms were statistically significant. In addition, the techniques best suited to practical application were identified by examining how performance varies with the number of dimensions used for dimensionality reduction. The analysis showed that applying cosine similarity directly performed better than dimensionality reduction with NMF and LSA, and that the algorithm using Doc2Vec performed best. Furthermore, for the dimensionality reduction techniques, performance improved as the number of dimensions increased, within an appropriate range.
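As a rough illustration of the evaluation measures named above, the sketch below computes precision, recall, and F-measure for one proposed action against the action actually taken. The abstract does not state how a match between proposed and actual actions was defined, so comparing sets of action terms is an assumption, and the function name and sample terms are hypothetical.

```python
def precision_recall_f1(proposed: set, actual: set) -> tuple:
    """Precision, recall, and F-measure (F1) for two sets of action terms."""
    true_positives = len(proposed & actual)
    precision = true_positives / len(proposed) if proposed else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: terms of a proposed action vs. the recorded action.
proposed_terms = {"replace", "door", "sensor"}
actual_terms = {"replace", "sensor"}
p, r, f = precision_recall_f1(proposed_terms, actual_terms)
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

Averaging these scores over all test failures for each algorithm would yield the per-algorithm figures that a comparison such as the analysis of variance operates on.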
Through this study, we confirmed the usefulness of methods for extracting data characteristics and converting unstructured data when applying case-based reasoning in the specialized field of KTX rolling stock, where most attributes are recorded as text. Text mining is being studied for use in many areas, but studies using text data are still lacking in environments with many specialized terms and limited access to data, such as the one addressed in this study. In this regard, the study is significant in that it presents an intelligent diagnostic system that proposes actions by applying text mining techniques to extract failure characteristics and retrieve cases, complementing keyword-based case search. It is expected to provide implications as a basic study for developing diagnostic systems that can be used immediately on site.
