Linear Interpolation and Machine Learning Methods for Gas Leakage Prediction Based on Multi-source Data Integration

Linear Interpolation and Machine Learning Techniques for Gas Leakage Prediction Based on Multi-source Data Fusion

  • Dashdondov, Khongorzul (Department of Computer Engineering, Chungbuk National University)
  • Jo, Kyuri (Department of Computer Engineering, Chungbuk National University)
  • Kim, Mi-Hye (Department of Computer Engineering, Chungbuk National University)
  • Received : 2021.12.28
  • Accepted : 2022.03.20
  • Published : 2022.03.28

Abstract

In this article, we propose a method for predicting natural gas (NG) leakage levels that accounts for complex factors by integrating Korean Meteorological Agency data with natural gas leakage data and selecting features through factor analysis (FA). The method is divided into three modules. First, we fill missing values in the integrated dataset by linear interpolation and select essential features using FA with OrdinalEncoder (OE)-based normalization. The dataset is then labeled by K-means clustering. The final module uses four algorithms, K-nearest neighbors (KNN), decision tree (DT), random forest (RF), and Naive Bayes (NB), to predict gas leakage levels. The proposed method is evaluated by accuracy, area under the ROC curve (AUC), and mean standard error (MSE). The test results indicate that the OrdinalEncoder-factor analysis (OE-F)-based classification method improves prediction performance. Moreover, OE-F-based KNN (OE-F-KNN) showed the best performance, with an accuracy of 95.20%, an AUC of 96.13%, and an MSE of 0.031.

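In this paper, we propose a machine learning technique for predicting the degree of natural gas leakage while considering multiple factors. It integrates Korean Meteorological Agency data containing the relevant factors with natural gas leakage data and selects important features based on factor analysis. The proposed technique consists of a three-step procedure. First, as preprocessing, linear interpolation is applied to the integrated dataset to fill in missing data. To optimize machine learning model training, essential features are selected using factor analysis together with OrdinalEncoder (OE)-based normalization, and the dataset is labeled by k-means clustering. Finally, four algorithms, K-nearest neighbors (KNN), decision tree (DT), random forest (RF), and Naive Bayes (NB), are used to predict the gas leakage level. The proposed method was evaluated by accuracy, AUC, and mean standard error (MSE), and the test results showed that applying OE-F preprocessing yielded an improvement over existing techniques. In addition, OE-F-based KNN (OE-F-KNN) showed the best performance among the compared algorithms, with an accuracy of 95.20%, an AUC of 96.13%, and an MSE of 0.031.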
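
The pipeline summarized in the abstract can be sketched roughly as follows, using pandas and scikit-learn. This is an illustration under stated assumptions, not the authors' implementation: the data are synthetic, the column names are hypothetical, and the number of factors (2), the number of retained features (3), the number of leakage levels (3), the KNN neighborhood size (5), and the 70/30 split are arbitrary choices made for the example. How OE-based normalization and FA-based selection were actually applied in the paper is not stated in the abstract, so the encoding and loading-based ranking below are one plausible reading.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import FactorAnalysis
from sklearn.metrics import accuracy_score, mean_squared_error, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import OrdinalEncoder, StandardScaler

rng = np.random.default_rng(0)
n = 300

# Synthetic stand-in for the integrated weather + gas leakage table.
# All column names are hypothetical.
df = pd.DataFrame({
    "temperature": rng.normal(15.0, 8.0, n),
    "humidity": rng.uniform(20.0, 90.0, n),
    "wind_speed": rng.gamma(2.0, 1.5, n),
    "methane_ppm": rng.gamma(2.0, 2.0, n),
    "wind_dir": rng.choice(["N", "E", "S", "W"], n),
})
num_cols = ["temperature", "humidity", "wind_speed", "methane_ppm"]

# Blank out roughly 10% of the numeric values to simulate missing data.
missing = rng.random((n, len(num_cols))) < 0.10
df[num_cols] = df[num_cols].mask(missing)

# Module 1a: fill gaps by linear interpolation along the row order.
df[num_cols] = df[num_cols].interpolate(method="linear", limit_direction="both")

# Module 1b: ordinal-encode the categorical column, then rank features by
# their largest absolute factor loading and keep the top three (assumption).
df["wind_dir"] = OrdinalEncoder().fit_transform(df[["wind_dir"]]).ravel()
features = num_cols + ["wind_dir"]
X_scaled = StandardScaler().fit_transform(df[features])
fa = FactorAnalysis(n_components=2, random_state=0).fit(X_scaled)
loading_strength = np.abs(fa.components_).max(axis=0)
top_idx = np.argsort(loading_strength)[::-1][:3]
selected = [features[i] for i in top_idx]
X_sel = X_scaled[:, top_idx]

# Module 2: derive leakage-level labels with K-means (3 levels assumed).
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_sel)

# Module 3: train a KNN classifier and score it with the three metrics.
X_train, X_test, y_train, y_test = train_test_split(
    X_sel, labels, test_size=0.3, stratify=labels, random_state=0
)
knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
pred = knn.predict(X_test)
proba = knn.predict_proba(X_test)

acc = accuracy_score(y_test, pred)
auc = roc_auc_score(y_test, proba, multi_class="ovr", labels=knn.classes_)
mse = mean_squared_error(y_test, pred)  # squared error on the integer labels

print("selected features:", selected)
print(f"accuracy={acc:.3f} AUC={auc:.3f} MSE={mse:.3f}")
```

Because the labels come from clustering the same features the classifier is trained on, scores on this toy data will be high and say nothing about real leakage prediction. Swapping `KNeighborsClassifier` for `DecisionTreeClassifier`, `RandomForestClassifier`, or `GaussianNB` would give a comparison of the four algorithms named in the abstract. The error metric here is computed as mean squared error on the integer labels, which is an assumption about what the abstract's "MSE" denotes.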

Acknowledgement

This research was financially supported by the Ministry of Trade, Industry, and Energy (MOTIE) of Korea under the "Regional Specialized Industry Development Program" (R&D, P0002072) supervised by the Korea Institute for Advancement of Technology (KIAT).
