A Study on the prediction of BMI(Benthic Macroinvertebrate Index) using Machine Learning Based CFS(Correlation-based Feature Selection) and Random Forest Model

  • Received : 2019.06.18
  • Accepted : 2019.09.25
  • Published : 2019.09.30

Abstract

Recently, both the quality of water resources and water welfare have been attracting public attention as means of improving quality of life. This study predicts the benthic macroinvertebrate index (BMI), an indicator of aquatic ecological health, using the machine-learning-based CFS (Correlation-based Feature Selection) method and a random forest model, and compares the measured and predicted BMI values. From data collected over 10 years on branch streams of the Han River, 1,312 records were extracted and used. Pearson correlation analysis of these data showed little correlation between any single factor and the BMI, so the CFS method was introduced for multiple regression analysis. It yielded 10 factors (water temperature, DO, electrical conductivity, turbidity, BOD, $NH_3-N$, T-N, $PO_4-P$, T-P, and average flow rate) considered to be related to the BMI, and the random forest model was built on these ten factors. To validate the model, $R^2$, %Difference, NSE (Nash-Sutcliffe Efficiency), and RMSE (Root Mean Square Error) were used. The reported values were 0.9438, -0.997, and 0.992, and the accuracy rate was about 71.6%. These results can suggest future directions for water resource management and serve a pre-review function for water ecological prediction.
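As a rough illustration of the quantities named in the abstract, the sketch below computes the CFS subset merit and the four validation statistics in plain Python. It is not the authors' code. The function names and sample values are hypothetical. The definitions are assumptions, since the abstract does not state them: $R^2$ is taken as the squared Pearson correlation between measured and predicted values, %Difference as the relative difference of totals, and the CFS merit follows the standard form $k\,\bar{r}_{cf} / \sqrt{k + k(k-1)\,\bar{r}_{ff}}$.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def cfs_merit(k, mean_feature_target_corr, mean_feature_feature_corr):
    """CFS merit of a k-feature subset: high when features correlate with
    the target but not with each other."""
    return (k * mean_feature_target_corr) / math.sqrt(
        k + k * (k - 1) * mean_feature_feature_corr
    )


def validation_metrics(observed, predicted):
    """Return R^2, %Difference, NSE and RMSE under the assumed definitions."""
    n = len(observed)
    mean_obs = sum(observed) / n
    sq_err = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return {
        # Assumption: R^2 as squared Pearson correlation.
        "r2": pearson_r(observed, predicted) ** 2,
        # Assumption: relative difference of totals, in percent.
        "pct_difference": 100.0 * (sum(predicted) - sum(observed)) / sum(observed),
        # Nash-Sutcliffe Efficiency: 1 minus error variance over observed variance.
        "nse": 1.0 - sq_err / sum((o - mean_obs) ** 2 for o in observed),
        # Root Mean Square Error.
        "rmse": math.sqrt(sq_err / n),
    }


if __name__ == "__main__":
    # Hypothetical measured and predicted BMI values, for illustration only.
    observed = [60.0, 70.0, 80.0, 90.0]
    predicted = [62.0, 68.0, 81.0, 89.0]
    print(validation_metrics(observed, predicted))
    print(cfs_merit(10, 0.3, 0.2))
```

Under these definitions a perfect prediction gives an $R^2$ and NSE of 1 and a %Difference and RMSE of 0, so the four statistics can be read together as complementary checks on the random forest output.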
