기계학습을 이용한 식품위생점검 체계의 효율성 개선 연구

Improving Efficiency of Food Hygiene Surveillance System by Using Machine Learning-Based Approaches

  • Received : 2020.11.11
  • Reviewed : 2020.12.19
  • Published : 2020.12.31

Abstract

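This study applies a supervised learning prediction model from machine learning to manufacturers and processors of processed foods. It aims to improve the efficiency of enforcement by building an inspection screening system that identifies in advance the businesses expected to be nonconforming. The study followed the standard machine learning procedure: defining the objective of predictive modeling, exploratory analysis and visualization of the data, deriving feature variables, and selecting a prediction model and making predictions. The dependent variable was set as the number of violations detected in guidance inspections over the five years from 2014 to 2018, and the objective function was to maximize the number of enforcement actions directed at businesses that are actually nonconforming and were identified in advance. The data were restructured to reflect not only basic attributes of the manufacturing and processing businesses, such as sales revenue, number of operating days, and number of employees, but also their past inspection and enforcement history. A feature extraction method was applied to derive business risk, item risk, environmental risk, and past violation history as the feature variables affecting nonconformity decisions, and machine learning algorithms were then applied to the data. The random forest model proved the most suitable for the purposes of the guidance inspection work of the Ministry of Food and Drug Safety. We hope these results will provide a foundation for developing the national administration of food safety management into a data-driven, scientific administrative system.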

This study employs a supervised learning prediction model to detect, in advance, nonconformity among processed food manufacturing and processing businesses. The study followed the standard machine learning procedure: definition of the objective function, data preprocessing, feature engineering, and model selection and evaluation. The dependent variable was set as the number of violations detected in guidance inspections over the five years from 2014 to 2018, and the objective function was to maximize the probability of detecting nonconforming companies. The data were preprocessed to reflect not only basic attributes such as revenue, operating duration, and number of employees, but also inspection track records and external climate data. A feature extraction method was applied to derive company risk, item risk, environmental risk, and past violation history as the feature variables that affect the determination of nonconformity, and machine learning algorithms were then applied to the data. The F1-score of the random forest, an ensemble model, was much higher than those of the other models. Based on these results, official food control for food safety management is expected to move toward data- and evidence-based management and a scientific administrative system.
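The abstract evaluates models by F1-score and frames the objective as detecting nonconforming companies. The following is a minimal sketch of why that metric suits a setting where violators are a small minority. The toy labels, the 10-in-100 violation rate, and the two hypothetical screening rules are illustrative assumptions, not data or models from the study.

```python
from typing import Sequence, Tuple


def precision_recall_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for the positive class (1 = nonconforming)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of businesses whose label was predicted correctly."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Hypothetical toy data: 100 businesses, 10 of them nonconforming (label 1).
y_true = [1] * 10 + [0] * 90

# Hypothetical rule A: never flags any business.
always_pass = [0] * 100

# Hypothetical rule B: flags 8 of the 10 violators plus 4 false alarms.
screened = [1] * 8 + [0] * 2 + [1] * 4 + [0] * 86

for name, y_pred in [("always_pass", always_pass), ("screened", screened)]:
    p, r, f1 = precision_recall_f1(y_true, y_pred)
    print(f"{name}: accuracy={accuracy(y_true, y_pred):.2f} "
          f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

In this toy setup the rule that flags nothing still reaches 0.90 accuracy while its F1 is 0, whereas the screening rule reaches an F1 of about 0.73. This is the kind of gap that makes F1 a more informative criterion than accuracy when the goal is to find the few nonconforming businesses.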

Keywords

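Funding Information

This study was conducted as part of the 2019 National Food Safety Management System Advancement Research Project of the National Food Safety Information Service.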
