Acknowledgement
This research was supported by a grant (21163MFDS516) from the Ministry of Food and Drug Safety in 2022.
References
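- 김은미, & 홍태호. (2015). 불균형 데이터 환경에서 변수가중치를 적용한 사례기반추론 기반의 고객반응 예측 [Customer response prediction based on case-based reasoning with variable weights in an imbalanced data environment]. 지능정보연구, 21(1), 29-45. https://doi.org/10.13088/JIIS.2015.21.1.29
- 장동식, & 이상호. (2016). 미국의 수입식품안전관리시스템 분석-가공식품을 중심으로 [An analysis of the U.S. imported food safety management system: Focusing on processed foods]. 국제상학, 31(4), 325-350.
- 조상구, & 조승용. (2020). 기계학습을 이용한 식품위생점검 체계의 효율성 개선 연구 [A study on improving the efficiency of the food hygiene inspection system using machine learning]. 한국빅데이터학회지, 5(2), 53-67.
- 조상구, & 최경현. (2018). 수입식품 빅데이터를 이용한 부적합식품 탐지 시스템에 관한 연구 [A study on a nonconforming food detection system using imported food big data]. 한국빅데이터학회지, 3(2), 19-33.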
- Abouelenien, M., Yuan, X., Giritharan, B., Liu, J., & Tang, S. (2013). Cluster-based sampling and ensemble for bleeding detection in capsule endoscopy videos. American Journal of Science and Engineering, 2(1), 24-32.
- Ahmed, M., Mahmood, A. N., & Islam, M. R. (2016). A survey of anomaly detection techniques in financial domain. Future Generation Computer Systems, 55, 278-288. https://doi.org/10.1016/j.future.2015.01.001
- Ahsan, M. M., Mahmud, M. P., Saha, P. K., Gupta, K. D., & Siddique, Z. (2021). Effect of data scaling methods on machine learning algorithms and model performance. Technologies, 9(3), 52. https://doi.org/10.3390/technologies9030052
- Bach, M., Werner, A., Zywiec, J., & Pluskiewicz, W. (2017). The study of under- and over-sampling methods' utility in analysis of highly imbalanced data on osteoporosis. Information Sciences, 384, 174-190. https://doi.org/10.1016/j.ins.2016.09.038
- Burez, J., & Van den Poel, D. (2009). Handling class imbalance in customer churn prediction. Expert Systems with Applications, 36(3), 4626-4636. https://doi.org/10.1016/j.eswa.2008.05.027
- Cawley, G. C., & Talbot, N. L. (2010). On over-fitting in model selection and subsequent selection bias in performance evaluation. The Journal of Machine Learning Research, 11, 2079-2107.
- Chawla, N. V., Bowyer, K. W., Hall, L. O., & Kegelmeyer, W. P. (2002). SMOTE: synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 16, 321-357. https://doi.org/10.1613/jair.953
- Chomboon, K., Kerdprasop, K., & Kerdprasop, N. (2013). Rare class discovery techniques for highly imbalance data. In Proc. International multi conference of engineers and computer scientists (Vol. 1).
- Cieslak, D. A., & Chawla, N. V. (2008, September). Learning decision trees for unbalanced data. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases (pp. 241-256). Springer, Berlin, Heidelberg.
- Cui, B., & He, S. (2016, July). Anomaly detection model based on hadoop platform and weka interface. In 2016 10th International Conference on Innovative Mobile and Internet Services in Ubiquitous Computing (IMIS) (pp. 84-89). IEEE.
- Durica, M., & Svabova, L. (2015). Improvement of company marketing strategy based on Google search results analysis. Procedia Economics and Finance, 26, 454-460. https://doi.org/10.1016/S2212-5671(15)00873-4
- Eltanbouly, S., Bashendy, M., AlNaimi, N., Chkirbene, Z., & Erbad, A. (2020, February). Machine learning techniques for network anomaly detection: A survey. In 2020 IEEE International Conference on Informatics, IoT, and Enabling Technologies (ICIoT) (pp. 156-162). IEEE.
- Ganganwar, V. (2012). An overview of classification algorithms for imbalanced datasets. International Journal of Emerging Technology and Advanced Engineering, 2(4), 42-47.
- GFSI. (2022). What we do. https://mygfsi.com/what-we-do/harmonisation/
- Guo, H., & Viktor, H. L. (2004). Learning from imbalanced data sets with boosting and data generation: the DataBoost-IM approach. ACM SIGKDD Explorations Newsletter, 6(1), 30-39. https://doi.org/10.1145/1007730.1007736
- Hancock, J., & Khoshgoftaar, T. M. (2020, August). Medicare fraud detection using catboost. In 2020 IEEE 21st international conference on information reuse and integration for data science (IRI) (pp. 97-103). IEEE.
- Jeong, H., Jang, Y., Bowman, P. J., & Masoud, N. (2018). Classification of motor vehicle crash injury severity: A hybrid approach for imbalanced data. Accident Analysis & Prevention, 120, 250-261. https://doi.org/10.1016/j.aap.2018.08.025
- Jin, C., Bouzembrak, Y., Zhou, J., Liang, Q., Van Den Bulk, L. M., Gavai, A., ... & Marvin, H. J. (2020). Big Data in food safety: A review. Current Opinion in Food Science, 36, 24-32. https://doi.org/10.1016/j.cofs.2020.11.006
- Kamei, Y., Monden, A., Matsumoto, S., Kakimoto, T., & Matsumoto, K. I. (2007, September). The effects of over and under sampling on fault-prone module detection. In First international symposium on empirical software engineering and measurement (ESEM 2007) (pp. 196-204). IEEE.
- Kang, S., & Shin, K. S. (2021). Conditional generative adversarial network based collaborative filtering recommendation system. Journal of Intelligence and Information Systems, 27(3), 157-173. https://doi.org/10.13088/JIIS.2021.27.3.157
- Kaur, H., Pannu, H. S., & Malhi, A. K. (2019). A systematic review on imbalanced data challenges in machine learning: Applications and solutions. ACM Computing Surveys (CSUR), 52(4), 1-36.
- Kim, J., Kim, M. Y., & Kwon, O. (2020). The effect of meta-features of multiclass datasets on the performance of classification algorithms. Journal of Intelligence and Information Systems, 26(1), 23-45. https://doi.org/10.13088/JIIS.2020.26.1.023
- Kleboth, J. A., Kosorus, H., Rechberger, T., & Luning, P. A. (2022). Using data mining as a tool for anomaly detection in food safety audit data. Food Control, 138, 109004. https://doi.org/10.1016/j.foodcont.2022.109004
- Liu, J., Gao, Y., & Hu, F. (2021). A fast network intrusion detection system using adaptive synthetic oversampling and LightGBM. Computers & Security, 106, 102289. https://doi.org/10.1016/j.cose.2021.102289
- Marvin, H. J., Bouzembrak, Y., Janssen, E. M., van der Fels-Klerx, H. V., van Asselt, E. D., & Kleter, G. A. (2016). A holistic approach to food safety risks: Food fraud as an example. Food Research International, 89, 463-470. https://doi.org/10.1016/j.foodres.2016.08.028
- Marvin, H. J., Janssen, E. M., Bouzembrak, Y., Hendriksen, P. J., & Staats, M. (2017). Big data in food safety: An overview. Critical Reviews in Food Science and Nutrition, 57(11), 2286-2295. https://doi.org/10.1080/10408398.2016.1257481
- Nassif, A. B., Talib, M. A., Nasir, Q., & Dakalbab, F. M. (2021). Machine learning for anomaly detection: A systematic review. IEEE Access, 9, 78658-78700. https://doi.org/10.1109/ACCESS.2021.3083060
- Nguyen, H. M., Cooper, E. W., & Kamei, K. (2012, November). A comparative study on sampling techniques for handling class imbalance in streaming data. In The 6th International Conference on Soft Computing and Intelligent Systems, and The 13th International Symposium on Advanced Intelligence Systems (pp. 1762-1767). IEEE.
- Niculescu-Mizil, A., & Caruana, R. (2005, August). Predicting good probabilities with supervised learning. In Proceedings of the 22nd international conference on Machine learning (pp. 625-632).
- Ntekouli, M., Spanakis, G., Waldorp, L., & Roefs, A. (2022, April). Using Explainable Boosting Machine to Compare Idiographic and Nomothetic Approaches for Ecological Momentary Assessment Data. In International Symposium on Intelligent Data Analysis (pp. 199-211). Springer, Cham.
- Omar, S., Ngadi, A., & Jebur, H. H. (2013). Machine learning techniques for anomaly detection: an overview. International Journal of Computer Applications, 79(2).
- Pachauri, G., & Sharma, S. (2015). Anomaly detection in medical wireless sensor networks using machine learning algorithms. Procedia Computer Science, 70, 325-333. https://doi.org/10.1016/j.procs.2015.10.026
- Pargent, F., Pfisterer, F., Thomas, J., & Bischl, B. (2022). Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features. Computational Statistics, 1-22.
- Sharif, A., Abbasi, Q. H., Arshad, K., Ansari, S., Ali, M. Z., Kaur, J., ... & Imran, M. A. (2021). Machine learning enabled food contamination detection using RFID and internet of things system. Journal of Sensor and Actuator Networks, 10(4), 63. https://doi.org/10.3390/jsan10040063
- Singh, A., & Purohit, A. (2015). A survey on methods for solving data imbalance problem for classification. International Journal of Computer Applications, 127(15), 37-41. https://doi.org/10.5120/ijca2015906677
- Tanha, J., Abdi, Y., Samadi, N., Razzaghi, N., & Asadpour, M. (2020). Boosting methods for multi-class imbalanced data classification: an experimental review. Journal of Big Data, 7(1), 1-47. https://doi.org/10.1186/s40537-019-0278-0
- Tsamardinos, I., & Aliferis, C. F. (2003, January). Towards principled feature selection: Relevancy, filters and wrappers. In International Workshop on Artificial Intelligence and Statistics (pp. 300-307). PMLR.
- Wu, L., Liu, Z., Bera, T., Ding, H., Langley, D. A., Jenkins-Barnes, A., ... & Xu, J. (2019). A deep learning model to recognize food contaminating beetle species based on elytra fragments. Computers and Electronics in Agriculture, 166, 105002. https://doi.org/10.1016/j.compag.2019.105002
- Yap, B. W., Rani, K. A., Rahman, H. A. A., Fong, S., Khairudin, Z., & Abdullah, N. N. (2014). An application of oversampling, undersampling, bagging and boosting in handling imbalanced datasets. In Proceedings of the first international conference on advanced data and information engineering (DaEng-2013) (pp. 13-22). Springer, Singapore.
- Zhang, Y. P., Zhang, L. N., & Wang, Y. C. (2010, September). Cluster-based majority under-sampling approaches for class imbalance learning. In 2010 2nd IEEE International Conference on Information and Financial Engineering (pp. 400-404). IEEE.
- Zhao, Y., Deng, B., Shen, C., Liu, Y., Lu, H., & Hua, X. S. (2017, October). Spatio-temporal autoencoder for video anomaly detection. In Proceedings of the 25th ACM international conference on Multimedia (pp. 1933-1941).