http://dx.doi.org/10.22156/CS4SMB.2021.11.06.049

Data Processing of AutoML-based Classification Models for Improving Performance in Unbalanced Classes  

Lee, Dong-Joon (Division of AI Computer Science and Engineering, Kyonggi University)
Kang, Ji-Soo (Department of Computer Science, Kyonggi University)
Chung, Kyungyong (Division of AI Computer Science and Engineering, Kyonggi University)
Publication Information
Journal of Convergence for Information Technology / v.11, no.6, 2021, pp. 49-54
Abstract
With the recent development of smart healthcare technology, interest in everyday diseases is increasing. However, healthcare data are often imbalanced between positive and negative cases. The imbalance arises because, for any given disease, people without the condition far outnumber patients, which makes positive cases difficult to collect. This imbalance needs to be corrected because it degrades model training and, in turn, the performance of disease prediction and analysis. Therefore, in this paper, we build a detection model that determines whether a disease is present: we replace missing values through multiple imputation and resolve the data imbalance through over-sampling. Using the preprocessed data, we generate several models with AutoML and select the top three models to build an ensemble model.
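The over-sampling and top-three ensemble steps described above can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the interpolation routine is a simplified SMOTE-style procedure, the candidate models are hypothetical threshold rules standing in for AutoML output, the validation scores are placeholder values, and majority voting is an assumed way of combining the selected models. Multiple imputation is omitted.

```python
import math
import random

def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (SMOTE-style)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position on the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic

def top_k_ensemble(models, scores, k=3):
    """Keep the k best-scoring models and combine them by majority vote."""
    ranked = sorted(zip(scores, range(len(models))), reverse=True)[:k]
    chosen = [models[i] for _, i in ranked]

    def predict(x):
        votes = sum(m(x) for m in chosen)
        return 1 if votes * 2 > len(chosen) else 0

    return predict

# Toy data (hypothetical): 4 positive cases versus 12 negative cases.
minority = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.3], [1.1, 1.1]]
majority = [[3.0 + 0.1 * i, 3.0 - 0.05 * i] for i in range(12)]

# Over-sample the minority class until both classes have the same size.
synthetic = smote_like_oversample(minority, len(majority) - len(minority))
balanced_minority = minority + synthetic

# Hypothetical candidate models (1 = positive) and placeholder scores.
models = [
    lambda x: 1 if x[0] < 2.0 else 0,
    lambda x: 1 if x[1] < 2.0 else 0,
    lambda x: 1 if x[0] + x[1] < 4.0 else 0,
    lambda x: 0,  # a weak baseline that always predicts negative
]
scores = [0.90, 0.85, 0.80, 0.50]

ensemble = top_k_ensemble(models, scores, k=3)
print(len(balanced_minority), len(majority))
print(ensemble([1.0, 1.0]), ensemble([3.0, 3.0]))
```

Because each synthetic point lies on a segment between two existing minority samples, the over-sampled class stays inside the region the original positive cases already occupy. In practice, a library implementation of SMOTE and an AutoML tool's own model ranking would replace these toy helpers.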
Keywords
Data Imbalance; Oversampling; Healthcare; AutoML; Data Imputation