Study on Lifelog Anomaly Detection using VAE-based Machine Learning Model

  • Jiyong Kim (Department of Mathematics, Kwangwoon University);
  • Minseo Park (Department of Data Science, Seoul Women's University)
  • Received: 2022.05.26
  • Reviewed: 2022.07.02
  • Published: 2022.07.31

Abstract

Lifelog data continuously collected through wearable devices may contain many outliers, so improving data quality requires finding and removing them. Because outliers are generally far fewer than normal records, a class imbalance problem arises. To address this imbalance, we propose a method that applies a Variational AutoEncoder (VAE) to the outlier class. After the outlier data are preprocessed with the proposed method, the result is verified with several machine learning classification models. Verification on body weight data confirmed that performance improved for all classification models. Based on these experimental results, we recommend that lifelog body weight data be preprocessed with the proposed outlier handling method and then analyzed with LightGBM, the best-performing model.
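
The workflow described in the abstract can be illustrated with a short, self-contained sketch. This is not the authors' implementation: the network sizes, hyperparameters, synthetic stand-in data, and helper names (TabularVAE, oversample_with_vae) are illustrative assumptions. The sketch trains a small VAE on the minority (outlier) class, samples synthetic outliers from the decoder to balance the classes, and then fits a LightGBM classifier.

    import numpy as np
    import torch
    import torch.nn as nn
    import lightgbm as lgb
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import f1_score

    class TabularVAE(nn.Module):
        """Small VAE for tabular lifelog features (sizes are illustrative)."""
        def __init__(self, n_features, latent_dim=2, hidden=16):
            super().__init__()
            self.enc = nn.Sequential(nn.Linear(n_features, hidden), nn.ReLU())
            self.mu = nn.Linear(hidden, latent_dim)
            self.logvar = nn.Linear(hidden, latent_dim)
            self.dec = nn.Sequential(nn.Linear(latent_dim, hidden), nn.ReLU(),
                                     nn.Linear(hidden, n_features))

        def forward(self, x):
            h = self.enc(x)
            mu, logvar = self.mu(h), self.logvar(h)
            # Reparameterization trick: z = mu + sigma * eps
            z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
            return self.dec(z), mu, logvar

    def vae_loss(recon, x, mu, logvar):
        # Reconstruction error plus KL divergence to the standard normal prior
        rec = nn.functional.mse_loss(recon, x, reduction="sum")
        kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
        return rec + kld

    def oversample_with_vae(X_minority, n_new, epochs=200, latent_dim=2):
        """Fit a VAE on the minority class and decode n_new synthetic samples."""
        X = torch.tensor(X_minority, dtype=torch.float32)
        vae = TabularVAE(X.shape[1], latent_dim)
        opt = torch.optim.Adam(vae.parameters(), lr=1e-3)
        for _ in range(epochs):
            opt.zero_grad()
            recon, mu, logvar = vae(X)
            vae_loss(recon, X, mu, logvar).backward()
            opt.step()
        with torch.no_grad():
            z = torch.randn(n_new, latent_dim)   # sample latent codes from the prior
            return vae.dec(z).numpy()            # decode them into synthetic outliers

    # Illustrative usage with synthetic data standing in for the weight lifelog set.
    rng = np.random.default_rng(0)
    X_normal = rng.normal(70, 5, size=(950, 3))    # e.g. weight, BMI, step count
    X_outlier = rng.normal(110, 10, size=(50, 3))  # rare outlier records
    X = np.vstack([X_normal, X_outlier])
    y = np.array([0] * 950 + [1] * 50)

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
    n_new = int((y_tr == 0).sum() - (y_tr == 1).sum())
    X_syn = oversample_with_vae(X_tr[y_tr == 1], n_new)
    X_bal = np.vstack([X_tr, X_syn])
    y_bal = np.concatenate([y_tr, np.ones(n_new, dtype=int)])

    clf = lgb.LGBMClassifier(n_estimators=200, random_state=0)
    clf.fit(X_bal, y_bal)
    print("F1 on held-out data:", f1_score(y_te, clf.predict(X_te)))

In practice the synthetic records would be decoded from the actual lifelog features rather than random stand-in data, and evaluation would rely on imbalance-aware metrics such as F1 rather than plain accuracy.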

