Study on Lifelog Anomaly Detection using VAE-based Machine Learning Model

  • Jiyong Kim (Department of Mathematics, Kwangwoon University);
  • Minseo Park (Department of Data Science, Seoul Women's University)
  • Received: 2022.05.26
  • Reviewed: 2022.07.02
  • Published: 2022.07.31

Abstract

Lifelog data continuously collected through wearable devices may contain many outliers, so improving data quality requires finding and removing them. Because outliers are generally far fewer than normal records, a class imbalance problem arises. To address this imbalance, we propose a method that applies a Variational AutoEncoder (VAE) to the outlier class. After the outlier data are preprocessed with the proposed method, the result is verified with several machine learning classification models. Verification on body weight data confirmed that performance improved for all classification models. Based on these experimental results, we recommend that lifelog body weight data be preprocessed with the proposed outlier handling method and then analyzed with LightGBM, the best-performing model.
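
The workflow described in the abstract can be illustrated with a short, self-contained sketch. This is not the authors' implementation: the network sizes, hyperparameters, synthetic stand-in data, and helper names (TabularVAE, oversample_with_vae) are illustrative assumptions. The sketch trains a small VAE on the minority (outlier) class, samples synthetic outliers from the decoder to balance the classes, and then fits a LightGBM classifier.

    import numpy as np
    import torch
    import torch.nn as nn
    import lightgbm as lgb
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import f1_score

    class TabularVAE(nn.Module):
        """Small VAE for tabular lifelog features (sizes are illustrative)."""
        def __init__(self, n_features, latent_dim=2, hidden=16):
            super().__init__()
            self.enc = nn.Sequential(nn.Linear(n_features, hidden), nn.ReLU())
            self.mu = nn.Linear(hidden, latent_dim)
            self.logvar = nn.Linear(hidden, latent_dim)
            self.dec = nn.Sequential(nn.Linear(latent_dim, hidden), nn.ReLU(),
                                     nn.Linear(hidden, n_features))

        def forward(self, x):
            h = self.enc(x)
            mu, logvar = self.mu(h), self.logvar(h)
            # Reparameterization trick: z = mu + sigma * eps
            z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
            return self.dec(z), mu, logvar

    def vae_loss(recon, x, mu, logvar):
        # Reconstruction error plus KL divergence to the standard normal prior
        rec = nn.functional.mse_loss(recon, x, reduction="sum")
        kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
        return rec + kld

    def oversample_with_vae(X_minority, n_new, epochs=200, latent_dim=2):
        """Fit a VAE on the minority class and decode n_new synthetic samples."""
        X = torch.tensor(X_minority, dtype=torch.float32)
        vae = TabularVAE(X.shape[1], latent_dim)
        opt = torch.optim.Adam(vae.parameters(), lr=1e-3)
        for _ in range(epochs):
            opt.zero_grad()
            recon, mu, logvar = vae(X)
            vae_loss(recon, X, mu, logvar).backward()
            opt.step()
        with torch.no_grad():
            z = torch.randn(n_new, latent_dim)   # sample latent codes from the prior
            return vae.dec(z).numpy()            # decode them into synthetic outliers

    # Illustrative usage with synthetic data standing in for the weight lifelog set.
    rng = np.random.default_rng(0)
    X_normal = rng.normal(70, 5, size=(950, 3))    # e.g. weight, BMI, step count
    X_outlier = rng.normal(110, 10, size=(50, 3))  # rare outlier records
    X = np.vstack([X_normal, X_outlier])
    y = np.array([0] * 950 + [1] * 50)

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
    n_new = int((y_tr == 0).sum() - (y_tr == 1).sum())
    X_syn = oversample_with_vae(X_tr[y_tr == 1], n_new)
    X_bal = np.vstack([X_tr, X_syn])
    y_bal = np.concatenate([y_tr, np.ones(n_new, dtype=int)])

    clf = lgb.LGBMClassifier(n_estimators=200, random_state=0)
    clf.fit(X_bal, y_bal)
    print("F1 on held-out data:", f1_score(y_te, clf.predict(X_te)))

In practice the synthetic records would be decoded from the actual lifelog features rather than random stand-in data, and evaluation would rely on imbalance-aware metrics such as F1 rather than plain accuracy.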

