• Title/Summary/Keyword: Outlier detection methods

A Study of Anomaly Detection for ICT Infrastructure using Conditional Multimodal Autoencoder (ICT 인프라 이상탐지를 위한 조건부 멀티모달 오토인코더에 관한 연구)

  • Shin, Byungjin;Lee, Jonghoon;Han, Sangjin;Park, Choong-Shik
    • Journal of Intelligence and Information Systems, v.27 no.3, pp.57-73, 2021
  • Maintaining ICT infrastructure and preventing failures through anomaly detection is becoming increasingly important. System monitoring data is multidimensional time series data, which makes it difficult to account for the characteristics of multidimensional data and of time series data at the same time. With multidimensional data, the correlation between variables must be considered. Existing approaches, such as probability-based, linear, and distance-based methods, degrade because of the curse of dimensionality. In addition, time series data is usually preprocessed with a sliding window and time series decomposition for autocorrelation analysis. These techniques increase the dimension of the data, so they need to be supplemented. Anomaly detection is a long-established research field: statistical methods and regression analysis were used in its early days, and machine learning and artificial neural networks are now being actively applied to it. Statistical methods are difficult to apply when the data is non-homogeneous, and they do not detect local outliers well. Regression-based methods learn a regression formula based on parametric statistics and then detect anomalies by comparing predicted and actual values. Their performance drops when the model is not robust or when the data contains noise or outliers, so training data without noise or outliers must be used. An autoencoder built from artificial neural networks is trained to produce output as similar as possible to its input. It has many advantages over existing probability and linear models, cluster analysis, and supervised learning: it can be applied to data that does not satisfy a probability distribution or linearity assumption. 
In addition, it can be trained without supervision, with no labeled data. However, anomaly detection is limited in identifying local outliers in multidimensional data, and the characteristics of time series data cause the data dimension to grow considerably. In this study, we propose a CMAE (Conditional Multimodal Autoencoder) that improves anomaly detection performance by accounting for local outliers and time series characteristics. First, we applied a Multimodal Autoencoder (MAE) to overcome the limitation in identifying local outliers in multidimensional data. Multimodal models are commonly used to learn from different types of input, such as voice and image. The modalities share the bottleneck of the autoencoder, through which the correlation between them is learned. In addition, a CAE (Conditional Autoencoder) was used to learn the characteristics of time series data effectively without increasing the data dimension. Conditional inputs are usually categorical variables, but in this study time was used as the condition in order to learn periodicity. The proposed CMAE model was verified by comparison with a Unimodal Autoencoder (UAE) and a Multimodal Autoencoder (MAE). The reconstruction performance of the autoencoders was checked for 41 variables in the proposed model and the comparison models. Reconstruction performance differs by variable: reconstruction works well for the Memory, Disk, and Network modalities in all three autoencoder models, as shown by their small loss values. The Process modality showed no significant difference across the three models, and the CPU modality showed excellent performance with CMAE. To evaluate anomaly detection performance, ROC curves were drawn for the proposed model and the comparison models, and AUC, accuracy, precision, recall, and F1-score were compared. On every metric, performance ranked in the order CMAE, MAE, and AE. 
In particular, the recall was 0.9828 for CMAE, which confirms that it detects nearly all anomalies. The accuracy of the model also improved to 87.12%, and the F1-score was 0.8883, which is considered suitable for anomaly detection. In practical terms, the proposed model has an advantage beyond the performance improvement. Techniques such as time series decomposition and sliding windows add procedures that must be managed, and the dimensional increase they cause can slow down inference. The proposed model is easy to apply to practical tasks in terms of inference speed and model management.
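
The abstract above says that time was used as the condition so that the model can learn periodicity, but it does not say how the time value is encoded. The following is a minimal sketch of one common option, assuming a sine/cosine encoding of the hour of day that is appended to each modality's input vector. The function names, the 24-hour period, and the sample values are hypothetical and are not taken from the paper.

```python
import math

def encode_time_of_day(hour, period=24.0):
    """Map a time value onto the unit circle so 23:00 and 00:00 end up close together."""
    angle = 2.0 * math.pi * (hour % period) / period
    return math.sin(angle), math.cos(angle)

def conditional_input(metrics, hour):
    """Append the two condition features to one modality's metric vector."""
    return list(metrics) + list(encode_time_of_day(hour))

# Hypothetical CPU-modality sample observed at 06:00.
sample = conditional_input([0.42, 0.10, 0.07], hour=6.0)
print(len(sample))  # 3 metrics + 2 condition features
```

With this encoding the condition adds two features per sample regardless of the window length, which is consistent with the abstract's goal of learning periodicity without a sliding window.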

Comparative Analysis of Anomaly Detection Models using AE and Suggestion of Criteria for Determining Outliers

  • Kang, Gun-Ha;Sohn, Jung-Mo;Sim, Gun-Wu
    • Journal of the Korea Society of Computer and Information, v.26 no.8, pp.23-30, 2021
  • In this study, we present a comparative analysis of major autoencoder (AE)-based anomaly detection methods for quality determination in the manufacturing process, and we propose a new criterion for discriminating anomalies. At manufacturing sites, anomalous instances are few and their types vary greatly. These properties degrade the performance of AI-based anomaly detection models trained on both normal and anomalous cases, and obtaining additional data to improve performance takes considerable time and cost. To solve this problem, AE-based models such as AE and VAE, which perform anomaly detection using only normal data, are being studied. In this work, using Convolutional AE, VAE, and Dilated VAE models, we selected statistics on residual images, MSE, and information entropy as outlier discrimination criteria and compared the performance of each model. In particular, the range value applied to the Convolutional AE model showed the best performance, with an AUC PRC of 0.9570, an F1 score of 0.8812, an AUC ROC of 0.9548, and an accuracy of 87.60%. This is an accuracy improvement of about 20 percentage points over MSE, which has frequently been used as the standard for determining outliers, and it confirms that model performance can be improved by the choice of outlier criterion.
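
To illustrate why the choice of criterion can matter, here is a toy sketch that scores residual images (absolute difference between input and reconstruction) by MSE and by range. It assumes that "range" means the maximum residual minus the minimum residual, which the abstract does not define. The flattened 4x4 "images" and all numbers are made up for illustration and do not reproduce the paper's results.

```python
def residual(original, reconstructed):
    """Per-pixel absolute reconstruction error."""
    return [abs(o - r) for o, r in zip(original, reconstructed)]

def mse_score(res):
    """Mean of squared residuals."""
    return sum(v * v for v in res) / len(res)

def range_score(res):
    """Spread of the residuals (assumed definition: max minus min)."""
    return max(res) - min(res)

# Hypothetical normal image that is reconstructed with a uniform offset.
noisy_normal = residual([0.50] * 16, [0.37] * 16)
# Hypothetical defective image: one pixel deviates and is not reconstructed.
defect = residual([0.50] * 15 + [1.00], [0.48] * 15 + [0.50])

print(mse_score(noisy_normal) > mse_score(defect))      # MSE ranks the normal image higher
print(range_score(defect) > range_score(noisy_normal))  # range ranks the defect higher
```

In this constructed case a single localized defect is averaged away by MSE but stands out in the range of the residuals.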

A Study of the Application of Machine Learning Methods in the Low-GloSea6 Weather Prediction Solution (Low-GloSea6 기상 예측 소프트웨어의 머신러닝 기법 적용 연구)

  • Hye-Sung Park;Ye-Rin Cho;Dae-Yeong Shin;Eun-Ok Yun;Sung-Wook Chung
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology, v.16 no.5, pp.307-314, 2023
  • As supercomputing and hardware technology advance, climate prediction models are improving. The Korea Meteorological Administration adopted GloSea5 from the UK Met Office and now operates an updated GloSea6 tailored to Korean weather. Universities and research institutions use Low-GloSea6 on smaller servers, which improves accessibility and research efficiency. In this paper, we profiled Low-GloSea6 on smaller servers and identified the tri_sor_dp_dp subroutine in tri_sor.F90 of the atmospheric model as a CPU-intensive hotspot. Applying linear regression, a type of machine learning, to this function showed promise. After outliers were removed, the linear regression model achieved an RMSE of 2.7665e-08 and an MAE of 1.4958e-08, outperforming the Lasso and ElasticNet regression methods. This suggests that machine learning has potential for optimizing the hotspots identified during Low-GloSea6 execution.
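
The workflow described above (remove outliers, fit a linear regression, report RMSE and MAE) can be sketched as follows. The abstract does not say how outliers were removed, so the interquartile-range rule on the target values is an assumption, and the data is synthetic rather than taken from tri_sor_dp_dp.

```python
import math
import statistics

def drop_outliers_iqr(xs, ys, k=1.5):
    """Keep only points whose target lies within k * IQR of the quartiles (assumed rule)."""
    q1, _, q3 = statistics.quantiles(ys, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    kept = [(x, y) for x, y in zip(xs, ys) if low <= y <= high]
    return [x for x, _ in kept], [y for _, y in kept]

def fit_line(xs, ys):
    """Ordinary least squares for a single feature; returns (slope, intercept)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

def rmse_and_mae(xs, ys, slope, intercept):
    """Root mean squared error and mean absolute error of the fitted line."""
    errors = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    rmse = math.sqrt(statistics.fmean([e * e for e in errors]))
    mae = statistics.fmean([abs(e) for e in errors])
    return rmse, mae

# Synthetic data: y = 2x + 1 with one corrupted target value.
xs = [float(i) for i in range(10)]
ys = [2.0 * x + 1.0 for x in xs]
ys[5] = 100.0

raw_rmse, raw_mae = rmse_and_mae(xs, ys, *fit_line(xs, ys))
clean_xs, clean_ys = drop_outliers_iqr(xs, ys)
slope, intercept = fit_line(clean_xs, clean_ys)
clean_rmse, clean_mae = rmse_and_mae(clean_xs, clean_ys, slope, intercept)
print(raw_rmse > clean_rmse)
```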

An Optimization Approach for Localization of an Indoor Mobile Robot (최적화 기법을 사용한 실내 이동 로봇의 위치 인식)

  • Han, Jun Hee;Ko, Nak Yong
    • Journal of the Korean Institute of Intelligent Systems, v.26 no.4, pp.253-258, 2016
  • This paper proposes a method that uses an optimization approach to localize an indoor mobile robot. Bayesian filters, which have been widely used for mobile robot localization, use many control parameters to account for uncertainties in measurement and environment, and their estimation performance depends on how these parameter values are chosen. The performance of Bayesian filters also deteriorates as the non-linearity of the motion and measurement increases. In contrast, the optimization approach uses fewer control parameters and is less affected by non-linearity than Bayesian methods. To verify its feasibility, this paper compares the localization performance of the proposed method with that of the extended Kalman filter. Range measurements from ultrasonic satellite beacons to the robot are used for localization, and the Mahalanobis distance is used to detect and reject outliers in the measurements. The optimization method defines a performance index as a function of the measured range values and finds the optimized location estimate through iteration. In cooperation with a Bayesian filter that provides a proper initial location for the iteration, the method can improve localization performance and reduce computation time.
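
Mahalanobis-distance outlier rejection for a single range measurement can be sketched as below. For a scalar measurement the squared distance reduces to the squared innovation divided by its variance. The sketch assumes a 2-D position, a known measurement variance, and a 95% chi-square gate with one degree of freedom (3.841). The paper's actual gate, variance model, and function names are not given in the abstract, so all of these are hypothetical.

```python
import math

CHI2_95_1DOF = 3.841  # 95% quantile of the chi-square distribution, 1 degree of freedom

def predicted_range(position, beacon):
    """Euclidean distance from the estimated robot position to a beacon."""
    return math.dist(position, beacon)

def is_outlier(measured, position, beacon, variance, gate=CHI2_95_1DOF):
    """Reject a range whose squared Mahalanobis distance exceeds the gate."""
    innovation = measured - predicted_range(position, beacon)
    d2 = innovation ** 2 / variance
    return d2 > gate

# Hypothetical setup: robot estimated at the origin, beacon at (3, 4), sigma = 0.1 m.
print(is_outlier(5.05, (0.0, 0.0), (3.0, 4.0), variance=0.01))  # small innovation
print(is_outlier(6.00, (0.0, 0.0), (3.0, 4.0), variance=0.01))  # large innovation
```

In a full filter the variance would normally also include the uncertainty of the position estimate; only measurement noise is used here to keep the sketch short.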

A Study on the Water Absorption Test of Generator Stator Windings Using Probability Distributions (여러 가지 확률분포를 이용한 발전기 고정자 권선의 흡습 시험에 관한 연구)

  • Kim, Hee-Soo;Bae, Y.C.;Kim, Hee-Jeong;Na, Myung-Hwan
    • The Korean Journal of Applied Statistics, v.22 no.5, pp.961-969, 2009
  • Water absorption in water-cooled generator stator windings can cause serious accidents such as insulation breakdown, which bring a generator to an unexpected sudden outage. Accordingly, diagnosing water absorption in the windings is important for the effective operation of a power plant. Because the capacitance value measured for diagnosis is very small, special diagnosis methods based on stochastic theory are needed. KEPRI developed water absorption test equipment and diagnosis technology for these windings. In this paper, we propose a water absorption test for generator stator windings that uses probability distributions. The proposed diagnosis technology was applied to a real system, and the results of the water absorption test for the stator windings agreed with those of the water leak test.

CUSUM Chart Applied to Monitoring Areal Population Mobility (누적합 관리도를 활용한 생활인구 이상치 탐색)

  • Kim, Hyoung Jun;Sohn, So Young
    • Journal of Korean Society for Quality Management, v.48 no.2, pp.241-256, 2020
  • Purpose: Certain places in Seoul, such as Shinchon, Hongdae, and Gangnam, often suffer from a sudden overflow of mobile population, which can cause serious safety problems. This study suggests applying a spatial CUSUM control chart to monitor the areal population mobility data recently provided by the Seoul metropolitan government. Methods: Monitoring a series of standardized local Moran's I values makes it possible to detect spatio-temporal out-of-control status based on the accumulation of past patterns. We also visualize the pattern as a map for a more intuitive understanding of the phenomenon. As a case study, we analyzed the mobile population of women aged 25 to 29 in 51 Jipgyegu near Hongik University on Fridays from January 2017 to June 2018. The findings were validated by examining related articles and through local due diligence. Results: The analysis helps determine whether a change in the mobile population is a short-term change caused by a particular incident or a long-term change caused by spatial alteration, which allows a strategic approach to building a response system. The case of the popular downtown area near Hongik University showed that newly opened hotels, global sports brand shops, and franchise bookstores attracted the young female population. Conclusion: We expect the results of our study to contribute to planning an effective distribution of administrative resources to prepare for drastic increases in floating population. They can also be useful for commercial area analysis and for companies' age- and gender-specific marketing strategies.
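
The accumulation idea behind a CUSUM chart can be sketched with the standard two-sided tabular form applied to a standardized series. In the study the monitored quantity is the standardized local Moran's I of each area. Here a plain list of z-scores stands in for it, and the reference value k = 0.5 and decision limit h = 5 are conventional textbook defaults, not values reported in the abstract.

```python
def cusum_signals(z_scores, k=0.5, h=5.0):
    """Return the time indices at which the upper or lower CUSUM exceeds h."""
    upper = lower = 0.0
    signals = []
    for t, z in enumerate(z_scores):
        upper = max(0.0, upper + z - k)  # accumulates upward shifts
        lower = max(0.0, lower - z - k)  # accumulates downward shifts
        if upper > h or lower > h:
            signals.append(t)
            upper = lower = 0.0          # restart the chart after a signal
    return signals

# Hypothetical weekly series for one area: in control, then a sustained increase.
series = [0.1, -0.2, 0.3, 0.0, 2.0, 2.5, 2.2, 2.4, 0.1]
print(cusum_signals(series))  # [6]
```

No single value in this series is extreme on its own scale of several standard deviations; the signal at index 6 comes from three consecutive elevated values, which is the kind of accumulated pattern the chart is designed to pick up.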

A Comparative Study on Similarity Measure Techniques for Cross-Project Defect Prediction (교차 프로젝트 결함 예측을 위한 유사도 측정 기법 비교 연구)

  • Ryu, Duksan;Baik, Jongmoon
    • KIPS Transactions on Software and Data Engineering, v.7 no.6, pp.205-220, 2018
  • Software defect prediction helps allocate valuable project resources effectively for software quality assurance activities by focusing on the modules identified as fault-prone. If the historical data collected within a company is sufficient, Within-Project Defect Prediction (WPDP) can be used to predict fault-prone modules accurately. If a company does not maintain historical data, it may be helpful to build a classifier based on Cross-Project Defect Prediction (CPDP). Because CPDP builds a classifier from project data collected from other organizations, the main obstacle to building an accurate classifier is that the distributions of the source and target projects are not similar. To address this problem, it is crucial to identify effective similarity measure techniques that yield high performance for CPDP, and this paper aims to identify them. We compare various similarity measure techniques and evaluate the effectiveness of the similarity weights they calculate. The results are verified using a statistical significance test and an effect size test. The results show that k-Nearest Neighbor (k-NN), LOcal Correlation Integral (LOCI), and Range methods are the top three performers, and that the predictive performance obtained with these three methods is comparable to that of WPDP.