• Title/Abstract/Keywords: data anomaly classification

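A Methodology for Detecting Anomaly Onset Times in Large-Scale Time Series Data Using Machine Learning Techniques: A Case Study of Generator Component Signals (Anomaly Detection of Big Time Series Data Using Machine Learning)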

  • 권세혁
    • 산업경영시스템학회지 / Vol. 43, No. 2 / pp. 33-38 / 2020
  • Machine learning anomaly detection methods such as PCA anomaly detection and CNN image classification have focused on cross-sectional data. This paper suggests two approaches for applying ML techniques to identify the failure time in big time series data. First, PCA anomaly detection is applied in the time domain, classifying time rows rather than subjects as normal or abnormal. Second, CNN image classification is used to identify the failure time after restructuring the time series: the correlation matrix of each one-minute block of data is computed and converted to TIFF image format. In addition, LASSO, a feature selection method, was applied to select the most influential variables for identifying the failure status. For the empirical study, time series data were collected every second from 214 components of a power generator over 25 minutes, including the 20 minutes before the failure time. PCA anomaly detection detected the failure 9 minutes 17 seconds before the failure time, but the combination of LASSO and PCA did not detect it, because the target variable was a binary variable assigned on the basis of the failure time. CNN image classification, trained on 10 normal-status images and 5 failure-status images, detected the failure just one minute before it occurred.
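
The restructuring step described in this abstract (one correlation matrix per one-minute block of per-second data) can be sketched as follows. This is a minimal illustration, not the author's code: the function names, the 4-channel random data, and the 60-row window are assumptions, and the conversion of each matrix to a TIFF image is left out.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def minute_correlation_matrices(rows, window=60):
    """Split per-second rows into non-overlapping windows and return one
    channel-by-channel correlation matrix per window."""
    matrices = []
    for start in range(0, len(rows) - window + 1, window):
        block = rows[start:start + window]
        channels = list(zip(*block))  # transpose: one tuple per channel
        matrices.append([[pearson(a, b) for b in channels] for a in channels])
    return matrices


# Hypothetical data: 3 minutes of per-second readings from 4 channels.
random.seed(0)
rows = [[random.gauss(0, 1) for _ in range(4)] for _ in range(180)]
mats = minute_correlation_matrices(rows)
print(len(mats), "matrices of size", len(mats[0]), "x", len(mats[0][0]))
```

Each matrix could then be rendered as an image and passed to an image classifier; the choice of non-overlapping windows here is an assumption, since the abstract does not say whether the one-minute blocks overlap.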

AN ANOMALY DETECTION METHOD BY ASSOCIATIVE CLASSIFICATION

  • Lee, Bum-Ju;Lee, Heon-Gyu;Ryu, Keun-Ho
    • 대한원격탐사학회:학술대회논문집 / 대한원격탐사학회 2005, Proceedings of ISRS 2005 / pp. 301-304 / 2005
  • To detect intrusions based on anomalies in a user's activities, previous work has concentrated on statistical techniques or frequent episode mining to analyze audit data. However, because these approaches mainly analyze the average behaviour of a user's activities, some anomalies can be detected inaccurately. Therefore, we propose an anomaly detection method that uses associative classification to model intrusion detection. Finally, we show through experimental results that a prediction model built with the associative classification method yields better accuracy than a prediction model built with traditional methods.

SHM data anomaly classification using machine learning strategies: A comparative study

  • Chou, Jau-Yu;Fu, Yuguang;Huang, Shieh-Kung;Chang, Chia-Ming
    • Smart Structures and Systems / Vol. 29, No. 1 / pp. 77-91 / 2022
  • Various monitoring systems have been implemented in civil infrastructure to ensure structural safety and integrity. In long-term monitoring, these systems generate a large amount of data, in which anomalies are not unusual and can pose unique challenges for structural health monitoring applications such as system identification and damage detection. Efficient techniques for recognizing anomalies in monitoring data are therefore essential. In this study, several machine learning techniques are explored and implemented to detect and classify various types of data anomalies. A field dataset, consisting of one month of acceleration data from a long-span cable-stayed bridge in China, is used to examine the machine learning techniques for automated data anomaly detection. These techniques include the statistic-based pattern recognition network, spectrogram-based convolutional neural network, image-based time history convolutional neural network, image-based time-frequency hybrid convolutional neural network (GoogLeNet), and the proposed ensemble neural network model. The ensemble model deliberately combines different machine learning models to enhance anomaly classification performance. The results show that all these techniques can successfully detect and classify six types of data anomalies (i.e., missing, minor, outlier, square, trend, drift). Moreover, both the image-based time history convolutional neural network and GoogLeNet are further investigated for autonomous online anomaly classification and are found to classify anomalies effectively. In terms of accuracy, the proposed ensemble neural network model outperforms the other three machine learning techniques. This study also evaluates the proposed ensemble neural network model on a blind test dataset. The results show that the ensemble model is effective for data anomaly detection and applicable to signals whose characteristics change over time.

Structural health monitoring data anomaly detection by transformer enhanced densely connected neural networks

  • Jun, Li;Wupeng, Chen;Gao, Fan
    • Smart Structures and Systems / Vol. 30, No. 6 / pp. 613-626 / 2022
  • Guaranteeing the quality and integrity of structural health monitoring (SHM) data is very important for an effective assessment of structural condition. However, the sensory system may malfunction due to sensor faults or a harsh operational environment, resulting in multiple types of anomalies in the measured data. Efficiently and automatically identifying anomalies in the vast amounts of measured data is important for assessing structural conditions and providing early warning of structural failure in SHM. A major challenge for current automated data anomaly detection methods is the imbalance among dataset categories. Considering the features of actual anomalous data, this paper proposes a data anomaly detection method based on a data-level technique and deep learning for SHM of civil engineering structures. The proposed method consists of a data balancing phase, which prepares a comprehensive training dataset using the data-level technique, and an anomaly detection phase based on a carefully designed network. The densely connected convolutional network (DenseNet) and Transformer encoder are embedded in the network to facilitate extraction of both detailed and global features of the response data, and to establish the mapping between the highest-level abstract features and the data anomaly class. Numerical studies on a steel frame model are conducted to evaluate the performance and noise immunity of the proposed network for data anomaly detection. The applicability of the proposed method to data anomaly classification is validated with measured data from a real supertall structure. The proposed method reaches 95.7% overall accuracy on practical engineering structural monitoring data, which demonstrates the effectiveness of data balancing and the robust classification capability of the proposed network.

An Anomaly Detection Framework Based on ICA and Bayesian Classification for IaaS Platforms

  • Wang, GuiPing;Yang, JianXi;Li, Ren
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 10, No. 8 / pp. 3865-3883 / 2016
  • Infrastructure as a Service (IaaS) encapsulates computer hardware into a large number of virtual, manageable instances, mainly in the form of virtual machines (VMs), and provides rental service to users. Currently, VM anomaly incidents occasionally occur, leading to performance issues and even downtime. This paper aims to detect anomalous VMs based on VM performance metric data. Due to the dynamic nature and increasing scale of IaaS, detecting anomalous VMs from voluminous, correlated, and non-Gaussian monitored performance data is a challenging task. This paper designs an anomaly detection framework to address this challenge. First, it collects 53 performance metrics to reflect the running state of each VM. The collected performance metrics are tested and shown not to follow the Gaussian distribution. Then, it employs independent component analysis (ICA) instead of principal component analysis (PCA) to extract independent components from the collected non-Gaussian performance metric data. For anomaly detection, it employs multi-class Bayesian classification to determine the current state of each VM. To evaluate the performance of the designed detection framework, four types of anomalies are separately or jointly injected into randomly selected VMs in a campus-wide testbed. The experimental results show that the ICA-based detection mechanism outperforms PCA-based and LDA-based detection mechanisms in terms of sensitivity and specificity.
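
The two evaluation measures named at the end of this abstract can be computed from binary labels as in the sketch below. The label vectors are hypothetical (1 = anomalous VM, 0 = normal VM) and are not taken from the paper; the paper's multi-class setting would need these measures computed per class.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels where 1 = anomalous."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    # Sensitivity: share of true anomalies that were flagged.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: share of normal cases that were left unflagged.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


# Hypothetical ground truth and detector output for 8 VMs.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1, 0]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.3f} specificity={spec:.3f}")
```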

Anomaly Detection Model Based on Semi-Supervised Learning Using LIME: Focusing on Semiconductor Process

  • 안강민;신주은;백동현
    • 산업경영시스템학회지 / Vol. 45, No. 4 / pp. 86-98 / 2022
  • Recently, many studies have sought to improve quality by applying machine learning models to semiconductor manufacturing process data. However, in the semiconductor manufacturing process, the ratio of good products is much higher than that of defective products, so data imbalance is a serious problem for machine learning. In addition, because the data used for machine learning have a very large number of features, it is important to extract only the important features to increase accuracy and usability. This study proposes an anomaly detection methodology that can learn effectively despite the data imbalance and high dimensionality of semiconductor process data. The methodology applies the SMOTE and RFECV methods and then the LIME algorithm. It analyzes the classification results of the anomaly classification model, identifies the cause of each anomaly, and derives the semiconductor process requiring action. The applicability and feasibility of the proposed methodology were confirmed through case applications.

Decision Tree Techniques with Feature Reduction for Network Anomaly Detection

  • 강구홍
    • 정보보호학회논문지 / Vol. 29, No. 4 / pp. 795-805 / 2019
  • Interest in network anomaly detection techniques for coping with unknown attacks has recently grown considerably. To develop such techniques, various studies using data mining, machine learning, and deep learning are under way. This paper demonstrates the feasibility of network anomaly detection on the NSL-KDD data set using the decision tree, one of the most traditional data mining methods for classification problems. To mitigate the decision tree's tendency to over-fit, optimal feature selection is performed with the chi-square test, and a decision tree model using the 13 selected features achieves network anomaly detection accuracy of 84% on the NSL-KDD test data set KDDTest+ and 70% on KDDTest-21. Compared with the accuracies of 81% and 64% previously reported for decision tree models on these test data sets, these results are improvements of about 3% and 6%, respectively.
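
Chi-square feature scoring of the kind mentioned in this abstract can be sketched as below. This is a simplified illustration under stated assumptions: the two categorical features and the labels are made-up toy values rather than NSL-KDD records, the helper name `chi_square` is hypothetical, and the decision tree that the paper trains on the selected features is not shown.

```python
from collections import Counter


def chi_square(feature, labels):
    """Chi-square statistic of independence between a categorical feature
    and the class labels (larger means stronger association)."""
    n = len(labels)
    f_counts = Counter(feature)
    l_counts = Counter(labels)
    joint = Counter(zip(feature, labels))
    stat = 0.0
    for f, fc in f_counts.items():
        for l, lc in l_counts.items():
            expected = fc * lc / n          # count expected under independence
            observed = joint[(f, l)]        # Counter returns 0 for unseen pairs
            stat += (observed - expected) ** 2 / expected
    return stat


# Toy data: 1 = attack, 0 = normal.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "protocol": ["tcp", "tcp", "tcp", "tcp", "udp", "udp", "udp", "udp"],
    "flag": ["a", "b", "a", "b", "a", "b", "a", "b"],
}
scores = {name: chi_square(col, labels) for name, col in features.items()}
# Keep the k highest-scoring features (k = 1 here; the paper keeps 13).
selected = sorted(scores, key=scores.get, reverse=True)[:1]
print(scores, selected)
```

In this toy case `protocol` separates the classes perfectly and scores highest, while `flag` is independent of the label and scores zero.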

Cluster-based Deep One-Class Classification Model for Anomaly Detection

  • Younghwan Kim;Huy Kang Kim
    • Journal of Internet Technology / Vol. 22, No. 4 / pp. 903-911 / 2021
  • As cyber-attacks on Cyber-Physical Systems (CPS) become more diverse and sophisticated, it is important to quickly detect malicious behaviors occurring in CPS. Since CPS can collect sensor data in near real time throughout the process, there have been many attempts, from the perspective of data-driven security, to detect anomalous behavior by learning normal behavior. However, since CPS datasets are large and most of the data are normal, analyzing the data and implementing an anomaly detection model has always been a great challenge. In this paper, we propose and evaluate the Clustered Deep One-Class Classification (CD-OCC) model, which combines a clustering algorithm and a deep learning (DL) model and uses only a normal dataset for anomaly detection. We use an auto-encoder to reduce the dimensions of the dataset and the K-means clustering algorithm to partition the normal data into the optimal number of clusters. The DL model is trained to predict the clusters of normal data, and its logit values are obtained as outputs. From the perspective of knowledge distillation, the derived logit values represent normal data better and are used as inputs to the OCC model. In the experiments, the proposed model achieves F1 scores of 0.93 and 0.83 on the SWaT and HAI datasets, respectively, a significant performance improvement over other recent detectors such as Com-AE and SVM-RBF.

Resolving data imbalance through differentiated anomaly data processing based on verification data

  • 황철현
    • 지능정보연구 / Vol. 28, No. 4 / pp. 179-190 / 2022
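  • Data imbalance refers to the phenomenon in which the number of samples in one class is excessively large or small compared with the other classes, and it has been raised as a major factor that degrades performance in machine learning with classification algorithms. To solve the data imbalance problem, various over-sampling methods that amplify the minority-distribution data have been proposed. Among these, SMOTE is the most representative, and various variants have emerged to maximize the amplification effect on minority data, such as removing noise contained in the data (SMOTE-IPF) or strengthening only the borderline (Borderline SMOTE). This paper proposes a method that ultimately improves classification performance by improving how anomaly data are handled in the traditional SMOTE method for amplifying minority-class data. In experiments, the proposed method consistently showed higher classification performance than existing methods.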
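
For orientation, the core interpolation step of plain SMOTE (the baseline this abstract builds on, not the paper's proposed variant) can be sketched as follows. The function name, the four minority points, and the choice of `k=2` neighbours are assumptions made for illustration.

```python
import math
import random


def smote(minority, n_new, k=2, rng=None):
    """Generate n_new synthetic points by interpolating between a randomly
    chosen minority point and one of its k nearest minority neighbours."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest neighbours of x among the other minority points.
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: math.dist(x, p),
        )[:k]
        nb = rng.choice(neighbours)
        u = rng.random()  # position along the segment from x to nb
        synthetic.append([a + u * (b - a) for a, b in zip(x, nb)])
    return synthetic


# Hypothetical minority-class samples with two features each.
minority = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.3], [1.1, 1.1]]
new_points = smote(minority, n_new=5)
print(new_points)
```

Because every synthetic point lies on a segment between two existing minority points, noisy or anomalous minority samples are amplified along with the rest, which is the weakness that variants such as SMOTE-IPF and Borderline SMOTE target.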

Convolutional neural network-based data anomaly detection considering class imbalance with limited data

  • Du, Yao;Li, Ling-fang;Hou, Rong-rong;Wang, Xiao-you;Tian, Wei;Xia, Yong
    • Smart Structures and Systems / Vol. 29, No. 1 / pp. 63-75 / 2022
  • The raw data collected by structural health monitoring (SHM) systems may suffer from multiple patterns of anomalies, which pose a significant barrier to automatic and accurate structural condition assessment. Therefore, detecting and classifying these anomalies is an essential pre-processing step for SHM systems. However, heterogeneous data patterns, scarce anomalous samples, and severe class imbalance make data anomaly detection difficult. In this regard, this study proposes a convolutional neural network-based data anomaly detection method. Time- and frequency-domain data are converted to images and used as the input of the neural network for training. ResNet18 is adopted as the feature extractor to avoid training with massive labelled data. In addition, the focal loss function is adopted to mitigate the classification bias induced by class imbalance. The effectiveness of the proposed method is validated using acceleration data collected from a long-span cable-stayed bridge. The proposed approach detects and classifies data anomalies with high accuracy.
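
The focal loss mentioned in this abstract is commonly written as FL(p) = -alpha * (1 - p)^gamma * log(p), where p is the predicted probability of the true class. The sketch below shows how it down-weights easy examples; the values gamma = 2 and alpha = 1 are assumed defaults, since the abstract does not state the settings used.

```python
import math


def focal_loss(p_true, gamma=2.0, alpha=1.0):
    """Focal loss for one sample, given the predicted probability of its true class.
    With gamma = 0 and alpha = 1 this reduces to ordinary cross-entropy."""
    return -alpha * (1.0 - p_true) ** gamma * math.log(p_true)


easy = focal_loss(0.9)   # well-classified sample, typical of the majority class
hard = focal_loss(0.3)   # poorly classified sample, typical of a rare anomaly class
cross_entropy_easy = focal_loss(0.9, gamma=0.0)
print(easy, hard, easy / cross_entropy_easy)
```

With gamma = 2, a sample predicted at 0.9 contributes only (1 - 0.9)^2 = 1% of its cross-entropy loss, so training is driven mainly by the hard, often minority-class, samples.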