• Title/Summary/Keyword: data anomaly classification

Search Result 88, Processing Time 0.023 seconds

Anomaly Detection of Big Time Series Data Using Machine Learning (머신러닝 기법을 활용한 대용량 시계열 데이터 이상 시점탐지 방법론 : 발전기 부품신호 사례 중심)

  • Kwon, Sehyug
    • Journal of Korean Society of Industrial and Systems Engineering
    • /
    • v.43 no.2
    • /
    • pp.33-38
    • /
    • 2020
  • Anomaly detection of Machine Learning such as PCA anomaly detection and CNN image classification has been focused on cross-sectional data. In this paper, two approaches has been suggested to apply ML techniques for identifying the failure time of big time series data. PCA anomaly detection to identify time rows as normal or abnormal was suggested by converting subjects identification problem to time domain. CNN image classification was suggested to identify the failure time by re-structuring of time series data, which computed the correlation matrix of one minute data and converted to tiff image format. Also, LASSO, one of feature selection methods, was applied to select the most affecting variables which could identify the failure status. For the empirical study, time series data was collected in seconds from a power generator of 214 components for 25 minutes including 20 minutes before the failure time. The failure time was predicted and detected 9 minutes 17 seconds before the failure time by PCA anomaly detection, but was not detected by the combination of LASSO and PCA because the target variable was binary variable which was assigned on the base of the failure time. CNN image classification with the train data of 10 normal status image and 5 failure status images detected just one minute before.

AN ANOMALY DETECTION METHOD BY ASSOCIATIVE CLASSIFICATION

  • Lee, Bum-Ju;Lee, Heon-Gyu;Ryu, Keun-Ho
    • Proceedings of the KSRS Conference
    • /
    • 2005.10a
    • /
    • pp.301-304
    • /
    • 2005
  • For detecting an intrusion based on the anomaly of a user's activities, previous works are concentrated on statistical techniques or frequent episode mining in order to analyze an audit data. But, since they mainly analyze the average behaviour of user's activities, some anomalies can be detected inaccurately. Therefore, we propose an anomaly detection method that utilizes an associative classification for modelling intrusion detection. Finally, we proof that a prediction model built from associative classification method yields better accuracy than a prediction model built from a traditional methods by experimental results.

  • PDF

SHM data anomaly classification using machine learning strategies: A comparative study

  • Chou, Jau-Yu;Fu, Yuguang;Huang, Shieh-Kung;Chang, Chia-Ming
    • Smart Structures and Systems
    • /
    • v.29 no.1
    • /
    • pp.77-91
    • /
    • 2022
  • Various monitoring systems have been implemented in civil infrastructure to ensure structural safety and integrity. In long-term monitoring, these systems generate a large amount of data, where anomalies are not unusual and can pose unique challenges for structural health monitoring applications, such as system identification and damage detection. Therefore, developing efficient techniques is quite essential to recognize the anomalies in monitoring data. In this study, several machine learning techniques are explored and implemented to detect and classify various types of data anomalies. A field dataset, which consists of one month long acceleration data obtained from a long-span cable-stayed bridge in China, is employed to examine the machine learning techniques for automated data anomaly detection. These techniques include the statistic-based pattern recognition network, spectrogram-based convolutional neural network, image-based time history convolutional neural network, image-based time-frequency hybrid convolution neural network (GoogLeNet), and proposed ensemble neural network model. The ensemble model deliberately combines different machine learning models to enhance anomaly classification performance. The results show that all these techniques can successfully detect and classify six types of data anomalies (i.e., missing, minor, outlier, square, trend, drift). Moreover, both image-based time history convolutional neural network and GoogLeNet are further investigated for the capability of autonomous online anomaly classification and found to effectively classify anomalies with decent performance. As seen in comparison with accuracy, the proposed ensemble neural network model outperforms the other three machine learning techniques. This study also evaluates the proposed ensemble neural network model to a blind test dataset. As found in the results, this ensemble model is effective for data anomaly detection and applicable for the signal characteristics changing over time.

Structural health monitoring data anomaly detection by transformer enhanced densely connected neural networks

  • Jun, Li;Wupeng, Chen;Gao, Fan
    • Smart Structures and Systems
    • /
    • v.30 no.6
    • /
    • pp.613-626
    • /
    • 2022
  • Guaranteeing the quality and integrity of structural health monitoring (SHM) data is very important for an effective assessment of structural condition. However, sensory system may malfunction due to sensor fault or harsh operational environment, resulting in multiple types of data anomaly existing in the measured data. Efficiently and automatically identifying anomalies from the vast amounts of measured data is significant for assessing the structural conditions and early warning for structural failure in SHM. The major challenges of current automated data anomaly detection methods are the imbalance of dataset categories. In terms of the feature of actual anomalous data, this paper proposes a data anomaly detection method based on data-level and deep learning technique for SHM of civil engineering structures. The proposed method consists of a data balancing phase to prepare a comprehensive training dataset based on data-level technique, and an anomaly detection phase based on a sophisticatedly designed network. The advanced densely connected convolutional network (DenseNet) and Transformer encoder are embedded in the specific network to facilitate extraction of both detail and global features of response data, and to establish the mapping between the highest level of abstractive features and data anomaly class. Numerical studies on a steel frame model are conducted to evaluate the performance and noise immunity of using the proposed network for data anomaly detection. The applicability of the proposed method for data anomaly classification is validated with the measured data of a practical supertall structure. The proposed method presents a remarkable performance on data anomaly detection, which reaches a 95.7% overall accuracy with practical engineering structural monitoring data, which demonstrates the effectiveness of data balancing and the robust classification capability of the proposed network.

An Anomaly Detection Framework Based on ICA and Bayesian Classification for IaaS Platforms

  • Wang, GuiPing;Yang, JianXi;Li, Ren
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.10 no.8
    • /
    • pp.3865-3883
    • /
    • 2016
  • Infrastructure as a Service (IaaS) encapsulates computer hardware into a large amount of virtual and manageable instances mainly in the form of virtual machine (VM), and provides rental service for users. Currently, VM anomaly incidents occasionally occur, which leads to performance issues and even downtime. This paper aims at detecting anomalous VMs based on performance metrics data of VMs. Due to the dynamic nature and increasing scale of IaaS, detecting anomalous VMs from voluminous correlated and non-Gaussian monitored performance data is a challenging task. This paper designs an anomaly detection framework to solve this challenge. First, it collects 53 performance metrics to reflect the running state of each VM. The collected performance metrics are testified not to follow the Gaussian distribution. Then, it employs independent components analysis (ICA) instead of principal component analysis (PCA) to extract independent components from collected non-Gaussian performance metric data. For anomaly detection, it employs multi-class Bayesian classification to determine the current state of each VM. To evaluate the performance of the designed detection framework, four types of anomalies are separately or jointly injected into randomly selected VMs in a campus-wide testbed. The experimental results show that ICA-based detection mechanism outperforms PCA-based and LDA-based detection mechanisms in terms of sensitivity and specificity.

Anomaly Detection Model Based on Semi-Supervised Learning Using LIME: Focusing on Semiconductor Process (LIME을 활용한 준지도 학습 기반 이상 탐지 모델: 반도체 공정을 중심으로)

  • Kang-Min An;Ju-Eun Shin;Dong Hyun Baek
    • Journal of Korean Society of Industrial and Systems Engineering
    • /
    • v.45 no.4
    • /
    • pp.86-98
    • /
    • 2022
  • Recently, many studies have been conducted to improve quality by applying machine learning models to semiconductor manufacturing process data. However, in the semiconductor manufacturing process, the ratio of good products is much higher than that of defective products, so the problem of data imbalance is serious in terms of machine learning. In addition, since the number of features of data used in machine learning is very large, it is very important to perform machine learning by extracting only important features from among them to increase accuracy and utilization. This study proposes an anomaly detection methodology that can learn excellently despite data imbalance and high-dimensional characteristics of semiconductor process data. The anomaly detection methodology applies the LIME algorithm after applying the SMOTE method and the RFECV method. The proposed methodology analyzes the classification result of the anomaly classification model, detects the cause of the anomaly, and derives a semiconductor process requiring action. The proposed methodology confirmed applicability and feasibility through application of cases.

Decision Tree Techniques with Feature Reduction for Network Anomaly Detection (네트워크 비정상 탐지를 위한 속성 축소를 반영한 의사결정나무 기술)

  • Kang, Koohong
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.29 no.4
    • /
    • pp.795-805
    • /
    • 2019
  • Recently, there is a growing interest in network anomaly detection technology to tackle unknown attacks. For this purpose, diverse studies using data mining, machine learning, and deep learning have been applied to detect network anomalies. In this paper, we evaluate the decision tree to see its feasibility for network anomaly detection on NSL-KDD data set, which is one of the most popular data mining techniques for classification. In order to handle the over-fitting problem of decision tree, we select 13 features from the original 41 features of the data set using chi-square test, and then model the decision tree using TensorFlow and Scik-Learn, yielding 84% and 70% of binary classification accuracies on the KDDTest+ and KDDTest-21 of NSL-KDD test data set. This result shows 3% and 6% improvements compared to the previous 81% and 64% of binary classification accuracies by decision tree technologies, respectively.

Resolving data imbalance through differentiated anomaly data processing based on verification data (검증데이터 기반의 차별화된 이상데이터 처리를 통한 데이터 불균형 해소 방법)

  • Hwang, Chulhyun
    • Journal of Intelligence and Information Systems
    • /
    • v.28 no.4
    • /
    • pp.179-190
    • /
    • 2022
  • Data imbalance refers to a phenomenon in which the number of data in one category is too large or too small compared to another category. Due to this, it has been raised as a major factor that deteriorates performance in machine learning that utilizes classification algorithms. In order to solve the data imbalance problem, various ovrsampling methods for amplifying prime number distribution data have been proposed. Among them, SMOTE is the most representative method. In order to maximize the amplification effect of minority distribution data, various methods have emerged that remove noise included in data (SMOTE-IPF) or enhance only border lines (Borderline SMOTE). This paper proposes a method to ultimately improve classification performance by improving the processing method for anomaly data in the traditional SMOTE method that amplifies minority classification data. The proposed method consistently presented relatively high classification performance compared to the existing methods through experiments.

Convolutional neural network-based data anomaly detection considering class imbalance with limited data

  • Du, Yao;Li, Ling-fang;Hou, Rong-rong;Wang, Xiao-you;Tian, Wei;Xia, Yong
    • Smart Structures and Systems
    • /
    • v.29 no.1
    • /
    • pp.63-75
    • /
    • 2022
  • The raw data collected by structural health monitoring (SHM) systems may suffer multiple patterns of anomalies, which pose a significant barrier for an automatic and accurate structural condition assessment. Therefore, the detection and classification of these anomalies is an essential pre-processing step for SHM systems. However, the heterogeneous data patterns, scarce anomalous samples and severe class imbalance make data anomaly detection difficult. In this regard, this study proposes a convolutional neural network-based data anomaly detection method. The time and frequency domains data are transferred as images and used as the input of the neural network for training. ResNet18 is adopted as the feature extractor to avoid training with massive labelled data. In addition, the focal loss function is adopted to soften the class imbalance-induced classification bias. The effectiveness of the proposed method is validated using acceleration data collected in a long-span cable-stayed bridge. The proposed approach detects and classifies data anomalies with high accuracy.

Development of an Anomaly Detection Algorithm for Verification of Radionuclide Analysis Based on Artificial Intelligence in Radioactive Wastes (방사성폐기물 핵종분석 검증용 이상 탐지를 위한 인공지능 기반 알고리즘 개발)

  • Seungsoo Jang;Jang Hee Lee;Young-su Kim;Jiseok Kim;Jeen-hyeng Kwon;Song Hyun Kim
    • Journal of Radiation Industry
    • /
    • v.17 no.1
    • /
    • pp.19-32
    • /
    • 2023
  • The amount of radioactive waste is expected to dramatically increase with decommissioning of nuclear power plants such as Kori-1, the first nuclear power plant in South Korea. Accurate nuclide analysis is necessary to manage the radioactive wastes safely, but research on verification of radionuclide analysis has yet to be well established. This study aimed to develop the technology that can verify the results of radionuclide analysis based on artificial intelligence. In this study, we propose an anomaly detection algorithm for inspecting the analysis error of radionuclide. We used the data from 'Updated Scaling Factors in Low-Level Radwaste' (NP-5077) published by EPRI (Electric Power Research Institute), and resampling was performed using SMOTE (Synthetic Minority Oversampling Technique) algorithm to augment data. 149,676 augmented data with SMOTE algorithm was used to train the artificial neural networks (classification and anomaly detection networks). 324 NP-5077 report data verified the performance of networks. The anomaly detection algorithm of radionuclide analysis was divided into two modules that detect a case where radioactive waste was incorrectly classified or discriminate an abnormal data such as loss of data or incorrectly written data. The classification network was constructed using the fully connected layer, and the anomaly detection network was composed of the encoder and decoder. The latter was operated by loading the latent vector from the end layer of the classification network. This study conducted exploratory data analysis (i.e., statistics, histogram, correlation, covariance, PCA, k-mean clustering, DBSCAN). As a result of analyzing the data, it is complicated to distinguish the type of radioactive waste because data distribution overlapped each other. In spite of these complexities, our algorithm based on deep learning can distinguish abnormal data from normal data. Radionuclide analysis was verified using our anomaly detection algorithm, and meaningful results were obtained.