• Title/Summary/Keyword: Data anomaly detection

Structural health monitoring data anomaly detection by transformer enhanced densely connected neural networks

  • Jun, Li;Wupeng, Chen;Gao, Fan
    • Smart Structures and Systems / v.30 no.6 / pp.613-626 / 2022
  • Guaranteeing the quality and integrity of structural health monitoring (SHM) data is essential for an effective assessment of structural condition. However, the sensory system may malfunction because of sensor faults or a harsh operational environment, which introduces multiple types of anomalies into the measured data. Efficiently and automatically identifying anomalies in the vast amounts of measured data is important for assessing structural condition and for early warning of structural failure in SHM. A major challenge for current automated data anomaly detection methods is the imbalance among dataset categories. Based on the features of actual anomalous data, this paper proposes a data anomaly detection method that combines a data-level technique with deep learning for SHM of civil engineering structures. The proposed method consists of a data balancing phase, which prepares a comprehensive training dataset using the data-level technique, and an anomaly detection phase based on a purpose-designed network. A densely connected convolutional network (DenseNet) and a Transformer encoder are embedded in the network to facilitate extraction of both detailed and global features of the response data, and to establish the mapping between the highest-level abstract features and the data anomaly class. Numerical studies on a steel frame model are conducted to evaluate the performance and noise immunity of the proposed network for data anomaly detection. The applicability of the proposed method to data anomaly classification is validated with measured data from a real supertall structure. The proposed method reaches an overall accuracy of 95.7% on practical engineering structural monitoring data, which demonstrates the effectiveness of data balancing and the robust classification capability of the proposed network.
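
The abstract does not say which data-level technique is used for balancing. As a minimal sketch of the general idea, the following assumes plain random oversampling, in which minority anomaly classes are resampled with replacement until every class matches the largest one. The function name `oversample` and the class names are hypothetical.

```python
import random
from collections import Counter, defaultdict

def oversample(samples, labels, rng):
    """Randomly duplicate minority-class samples until all classes are equal in size.

    Assumption: random oversampling stands in for the unspecified
    data-level technique mentioned in the abstract.
    """
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)
    target = max(len(items) for items in by_class.values())

    out_samples, out_labels = [], []
    for label, items in by_class.items():
        # Draw extra samples with replacement to reach the target size.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for item in items + extra:
            out_samples.append(item)
            out_labels.append(label)
    return out_samples, out_labels

# Hypothetical imbalanced dataset: many normal segments, few anomalous ones.
samples = ["n1", "n2", "n3", "n4", "n5", "n6", "drift1", "drift2", "missing1"]
labels = ["normal"] * 6 + ["drift"] * 2 + ["missing"]

balanced_samples, balanced_labels = oversample(samples, labels, random.Random(0))
print(Counter(balanced_labels))
```

Oversampling only repeats existing segments, so it balances class counts without adding new information; other data-level techniques would make different trade-offs.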

Synthetic Data Generation and Performance Analysis for Anomaly Detection (이상 탐지를 위한 합성 데이터 생성 및 성능 분석)

  • Hwang, Ju-hyo;Jin, Kyo-hong
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2022.10a / pp.19-21 / 2022
  • Anomaly detection using self-supervised learning typically generates synthetic data to learn to classify normal and abnormal samples, and uses real abnormal data as test data to measure anomaly detection performance. One study using this approach generated synthetic data similar to normal data by cutting a specific patch from the original image and pasting it back, and then performed anomaly detection. With this approach, the degree of similarity to normal data depends on the number and size of the patches, which affects anomaly detection performance. In this paper, synthetic data were generated with varying patch sizes and numbers, their similarity to normal data was analyzed using a pre-trained model, and anomaly detection performance was measured after training the model.
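
The cut-and-paste generation described above can be sketched roughly as follows. This is an illustration only: the image is a plain 2D list, the names `cutpaste` and `changed_fraction` are hypothetical, and the pixel-difference ratio is a crude stand-in for the pre-trained-model similarity analysis used in the paper.

```python
import random

def cutpaste(image, patch_h, patch_w, num_patches, rng):
    """Copy `num_patches` rectangular patches of `image` to random locations.

    `image` is a 2D list of pixel values; the input is left unmodified.
    """
    height, width = len(image), len(image[0])
    out = [list(row) for row in image]
    for _ in range(num_patches):
        # Choose where to cut the patch from the original image.
        src_r = rng.randrange(height - patch_h + 1)
        src_c = rng.randrange(width - patch_w + 1)
        # Choose where to paste it in the augmented copy.
        dst_r = rng.randrange(height - patch_h + 1)
        dst_c = rng.randrange(width - patch_w + 1)
        for i in range(patch_h):
            out[dst_r + i][dst_c:dst_c + patch_w] = \
                image[src_r + i][src_c:src_c + patch_w]
    return out

def changed_fraction(original, augmented):
    """Fraction of pixels that differ between the two images."""
    total = sum(len(row) for row in original)
    changed = sum(
        a != b
        for row_o, row_a in zip(original, augmented)
        for a, b in zip(row_o, row_a)
    )
    return changed / total

rng = random.Random(0)
image = [[r * 8 + c for c in range(8)] for r in range(8)]
small = cutpaste(image, 2, 2, 1, rng)   # one small patch
large = cutpaste(image, 4, 4, 3, rng)   # several larger patches
print(changed_fraction(image, small), changed_fraction(image, large))
```

The changed fraction is bounded by the total pasted area, which is why patch size and patch count control how far the synthetic image can drift from the normal one.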

Emerging Topic Detection Using Text Embedding and Anomaly Pattern Detection in Text Streaming Data (텍스트 스트리밍 데이터에서 텍스트 임베딩과 이상 패턴 탐지를 이용한 신규 주제 발생 탐지)

  • Choi, Semok;Park, Cheong Hee
    • Journal of Korea Multimedia Society / v.23 no.9 / pp.1181-1190 / 2020
  • Detecting an anomaly pattern that deviates from the normal data distribution in streaming data is an important technique in many application areas. In this paper, a method for detecting a newly emerging pattern in text streaming data, which is an ordered sequence of texts, is proposed based on text embedding and anomaly pattern detection. The detection performance of the proposed method is compared across text embedding methods such as BOW (Bag of Words), Word2Vec, and BERT. Experimental results show that anomaly pattern detection using BERT embedding gave an average F1 value of 0.85, with an F1 value of 1 in three of the five test cases.
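
To illustrate how an embedding can feed an anomaly check on a text stream, the sketch below uses bag-of-words counts and flags a text when its cosine distance from a reference centroid exceeds a threshold. The distance rule, the threshold value, and the example texts are all assumptions; the abstract does not describe the paper's actual anomaly pattern detector.

```python
import math
from collections import Counter

def bow(text):
    """Bag-of-words embedding: token -> count."""
    return Counter(text.lower().split())

def cosine_distance(a, b):
    """1 - cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)

def centroid(vectors):
    """Summed counts; only the direction matters for cosine distance."""
    total = Counter()
    for vector in vectors:
        total.update(vector)
    return total

# Reference window representing the "normal" topic distribution.
reference = [
    "stock market prices rise",
    "market prices fall on stock news",
    "stock traders watch prices",
]
ref_centroid = centroid([bow(text) for text in reference])

stream = ["stock prices rise again", "new virus outbreak reported in city"]
THRESHOLD = 0.7  # hypothetical cut-off, not taken from the paper
distances = [cosine_distance(bow(text), ref_centroid) for text in stream]
flags = [d > THRESHOLD for d in distances]
print(flags)
```

Swapping `bow` for a Word2Vec or BERT encoder would change only the embedding step, which is the comparison the abstract reports.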

Design of Anomaly Detection System Based on Big Data in Internet of Things (빅데이터 기반의 IoT 이상 장애 탐지 시스템 설계)

  • Na, Sung Il;Kim, Hyoung Joong
    • Journal of Digital Contents Society / v.19 no.2 / pp.377-383 / 2018
  • The Internet of Things (IoT) is producing a wide variety of data as smart environments emerge. Collected IoT data is used as an important basis for judging a system's status. It is therefore important to monitor the anomaly state of sensors in real time and to detect anomalous data. However, because data structures and protocols vary, the IoT data must be converted into a normalized data structure before anomaly detection. Doing so can be expected to improve the quality of the analysis data and of the service. In this paper, we propose a big-data-based anomaly detection system that uses collected sensor data. The proposed system is applied to ensure anomaly detection and maintain data quality. In addition, we applied a support vector machine (SVM) model to anomaly detection on time-series data. As a result, machine learning using the preprocessed data was able to accurately detect and predict anomalies.

CutPaste-Based Anomaly Detection Model using Multi Scale Feature Extraction in Time Series Streaming Data

  • Jeon, Byeong-Uk;Chung, Kyungyong
    • KSII Transactions on Internet and Information Systems (TIIS) / v.16 no.8 / pp.2787-2800 / 2022
  • An aging society brings an increase in emergency situations among elderly people living alone, as well as in a variety of social crimes. To prevent them, techniques for detecting emergency situations through voice are being actively researched. This study proposes a CutPaste-based anomaly detection model using multi-scale feature extraction in time series streaming data. In the proposed method, an audio file is converted into a spectrogram, which makes it possible to use an algorithm designed for image data, such as a CNN. Multi-scale feature extraction is then applied: three images drawn from an Adaptive Pooling layer with different-sized kernels are merged. By considering various types of anomaly, including point, contextual, and collective anomalies, the method addresses the limitations of a conventional anomaly model. Finally, CutPaste-based anomaly detection is conducted. Since the model is trained through self-supervised learning, it can detect diverse emergency situations as anomalies without labeling. The proposed model therefore overcomes the limitation of a conventional model that classifies only labeled emergency situations. The proposed model is also evaluated as performing better than a conventional anomaly detection model.
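
As a rough illustration of the multi-scale step, the sketch below applies adaptive average pooling to a toy spectrogram at three output sizes. The floor/ceil bin rule is one common convention and is an assumption here, as are the chosen sizes; the merge step is omitted because the abstract does not say how the three images are combined.

```python
import math

def adaptive_avg_pool2d(matrix, out_h, out_w):
    """Average-pool a 2D list to a fixed (out_h, out_w) size.

    Each output cell averages the input cells in
    [floor(i * n / out), ceil((i + 1) * n / out)) along each axis.
    """
    in_h, in_w = len(matrix), len(matrix[0])
    pooled = []
    for i in range(out_h):
        r0 = (i * in_h) // out_h
        r1 = math.ceil((i + 1) * in_h / out_h)
        row = []
        for j in range(out_w):
            c0 = (j * in_w) // out_w
            c1 = math.ceil((j + 1) * in_w / out_w)
            cells = [matrix[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(cells) / len(cells))
        pooled.append(row)
    return pooled

# Toy 4x4 "spectrogram" (frequency bins x time frames).
spectrogram = [[float(r * 4 + c) for c in range(4)] for r in range(4)]

# Three scales: coarse, medium, and full resolution (sizes are illustrative).
scales = {size: adaptive_avg_pool2d(spectrogram, size, size) for size in (1, 2, 4)}
for size, pooled in scales.items():
    print(size, pooled)
```

Coarser outputs summarize broad energy patterns while finer outputs keep local detail, which is the motivation for extracting features at several scales.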

Normal data based rotating machine anomaly detection using CNN with self-labeling

  • Bae, Jaewoong;Jung, Wonho;Park, Yong-Hwa
    • Smart Structures and Systems / v.29 no.6 / pp.757-766 / 2022
  • Training deep learning algorithms requires a sufficient amount of data. However, in most engineering systems, fault data are difficult or sometimes infeasible to acquire, while normal data are readily available. The dearth of data is one of the major challenges in developing deep learning models, and fault diagnosis in particular cannot be performed in the absence of fault data. In this context, this paper proposes an anomaly detection methodology for rotating machines that uses only normal data with self-labeling. Since only normal data are used for anomaly detection, a self-labeling method is used to generate a new labeled dataset. The overall procedure includes the following three steps: (1) transformation of normal data into self-labeled data based on a pretext task, (2) training of a convolutional neural network (CNN), and (3) anomaly detection using an anomaly score defined on the softmax output of the trained CNN. The softmax values of an abnormal sample behave differently from those of normal samples. To verify the proposed method, four case studies were conducted, on the Case Western Reserve University (CWRU) bearing dataset, the IEEE PHM 2012 data challenge dataset, the PHMAP 2021 data challenge dataset, and a laboratory bearing testbed, and the results were compared with those of existing machine learning and deep learning methods. The results showed that the proposed algorithm could detect faults in the bearing testbed and compressor with over 99.7% accuracy. In particular, it was possible to detect not only bearing faults but also structural faults such as unbalance and belt looseness with very high accuracy. Compared with existing GAN-based and autoencoder-based anomaly detection algorithms, the proposed method showed high anomaly detection performance.
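
Step (3) above can be illustrated with a small sketch. The abstract does not give the exact score definition, so this assumes one plausible choice: one minus the softmax probability that the trained CNN assigns to the known pretext label. The logits are made-up values standing in for CNN outputs.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def anomaly_score(logits, pretext_label):
    """Assumed score: 1 - softmax probability of the self-assigned label.

    A sample the network classifies confidently and correctly on the
    pretext task scores near 0; a sample it is unsure about scores higher.
    """
    return 1.0 - softmax(logits)[pretext_label]

# Hypothetical CNN outputs for a three-class pretext task.
normal_logits = [6.0, 0.5, 0.2]    # confident in the true pretext label 0
abnormal_logits = [1.0, 0.9, 1.1]  # nearly uniform: the pretext task "breaks"

normal_score = anomaly_score(normal_logits, 0)
abnormal_score = anomaly_score(abnormal_logits, 0)
print(round(normal_score, 4), round(abnormal_score, 4))
```

A threshold on this score, chosen from normal data only, would then separate normal from abnormal samples.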

TCN-USAD for Anomaly Power Detection (이상 전력 탐지를 위한 TCN-USAD)

  • Hyeonseok Jin;Kyungbaek Kim
    • Smart Media Journal / v.13 no.7 / pp.9-17 / 2024
  • Rising energy consumption and eco-friendly policies have created a need for efficient energy consumption in buildings. Deep learning based anomaly power detection is being used for this purpose. Because anomaly data are difficult to collect, anomaly detection is performed using the reconstruction error of a Recurrent Neural Network (RNN) based autoencoder. However, this approach has limitations, such as the long time required to fully learn temporal features and its sensitivity to noise in the training data. To overcome these limitations, this paper proposes TCN-USAD, which combines a Temporal Convolution Network (TCN) with UnSupervised Anomaly Detection for multivariate data (USAD). The proposed model uses a TCN-based autoencoder and the USAD structure, which employs two decoders and adversarial training, to learn temporal features quickly and enable robust anomaly detection. To validate the performance of TCN-USAD, comparative experiments were performed using two building energy datasets. The results showed that the TCN-based autoencoder can perform faster and better reconstruction than the RNN-based autoencoder. Furthermore, TCN-USAD achieved a 20% improvement in F1-Score over other anomaly detection models, demonstrating excellent anomaly detection performance.
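
Reconstruction-error detection, which the abstract takes as its starting point, can be sketched as follows. The `flat_reconstruct` function is a deliberately trivial placeholder for a trained autoencoder (it is not a TCN, RNN, or USAD model), and the mean-plus-k-standard-deviations threshold and the sample readings are assumptions for illustration.

```python
import statistics

def mse(window, reconstruction):
    """Mean squared reconstruction error for one window."""
    return sum((x - y) ** 2 for x, y in zip(window, reconstruction)) / len(window)

def flat_reconstruct(window):
    """Placeholder for a trained autoencoder.

    It can only reproduce flat consumption (the window mean), so windows
    with spikes reconstruct poorly, mimicking a model trained on normal data.
    """
    mean = statistics.fmean(window)
    return [mean] * len(window)

def fit_threshold(normal_windows, reconstruct, k=3.0):
    """Assumed rule: mean + k * std of errors on normal training windows."""
    errors = [mse(w, reconstruct(w)) for w in normal_windows]
    return statistics.fmean(errors) + k * statistics.pstdev(errors)

# Hypothetical power readings (kW) for normal operation.
normal_windows = [
    [10.0, 10.2, 9.9, 10.1],
    [10.1, 9.8, 10.0, 10.2],
    [9.9, 10.0, 10.1, 10.0],
]
threshold = fit_threshold(normal_windows, flat_reconstruct)

test_windows = [[10.0, 10.1, 9.9, 10.0], [10.0, 10.1, 25.0, 10.0]]
flags = [mse(w, flat_reconstruct(w)) > threshold for w in test_windows]
print(flags)
```

Only normal data are needed to set the threshold, which is why this family of methods suits cases where anomaly data are hard to collect.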

Anomaly Detection of Big Time Series Data Using Machine Learning (머신러닝 기법을 활용한 대용량 시계열 데이터 이상 시점탐지 방법론 : 발전기 부품신호 사례 중심)

  • Kwon, Sehyug
    • Journal of Korean Society of Industrial and Systems Engineering / v.43 no.2 / pp.33-38 / 2020
  • Machine learning (ML) anomaly detection techniques such as PCA anomaly detection and CNN image classification have focused on cross-sectional data. In this paper, two approaches are suggested for applying ML techniques to identify the failure time in big time series data. PCA anomaly detection is suggested for identifying time rows as normal or abnormal by converting the subject identification problem to the time domain. CNN image classification is suggested for identifying the failure time by restructuring the time series data: the correlation matrix of each one-minute window of data is computed and converted to TIFF image format. In addition, LASSO, a feature selection method, was applied to select the most influential variables for identifying the failure status. For the empirical study, time series data were collected on a per-second basis from 214 components of a power generator for 25 minutes, including the 20 minutes before the failure time. PCA anomaly detection predicted and detected the failure 9 minutes 17 seconds before the failure time, but the combination of LASSO and PCA did not detect it because the target variable was a binary variable assigned on the basis of the failure time. CNN image classification, trained on 10 normal-status images and 5 failure-status images, detected the failure just one minute before it occurred.
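
The restructuring of a time window into an image can be sketched as below: compute the Pearson correlation matrix across sensor signals in one window and map correlations from [-1, 1] to grayscale values in 0 to 255. The scaling is an assumption, the function names are hypothetical, and writing the result to a TIFF file (which would need an imaging library) is left out.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if std_x == 0 or std_y == 0:
        return 0.0
    return cov / (std_x * std_y)

def correlation_image(signals):
    """Map the correlation matrix of one window to 0..255 grayscale pixels.

    `signals` holds one series per component; -1 maps to 0 and +1 to 255.
    """
    n = len(signals)
    image = []
    for i in range(n):
        row = []
        for j in range(n):
            pixel = round((pearson(signals[i], signals[j]) + 1.0) * 127.5)
            row.append(min(255, max(0, pixel)))  # guard against rounding drift
        image.append(row)
    return image

# Toy window with three components: two move together, one moves opposite.
window = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [4.0, 3.0, 2.0, 1.0],
]
image = correlation_image(window)
for row in image:
    print(row)
```

With 214 components, each one-minute window would yield a 214 x 214 image of this kind for the CNN classifier.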

Semi-Supervised Learning Based Anomaly Detection for License Plate OCR in Real Time Video

  • Kim, Bada;Heo, Junyoung
    • International journal of advanced smart convergence / v.9 no.1 / pp.113-120 / 2020
  • Recently, license plate OCR systems have been commercialized in a variety of fields, with a preference for low-cost embedded systems that use only cameras. Such a system has a high recognition rate of about 98% or more in environments such as parking lots, where non-vehicle objects are restricted; however, in environments where non-vehicle objects are not restricted, the recognition rate is about 50% to 70%. This low performance arises because non-vehicle objects change the environment in real-time situations and produce anomalous data that resemble license plates. In this paper, we implement anomaly detection based on semi-supervised learning for the license plate OCR system in a real-time environment where the appearance of non-vehicle objects is not restricted. In the experiment, we compare the systems from preceding research, in which anomaly detection is not implemented, with the system proposed in this paper. As a result, the systems without anomaly detection had a recognition rate of 77%, whereas the system with anomaly detection based on semi-supervised learning had a recognition rate of 88%. Anomaly detection based on semi-supervised learning was effective in detecting anomalous data and helped improve the recognition rate in real-time situations.

Performance Comparison of Anomaly Detection Algorithms: in terms of Anomaly Type and Data Properties (이상탐지 알고리즘 성능 비교: 이상치 유형과 데이터 속성 관점에서)

  • Jaeung Kim;Seung Ryul Jeong;Namgyu Kim
    • Journal of Intelligence and Information Systems / v.29 no.3 / pp.229-247 / 2023
  • With the increasing emphasis on anomaly detection across various fields, diverse anomaly detection algorithms have been developed for various data types and anomaly patterns. However, the performance of anomaly detection algorithms is generally evaluated on publicly available datasets, and the specific performance of each algorithm on anomalies of particular types remains unexplored. Consequently, selecting an appropriate anomaly detection algorithm for specific analytical contexts poses challenges. Therefore, in this paper, we aim to investigate the types of anomalies and various attributes of data. Subsequently, we intend to propose approaches that can assist in the selection of appropriate anomaly detection algorithms based on this understanding. Specifically, this study compares the performance of anomaly detection algorithms for four types of anomalies: local, global, contextual, and clustered anomalies. Through further analysis, the impact of label availability, data quantity, and dimensionality on algorithm performance is examined. Experimental results demonstrate that the most effective algorithm varies depending on the type of anomaly, and certain algorithms exhibit stable performance even in the absence of anomaly-specific information. Furthermore, in some types of anomalies, the performance of unsupervised anomaly detection algorithms was observed to be lower than that of supervised and semi-supervised learning algorithms. Lastly, we found that the performance of most algorithms is more strongly influenced by the type of anomalies when the data quantity is relatively scarce or abundant. Additionally, in cases of higher dimensionality, it was noted that excellent performance was exhibited in detecting local and global anomalies, while lower performance was observed for clustered anomaly types.