• 제목/요약/키워드: Anomaly Data

검색결과 799건 처리시간 0.025초

이상탐지 알고리즘 성능 비교: 이상치 유형과 데이터 속성 관점에서 (Performance Comparison of Anomaly Detection Algorithms: in terms of Anomaly Type and Data Properties)

  • 김재웅;정승렬;김남규
    • 지능정보연구
    • /
    • 제29권3호
    • /
    • pp.229-247
    • /
    • 2023
  • 여러 분야에서 이상탐지의 중요성이 강조됨에 따라, 다양한 데이터 유형과 이상치 유형에 대한 이상탐지 알고리즘이 개발되고 있다. 하지만 이상탐지 알고리즘의 성능은 주로 공개 데이터 세트에 대해 측정될 뿐 특정 유형의 이상치에서 나타나는 각 알고리즘의 성능은 확인되지 않고 있으므로, 분석 상황에 맞는 적절한 이상탐지 알고리즘 선택에 어려움이 있다. 이에 본 논문에서는 이상치의 유형과 다양한 데이터 속성을 먼저 파악하여, 이를 기반으로 적절한 이상탐지 알고리즘 선택에 도움을 줄 수 있는 방안을 제시하고자 한다. 구체적으로 본 연구에서는 지역, 전역, 종속성, 그리고 군집화의 총 4가지 이상치 유형에 대해 이상탐지 알고리즘의 성능을 비교하고, 추가 분석을 통해 라벨 수준, 데이터 개수, 그리고 차원 수가 성능에 미치는 영향을 확인한다. 실험 결과 이상치 유형에 따라 가장 우수한 성능을 나타내는 알고리즘이 다르게 나타나며, 이상치 유형에 대한 정보가 없는 경우에도 안정적인 성능을 보여주는 알고리즘을 확인했다. 또한 비지도 학습 기반 이상탐지 알고리즘의 성능이 지도 학습 및 준지도 학습 알고리즘의 성능보다 낮게 나타나는 유형을 확인하였다. 마지막으로 데이터 개수가 상대적으로 적거나 많을 때 대부분 알고리즘들의 성능이 이상치 유형에 더 강하게 영향을 받으며, 상대적으로 고차원일 경우 지역, 전역 이상치에서는 우수한 성능을 보였지만 군집화 이상치 유형에서 낮은 성능을 나타냄을 확인하였다.

텍스트 스트리밍 데이터에서 텍스트 임베딩과 이상 패턴 탐지를 이용한 신규 주제 발생 탐지 (Emerging Topic Detection Using Text Embedding and Anomaly Pattern Detection in Text Streaming Data)

  • 최세목;박정희
    • 한국멀티미디어학회논문지
    • /
    • 제23권9호
    • /
    • pp.1181-1190
    • /
    • 2020
  • Detection of an anomaly pattern deviating normal data distribution in streaming data is an important technique in many application areas. In this paper, a method for detection of an newly emerging pattern in text streaming data which is an ordered sequence of texts is proposed based on text embedding and anomaly pattern detection. Using text embedding methods such as BOW(Bag Of Words), Word2Vec, and BERT, the detection performance of the proposed method is compared. Experimental results show that anomaly pattern detection using BERT embedding gave an average F1 value of 0.85 and the F1 value of 1 in three cases among five test cases.

Dependence of spacecraft anomalies at different orbits on energetic electron and proton fluxes

  • Yi, Kangwoo;Moon, Yong-Jae;Lee, Ensang;Lee, Jae-Ok
    • 천문학회보
    • /
    • 제41권1호
    • /
    • pp.45.2-45.2
    • /
    • 2016
  • In this study we investigate 195 spacecraft anomalies from 1998 to 2010 from Satellite News Digest (SND). We classify these data according to types of anomaly : Control, Power, Telemetry etc. We examine the association between these anomaly data and daily peak particle (electron and proton) flux data from GOES as well as their occurrence rates. To determine the association, we use two criteria that electron criterion is >10,000 pfu and proton criterion is >100 pfu. Main results from this study are as flows. First, the number of days satisfying the criteria for electron flux has a peak near a week before the anomaly day and decreases from the peak day to the anomaly day, while that for proton flux has a peak near the anomaly day. Second, we found a similar pattern for the mean daily peak particle (electron and proton) flux as a function of day before the anomaly day. Third, an examination of multiple spacecraft anomaly events, which are likely to occur by severe space weather effects, shows that anomalies mostly occur either when electron fluxes are in the declining stage, or when daily proton peak fluxes are strongly enhanced. This result is very consistent with the above statistical studies. Our results will be discussed in view of the origins of spacecraft anomaly.

  • PDF

TadGAN 기반 시계열 이상 탐지를 활용한 전처리 프로세스 연구 (A Pre-processing Process Using TadGAN-based Time-series Anomaly Detection)

  • 이승훈;김용수
    • 품질경영학회지
    • /
    • 제50권3호
    • /
    • pp.459-471
    • /
    • 2022
  • Purpose: The purpose of this study was to increase prediction accuracy for an anomaly interval identified using an artificial intelligence-based time series anomaly detection technique by establishing a pre-processing process. Methods: Significant variables were extracted by applying feature selection techniques, and anomalies were derived using the TadGAN time series anomaly detection algorithm. After applying machine learning and deep learning methodologies using normal section data (excluding anomaly sections), the explanatory power of the anomaly sections was demonstrated through performance comparison. Results: The results of the machine learning methodology, the performance was the best when SHAP and TadGAN were applied, and the results in the deep learning, the performance was excellent when Chi-square Test and TadGAN were applied. Comparing each performance with the papers applied with a Conventional methodology using the same data, it can be seen that the performance of the MLR was significantly improved to 15%, Random Forest to 24%, XGBoost to 30%, Lasso Regression to 73%, LSTM to 17% and GRU to 19%. Conclusion: Based on the proposed process, when detecting unsupervised learning anomalies of data that are not actually labeled in various fields such as cyber security, financial sector, behavior pattern field, SNS. It is expected to prove the accuracy and explanation of the anomaly detection section and improve the performance of the model.

이상 전력 탐지를 위한 TCN-USAD (TCN-USAD for Anomaly Power Detection)

  • 진현석;김경백
    • 스마트미디어저널
    • /
    • 제13권7호
    • /
    • pp.9-17
    • /
    • 2024
  • 에너지 사용량의 증가와 친환경 정책으로 인해 건물 에너지를 효율적으로 소비할 필요가 있으며, 이를 위해 딥러닝 기반 이상 전력 탐지가 수행되고 있다. 수집이 어려운 이상치 데이터의 특징으로 인해 Recurrent Neural Network(RNN) 기반 오토인코더를 활용한 복원 에러 기반으로 이상 탐지가 수행되고 있으나, 시계열 특징을 온전히 학습하는데 시간이 오래 걸리고 학습 데이터의 노이즈에 민감하다는 단점이 있다. 본 논문에서는 이러한 한계를 극복하기 위해 Temporal Convolutional Network(TCN)과 UnSupervised Anomaly Detection for multivariate time series(USAD)를 결합한 TCN-USAD를 제안한다. 제안된 모델은 TCN 기반 오토인코더와 두 개의 디코더와 적대적 학습을 사용하는 USAD 구조를 활용하여 빠르게 시계열 특징을 온전히 학습할 수 있고 강건한 이상 탐지가 가능하다. TCN-USAD의 성능을 입증하기 위해 2개의 건물 전력 사용량 데이터 세트를 사용하여 비교 실험을 수행한 결과, TCN 기반 오토인코더는 RNN 기반 오토 인코더 대비 빠르고 복원 성능이 우수하였으며, 이를 활용한 TCN-USAD는 다른 이상 탐지 모델 대비 약 20% 개선된 F1-Score를 달성하여 뛰어난 이상 탐지 성능을 보였다.

LIME을 활용한 준지도 학습 기반 이상 탐지 모델: 반도체 공정을 중심으로 (Anomaly Detection Model Based on Semi-Supervised Learning Using LIME: Focusing on Semiconductor Process)

  • 안강민;신주은;백동현
    • 산업경영시스템학회지
    • /
    • 제45권4호
    • /
    • pp.86-98
    • /
    • 2022
  • Recently, many studies have been conducted to improve quality by applying machine learning models to semiconductor manufacturing process data. However, in the semiconductor manufacturing process, the ratio of good products is much higher than that of defective products, so the problem of data imbalance is serious in terms of machine learning. In addition, since the number of features of data used in machine learning is very large, it is very important to perform machine learning by extracting only important features from among them to increase accuracy and utilization. This study proposes an anomaly detection methodology that can learn excellently despite data imbalance and high-dimensional characteristics of semiconductor process data. The anomaly detection methodology applies the LIME algorithm after applying the SMOTE method and the RFECV method. The proposed methodology analyzes the classification result of the anomaly classification model, detects the cause of the anomaly, and derives a semiconductor process requiring action. The proposed methodology confirmed applicability and feasibility through application of cases.

건축공간 환경관리 지원을 위한 AI·IoT 기반 이상패턴 검출에 관한 연구 (A Study on Detection of Abnormal Patterns Based on AI·IoT to Support Environmental Management of Architectural Spaces)

  • 강태욱
    • 한국BIM학회 논문집
    • /
    • 제13권3호
    • /
    • pp.12-20
    • /
    • 2023
  • Deep learning-based anomaly detection technology is used in various fields such as computer vision, speech recognition, and natural language processing. In particular, this technology is applied in various fields such as monitoring manufacturing equipment abnormalities, detecting financial fraud, detecting network hacking, and detecting anomalies in medical images. However, in the field of construction and architecture, research on deep learning-based data anomaly detection technology is difficult due to the lack of digitization of domain knowledge due to late digital conversion, lack of learning data, and difficulties in collecting and processing field data in real time. This study acquires necessary data through IoT (Internet of Things) from the viewpoint of monitoring for environmental management of architectural spaces, converts them into a database, learns deep learning, and then supports anomaly patterns using AI (Artificial Infelligence) deep learning-based anomaly detection. We propose an implementation process. The results of this study suggest an effective environmental anomaly pattern detection solution architecture for environmental management of architectural spaces, proving its feasibility. The proposed method enables quick response through real-time data processing and analysis collected from IoT. In order to confirm the effectiveness of the proposed method, performance analysis is performed through prototype implementation to derive the results.

Semi-Supervised Learning Based Anomaly Detection for License Plate OCR in Real Time Video

  • Kim, Bada;Heo, Junyoung
    • International journal of advanced smart convergence
    • /
    • 제9권1호
    • /
    • pp.113-120
    • /
    • 2020
  • Recently, the license plate OCR system has been commercialized in a variety of fields and preferred utilizing low-cost embedded systems using only cameras. This system has a high recognition rate of about 98% or more for the environments such as parking lots where non-vehicle is restricted; however, the environments where non-vehicle objects are not restricted, the recognition rate is about 50% to 70%. This low performance is due to the changes in the environment by non-vehicle objects in real-time situations that occur anomaly data which is similar to the license plates. In this paper, we implement the appropriate anomaly detection based on semi-supervised learning for the license plate OCR system in the real-time environment where the appearance of non-vehicle objects is not restricted. In the experiment, we compare systems which anomaly detection is not implemented in the preceding research with the proposed system in this paper. As a result, the systems which anomaly detection is not implemented had a recognition rate of 77%; however, the systems with the semi-supervised learning based on anomaly detection had 88% of recognition rate. Using the techniques of anomaly detection based on the semi-supervised learning was effective in detecting anomaly data and it was helpful to improve the recognition rate of real-time situations.

Detection of multi-type data anomaly for structural health monitoring using pattern recognition neural network

  • Gao, Ke;Chen, Zhi-Dan;Weng, Shun;Zhu, Hong-Ping;Wu, Li-Ying
    • Smart Structures and Systems
    • /
    • 제29권1호
    • /
    • pp.129-140
    • /
    • 2022
  • The effectiveness of system identification, damage detection, condition assessment and other structural analyses relies heavily on the accuracy and reliability of the measured data in structural health monitoring (SHM) systems. However, data anomalies often occur in SHM systems, leading to inaccurate and untrustworthy analysis results. Therefore, anomalies in the raw data should be detected and cleansed before further analysis. Previous studies on data anomaly detection mainly focused on just single type of data anomaly for denoising or removing outliers, meanwhile, the existing methods of detecting multiple data anomalies are usually time consuming. For these reasons, recognising multiple anomaly patterns for real-time alarm and analysis in field monitoring remains a challenge. Aiming to achieve an efficient and accurate detection for multi-type data anomalies for field SHM, this study proposes a pattern-recognition-based data anomaly detection method that mainly consists of three steps: the feature extraction from the long time-series data samples, the training of a pattern recognition neural network (PRNN) using the features and finally the detection of data anomalies. The feature extraction step remarkably reduces the time cost of the network training, making the detection process very fast. The performance of the proposed method is verified on the basis of the SHM data of two practical long-span bridges. Results indicate that the proposed method recognises multiple data anomalies with very high accuracy and low calculation cost, demonstrating its applicability in field monitoring.

Subset 샘플링 검증 기법을 활용한 MSCRED 모델 기반 발전소 진동 데이터의 이상 진단 (Anomaly Detection In Real Power Plant Vibration Data by MSCRED Base Model Improved By Subset Sampling Validation)

  • 홍수웅;권장우
    • 융합정보논문지
    • /
    • 제12권1호
    • /
    • pp.31-38
    • /
    • 2022
  • 본 논문은 전문가 독립적 비지도 신경망 학습 기반 다변량 시계열 데이터 분석 모델인 MSCRED(Multi-Scale Convolutional Recurrent Encoder-Decoder)의 실제 현장에서의 적용과 Auto-encoder 기반인 MSCRED 모델의 한계인, 학습 데이터가 오염되지 않아야 된다는 점을 극복하기 위한 학습 데이터 샘플링 기법인 Subset Sampling Validation을 제시한다. 라벨 분류가 되어있는 발전소 장비의 진동 데이터를 이용하여 1) 학습 데이터에 비정상 데이터가 섞여 있는 상황을 재현하고, 이를 학습한 경우 2) 1과 같은 상황에서 Subset Sampling Validation 기법을 통해 학습 데이터에서 비정상 데이터를 제거한 경우의 Anomaly Score를 비교하여 MSCRED와 Subset Sampling Validation 기법을 유효성을 평가한다. 이를 통해 본 논문은 전문가 독립적이며 오류 데이터에 강한 이상 진단 프레임워크를 제시해, 다양한 다변량 시계열 데이터 분야에서의 간결하고 정확한 해결 방법을 제시한다.