• Title/Summary/Keyword: Anomaly Data

Search Result 783, Processing Time 0.033 seconds

Performance Comparison of Anomaly Detection Algorithms: in terms of Anomaly Type and Data Properties (이상탐지 알고리즘 성능 비교: 이상치 유형과 데이터 속성 관점에서)

  • Jaeung Kim;Seung Ryul Jeong;Namgyu Kim
    • Journal of Intelligence and Information Systems
    • /
    • v.29 no.3
    • /
    • pp.229-247
    • /
    • 2023
  • With the increasing emphasis on anomaly detection across various fields, diverse anomaly detection algorithms have been developed for various data types and anomaly patterns. However, the performance of anomaly detection algorithms is generally evaluated on publicly available datasets, and the specific performance of each algorithm on anomalies of particular types remains unexplored. Consequently, selecting an appropriate anomaly detection algorithm for specific analytical contexts poses challenges. Therefore, in this paper, we aim to investigate the types of anomalies and various attributes of data. Subsequently, we intend to propose approaches that can assist in the selection of appropriate anomaly detection algorithms based on this understanding. Specifically, this study compares the performance of anomaly detection algorithms for four types of anomalies: local, global, contextual, and clustered anomalies. Through further analysis, the impact of label availability, data quantity, and dimensionality on algorithm performance is examined. Experimental results demonstrate that the most effective algorithm varies depending on the type of anomaly, and certain algorithms exhibit stable performance even in the absence of anomaly-specific information. Furthermore, in some types of anomalies, the performance of unsupervised anomaly detection algorithms was observed to be lower than that of supervised and semi-supervised learning algorithms. Lastly, we found that the performance of most algorithms is more strongly influenced by the type of anomalies when the data quantity is relatively scarce or abundant. Additionally, in cases of higher dimensionality, it was noted that excellent performance was exhibited in detecting local and global anomalies, while lower performance was observed for clustered anomaly types.

Emerging Topic Detection Using Text Embedding and Anomaly Pattern Detection in Text Streaming Data (텍스트 스트리밍 데이터에서 텍스트 임베딩과 이상 패턴 탐지를 이용한 신규 주제 발생 탐지)

  • Choi, Semok;Park, Cheong Hee
    • Journal of Korea Multimedia Society
    • /
    • v.23 no.9
    • /
    • pp.1181-1190
    • /
    • 2020
  • Detection of an anomaly pattern deviating normal data distribution in streaming data is an important technique in many application areas. In this paper, a method for detection of an newly emerging pattern in text streaming data which is an ordered sequence of texts is proposed based on text embedding and anomaly pattern detection. Using text embedding methods such as BOW(Bag Of Words), Word2Vec, and BERT, the detection performance of the proposed method is compared. Experimental results show that anomaly pattern detection using BERT embedding gave an average F1 value of 0.85 and the F1 value of 1 in three cases among five test cases.

Dependence of spacecraft anomalies at different orbits on energetic electron and proton fluxes

  • Yi, Kangwoo;Moon, Yong-Jae;Lee, Ensang;Lee, Jae-Ok
    • The Bulletin of The Korean Astronomical Society
    • /
    • v.41 no.1
    • /
    • pp.45.2-45.2
    • /
    • 2016
  • In this study we investigate 195 spacecraft anomalies from 1998 to 2010 from Satellite News Digest (SND). We classify these data according to types of anomaly : Control, Power, Telemetry etc. We examine the association between these anomaly data and daily peak particle (electron and proton) flux data from GOES as well as their occurrence rates. To determine the association, we use two criteria that electron criterion is >10,000 pfu and proton criterion is >100 pfu. Main results from this study are as flows. First, the number of days satisfying the criteria for electron flux has a peak near a week before the anomaly day and decreases from the peak day to the anomaly day, while that for proton flux has a peak near the anomaly day. Second, we found a similar pattern for the mean daily peak particle (electron and proton) flux as a function of day before the anomaly day. Third, an examination of multiple spacecraft anomaly events, which are likely to occur by severe space weather effects, shows that anomalies mostly occur either when electron fluxes are in the declining stage, or when daily proton peak fluxes are strongly enhanced. This result is very consistent with the above statistical studies. Our results will be discussed in view of the origins of spacecraft anomaly.

  • PDF

A Pre-processing Process Using TadGAN-based Time-series Anomaly Detection (TadGAN 기반 시계열 이상 탐지를 활용한 전처리 프로세스 연구)

  • Lee, Seung Hoon;Kim, Yong Soo
    • Journal of Korean Society for Quality Management
    • /
    • v.50 no.3
    • /
    • pp.459-471
    • /
    • 2022
  • Purpose: The purpose of this study was to increase prediction accuracy for an anomaly interval identified using an artificial intelligence-based time series anomaly detection technique by establishing a pre-processing process. Methods: Significant variables were extracted by applying feature selection techniques, and anomalies were derived using the TadGAN time series anomaly detection algorithm. After applying machine learning and deep learning methodologies using normal section data (excluding anomaly sections), the explanatory power of the anomaly sections was demonstrated through performance comparison. Results: The results of the machine learning methodology, the performance was the best when SHAP and TadGAN were applied, and the results in the deep learning, the performance was excellent when Chi-square Test and TadGAN were applied. Comparing each performance with the papers applied with a Conventional methodology using the same data, it can be seen that the performance of the MLR was significantly improved to 15%, Random Forest to 24%, XGBoost to 30%, Lasso Regression to 73%, LSTM to 17% and GRU to 19%. Conclusion: Based on the proposed process, when detecting unsupervised learning anomalies of data that are not actually labeled in various fields such as cyber security, financial sector, behavior pattern field, SNS. It is expected to prove the accuracy and explanation of the anomaly detection section and improve the performance of the model.

Anomaly Detection Model Based on Semi-Supervised Learning Using LIME: Focusing on Semiconductor Process (LIME을 활용한 준지도 학습 기반 이상 탐지 모델: 반도체 공정을 중심으로)

  • Kang-Min An;Ju-Eun Shin;Dong Hyun Baek
    • Journal of Korean Society of Industrial and Systems Engineering
    • /
    • v.45 no.4
    • /
    • pp.86-98
    • /
    • 2022
  • Recently, many studies have been conducted to improve quality by applying machine learning models to semiconductor manufacturing process data. However, in the semiconductor manufacturing process, the ratio of good products is much higher than that of defective products, so the problem of data imbalance is serious in terms of machine learning. In addition, since the number of features of data used in machine learning is very large, it is very important to perform machine learning by extracting only important features from among them to increase accuracy and utilization. This study proposes an anomaly detection methodology that can learn excellently despite data imbalance and high-dimensional characteristics of semiconductor process data. The anomaly detection methodology applies the LIME algorithm after applying the SMOTE method and the RFECV method. The proposed methodology analyzes the classification result of the anomaly classification model, detects the cause of the anomaly, and derives a semiconductor process requiring action. The proposed methodology confirmed applicability and feasibility through application of cases.

A Study on Detection of Abnormal Patterns Based on AI·IoT to Support Environmental Management of Architectural Spaces (건축공간 환경관리 지원을 위한 AI·IoT 기반 이상패턴 검출에 관한 연구)

  • Kang, Tae-Wook
    • Journal of KIBIM
    • /
    • v.13 no.3
    • /
    • pp.12-20
    • /
    • 2023
  • Deep learning-based anomaly detection technology is used in various fields such as computer vision, speech recognition, and natural language processing. In particular, this technology is applied in various fields such as monitoring manufacturing equipment abnormalities, detecting financial fraud, detecting network hacking, and detecting anomalies in medical images. However, in the field of construction and architecture, research on deep learning-based data anomaly detection technology is difficult due to the lack of digitization of domain knowledge due to late digital conversion, lack of learning data, and difficulties in collecting and processing field data in real time. This study acquires necessary data through IoT (Internet of Things) from the viewpoint of monitoring for environmental management of architectural spaces, converts them into a database, learns deep learning, and then supports anomaly patterns using AI (Artificial Infelligence) deep learning-based anomaly detection. We propose an implementation process. The results of this study suggest an effective environmental anomaly pattern detection solution architecture for environmental management of architectural spaces, proving its feasibility. The proposed method enables quick response through real-time data processing and analysis collected from IoT. In order to confirm the effectiveness of the proposed method, performance analysis is performed through prototype implementation to derive the results.

Semi-Supervised Learning Based Anomaly Detection for License Plate OCR in Real Time Video

  • Kim, Bada;Heo, Junyoung
    • International journal of advanced smart convergence
    • /
    • v.9 no.1
    • /
    • pp.113-120
    • /
    • 2020
  • Recently, the license plate OCR system has been commercialized in a variety of fields and preferred utilizing low-cost embedded systems using only cameras. This system has a high recognition rate of about 98% or more for the environments such as parking lots where non-vehicle is restricted; however, the environments where non-vehicle objects are not restricted, the recognition rate is about 50% to 70%. This low performance is due to the changes in the environment by non-vehicle objects in real-time situations that occur anomaly data which is similar to the license plates. In this paper, we implement the appropriate anomaly detection based on semi-supervised learning for the license plate OCR system in the real-time environment where the appearance of non-vehicle objects is not restricted. In the experiment, we compare systems which anomaly detection is not implemented in the preceding research with the proposed system in this paper. As a result, the systems which anomaly detection is not implemented had a recognition rate of 77%; however, the systems with the semi-supervised learning based on anomaly detection had 88% of recognition rate. Using the techniques of anomaly detection based on the semi-supervised learning was effective in detecting anomaly data and it was helpful to improve the recognition rate of real-time situations.

Detection of multi-type data anomaly for structural health monitoring using pattern recognition neural network

  • Gao, Ke;Chen, Zhi-Dan;Weng, Shun;Zhu, Hong-Ping;Wu, Li-Ying
    • Smart Structures and Systems
    • /
    • v.29 no.1
    • /
    • pp.129-140
    • /
    • 2022
  • The effectiveness of system identification, damage detection, condition assessment and other structural analyses relies heavily on the accuracy and reliability of the measured data in structural health monitoring (SHM) systems. However, data anomalies often occur in SHM systems, leading to inaccurate and untrustworthy analysis results. Therefore, anomalies in the raw data should be detected and cleansed before further analysis. Previous studies on data anomaly detection mainly focused on just single type of data anomaly for denoising or removing outliers, meanwhile, the existing methods of detecting multiple data anomalies are usually time consuming. For these reasons, recognising multiple anomaly patterns for real-time alarm and analysis in field monitoring remains a challenge. Aiming to achieve an efficient and accurate detection for multi-type data anomalies for field SHM, this study proposes a pattern-recognition-based data anomaly detection method that mainly consists of three steps: the feature extraction from the long time-series data samples, the training of a pattern recognition neural network (PRNN) using the features and finally the detection of data anomalies. The feature extraction step remarkably reduces the time cost of the network training, making the detection process very fast. The performance of the proposed method is verified on the basis of the SHM data of two practical long-span bridges. Results indicate that the proposed method recognises multiple data anomalies with very high accuracy and low calculation cost, demonstrating its applicability in field monitoring.

Anomaly Detection In Real Power Plant Vibration Data by MSCRED Base Model Improved By Subset Sampling Validation (Subset 샘플링 검증 기법을 활용한 MSCRED 모델 기반 발전소 진동 데이터의 이상 진단)

  • Hong, Su-Woong;Kwon, Jang-Woo
    • Journal of Convergence for Information Technology
    • /
    • v.12 no.1
    • /
    • pp.31-38
    • /
    • 2022
  • This paper applies an expert independent unsupervised neural network learning-based multivariate time series data analysis model, MSCRED(Multi-Scale Convolutional Recurrent Encoder-Decoder), and to overcome the limitation, because the MCRED is based on Auto-encoder model, that train data must not to be contaminated, by using learning data sampling technique, called Subset Sampling Validation. By using the vibration data of power plant equipment that has been labeled, the classification performance of MSCRED is evaluated with the Anomaly Score in many cases, 1) the abnormal data is mixed with the training data 2) when the abnormal data is removed from the training data in case 1. Through this, this paper presents an expert-independent anomaly diagnosis framework that is strong against error data, and presents a concise and accurate solution in various fields of multivariate time series data.

Tropospheric Anomaly Detection in Multi-reference Stations Environment during Localized Atmosphere Conditions-(1) : Basic Concept of Anomaly Detection Algorithm

  • Yoo, Yun-Ja
    • Journal of Navigation and Port Research
    • /
    • v.40 no.5
    • /
    • pp.265-270
    • /
    • 2016
  • Extreme tropospheric anomalies such as typhoons or regional torrential rain can degrade positioning accuracy of the GPS signal. It becomes one of the main error terms affecting high-precision positioning solutions in network RTK. This paper proposed a detection algorithm to be used during atmospheric anomalies in order to detect the tropospheric irregularities that can degrade the quality of correction data due to network errors caused by inhomogeneous atmospheric conditions between multi-reference stations. It uses an atmospheric grid that consists of four meteorological stations and estimates the troposphere zenith total delay difference at a low performance point in an atmospheric grid. AWS (automatic weather station) meteorological data can be applied to the proposed tropospheric anomaly detection algorithm when there are different atmospheric conditions between the stations. The concept of probability density distribution of the delta troposphere slant delay was proposed for the threshold determination.