• Title/Summary/Keyword: Outliers detection

Outliers and Level Shift Detection of the Mean-sea Level, Extreme Highest and Lowest Tide Level Data (평균 해수면 및 최극조위 자료의 이상자료 및 기준고도 변화(Level Shift) 진단)

  • Lee, Gi-Seop;Cho, Hong-Yeon
    • Journal of Korean Society of Coastal and Ocean Engineers / v.32 no.5 / pp.322-330 / 2020
  • Outliers in time series were modeled using the mean sea level (MSL) and extreme high and low tide level (EHL, ELL) data sets from the Busan and Mokpo stations. The time-series model is a seasonal ARIMA model that includes additive outlier (AO) and level shift (LS) components. The optimal model was selected based on the AIC value, and the model parameters were estimated using the 'tso' function (in the 'tsoutliers' package of R). The main results of the model application, i.e., the detected outliers and level shifts, are as follows. (1) Two AOs were detected in the Busan monthly EHL data, with magnitudes estimated at 65.5 cm (typhoon MAEMI) and 29.5 cm (typhoon SANBA), respectively. (2) One level shift, in 1983, was detected in the Mokpo monthly MSL data; its magnitude was estimated at 21.2 cm and is attributed to the construction of the Youngsan River tidal estuary barrier. The RMS errors were about 1.95 cm (MSL), 5.11 cm (EHL), and 6.50 cm (ELL) at the Busan station, and about 2.10 cm (MSL), 11.80 cm (EHL), and 9.14 cm (ELL) at the Mokpo station.
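To make the AO and LS components concrete, the following is a minimal sketch in Python rather than the R 'tso' routine used in the paper. It assumes the outlier and shift dates are already known and ignores the seasonal ARIMA dynamics, so it only illustrates how a pulse regressor and a step regressor separate the two effects. The function names and the synthetic series are hypothetical.

```python
def ao_regressor(n, t0):
    """Pulse regressor: 1 at index t0, 0 elsewhere (additive outlier)."""
    return [1.0 if t == t0 else 0.0 for t in range(n)]


def ls_regressor(n, t0):
    """Step regressor: 0 before index t0, 1 from t0 onward (level shift)."""
    return [1.0 if t >= t0 else 0.0 for t in range(n)]


def fit_ao_ls(y, ao_at, ls_at):
    """Least-squares fit of y[t] = mu + w_ao * pulse[t] + w_ls * step[t].

    The pulse absorbs its own observation exactly, so the solution
    reduces to two segment means computed without the AO point.
    """
    pulse = ao_regressor(len(y), ao_at)
    step = ls_regressor(len(y), ls_at)
    before = [v for v, p, s in zip(y, pulse, step) if p == 0.0 and s == 0.0]
    after = [v for v, p, s in zip(y, pulse, step) if p == 0.0 and s == 1.0]
    mu = sum(before) / len(before)
    w_ls = sum(after) / len(after) - mu
    w_ao = y[ao_at] - (mu + w_ls * step[ao_at])
    return mu, w_ao, w_ls


# Synthetic, noise-free series (hypothetical values, not the tide data).
y = [100.0] * 12
for t in range(6, 12):
    y[t] += 20.0  # level shift starting at index 6
y[3] += 50.0      # additive outlier at index 3

mu, w_ao, w_ls = fit_ao_ls(y, ao_at=3, ls_at=6)
print(mu, w_ao, w_ls)
```

In a full procedure the dates are not known in advance. They are searched for on the residuals of the fitted time-series model, which this sketch leaves out.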

Detection Power when outliers are present at or near the end of time series

  • Lee, Jong-Seon;An, Mi-Hye;Lee, Jae-Jun
    • Proceedings of the Korean Statistical Society Conference / 2003.10a / pp.281-283 / 2003
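  • When forecasting or performing process adjustment with data that follow a time series model, the results can be strongly affected by outliers occurring near the end of the series. However, the outlier detection methods proposed so far are known to be efficient mainly at detecting outliers that occur in the middle of the data. In this study, the detection power of existing methods for outliers occurring at the end of the series was analyzed through simulation experiments. In addition, a way to improve this power is proposed and compared with the existing detection power through simulation.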

A Development of Preprocessing Models of Toll Collection System Data for Travel Time Estimation (통행시간 추정을 위한 TCS 데이터의 전처리 모형 개발)

  • Lee, Hyun-Seok;NamKoong, Seong J.
    • The Journal of The Korea Institute of Intelligent Transport Systems / v.8 no.5 / pp.1-11 / 2009
  • TCS data reflect traffic conditions. However, TCS data contain outliers that do not represent the travel time of the corresponding section, and if these outliers are not eliminated, the estimated travel time may be distorted. Travel times for the same section and time can vary widely because the variation in travel time increases with section distance, which makes it difficult to calculate a representative travel time. Accordingly, it is important to understand travel time characteristics in order to compute a representative travel time from TCS data. In this study, after analyzing the variation ratio of travel time according to link distance and level of congestion, an outlier elimination model and a smoothing model for TCS data were proposed. The results show that the proposed models can be used to estimate a reliable travel time for a long-distance path on which travel times vary for the same departure time, the intervals are large, and the representative travel time changes irregularly over short periods.

A Binary Prediction Method for Outlier Detection using One-class SVM and Spectral Clustering in High Dimensional Data (고차원 데이터에서 One-class SVM과 Spectral Clustering을 이용한 이진 예측 이상치 탐지 방법)

  • Park, Cheong Hee
    • Journal of Korea Multimedia Society / v.25 no.6 / pp.886-893 / 2022
  • Outlier detection refers to the task of detecting data that deviate significantly from the normal data distribution. Most outlier detection methods compute an outlier score that indicates the degree to which a data sample deviates from normal. However, setting a threshold on the outlier score to decide whether a data sample is an outlier or normal is not trivial. In this paper, we propose a binary prediction method for outlier detection based on spectral clustering and an ensemble of one-class SVMs. Given training data consisting of normal data samples, a clustering method is applied to find clusters in the training data, and the ensemble of one-class SVM models trained on each cluster finds the boundaries of the normal data. We show how to obtain a threshold for transforming the outlier scores computed from the ensemble of one-class SVM models into binary predictive values. Experimental results with high dimensional text data show that the proposed method can be applied effectively to high dimensional data, especially when the normal training data consist of clusters with different shapes and densities.
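The idea of one boundary model per cluster, combined into a single binary decision, can be sketched as follows. This is a simplified stand-in and not the paper's method: each one-class SVM is replaced by a centroid with a radius, the cluster labels are assumed to be given (the paper obtains them with spectral clustering), and the threshold rule (a quantile of training distances) is an assumption made only for illustration. All names are hypothetical.

```python
import math


def fit_cluster_models(points, labels, quantile=1.0):
    """Per cluster, store (centroid, radius).

    The radius is the given quantile of the training distances to the
    centroid; it plays the role of the per-model decision threshold.
    """
    models = []
    for lab in sorted(set(labels)):
        members = [p for p, l in zip(points, labels) if l == lab]
        dim = len(members[0])
        centroid = tuple(sum(p[d] for p in members) / len(members) for d in range(dim))
        dists = sorted(math.dist(p, centroid) for p in members)
        idx = min(len(dists) - 1, max(0, math.ceil(quantile * len(dists)) - 1))
        models.append((centroid, dists[idx]))
    return models


def predict_is_outlier(x, models):
    """Binary prediction: outlier only if x lies outside every cluster boundary."""
    return all(math.dist(x, c) > r for c, r in models)


# Hypothetical normal training data forming two clusters.
train = [(0, 0), (1, 0), (0, 1), (1, 1), (10, 10), (11, 10), (10, 11), (11, 11)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
models = fit_cluster_models(train, labels)

print(predict_is_outlier((0.4, 0.6), models))  # inside the first cluster
print(predict_is_outlier((5.0, 5.0), models))  # between the two clusters
```

Combining the per-cluster decisions with "outside every boundary" lets clusters of different sizes each keep their own threshold.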

Outlier detection and time series modelling in the stationary time series (정상 시계열에서의 이상치 발견과 시계열 모형구축)

  • 이종협;최기헌
    • The Korean Journal of Applied Statistics / v.5 no.2 / pp.139-156 / 1992
  • Recently, several authors have introduced iterative methods for detecting time series outliers. Most of these methods were developed under the assumption that an underlying outlier-free model is known or can be identified. Since outliers can distort model identification or even make it impossible, we propose a procedure that begins with a descriptive data analysis of a time series using distance measures between two observations. Properties of the proposed test statistic are presented. Transfer function models are used to distinguish the type of an outlier. An empirical example is given to illustrate the time series modeling procedure.

Derivation of Optimal Design Flood by L-Moments and LH-Moments ( I ) - On the method of L-Moments - (L-모멘트 및 LH-모멘트 기법에 의한 적정 설계홍수량의 유도( I ) - L-모멘트법을 중심으로 -)

  • 이순혁;박명근;맹승진;정연수;김동주;류경식
    • Magazine of the Korean Society of Agricultural Engineers / v.40 no.4 / pp.45-57 / 1998
  • This study was conducted to derive optimal design floods using the Generalized Extreme Value (GEV) distribution for the annual maximum series at ten watersheds along the Han, Nagdong, Geum, Yeongsan and Seomjin river systems. The adequacy of the flood data for analysis was established by tests of independence and homogeneity and by the detection of outliers. The L-coefficient of variation, L-skewness and L-kurtosis were calculated as L-moment ratios. Parameters were estimated by the Method of Moments and the Method of L-Moments. Design floods obtained by the two methods using different plotting position formulas in the GEV distribution were compared by Relative Mean Errors (RME) and Relative Absolute Errors (RAE). The results are summarized as follows. 1. The adequacy of the flood data for analysis was confirmed by the tests of independence and homogeneity and the detection of outliers. 2. The GEV distribution was found to be more suitable than the Pearson type 3 distribution for the applied watersheds, based on the Kolmogorov-Smirnov goodness-of-fit test and the L-moment ratio diagram. 3. Parameters for the GEV distribution were estimated using the Method of Moments and the Method of L-Moments. 4. Design floods were calculated by the Method of Moments and the Method of L-Moments in the GEV distribution. 5. Design floods derived by the Method of L-Moments using the Weibull plotting position formula in the GEV distribution were much closer to the observed data than those obtained by the Method of Moments with different plotting position formulas, in terms of Relative Mean Errors and Relative Absolute Errors.
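The L-moment ratios named in the abstract can be illustrated with a short sketch. It uses the standard unbiased sample estimators built from probability-weighted moments. The abstract does not say which estimator the authors used, so this is an assumption, and the function name and sample values are hypothetical.

```python
def sample_l_moment_ratios(data):
    """Return (l1, L-CV, L-skewness, L-kurtosis) from unbiased sample PWMs."""
    x = sorted(data)
    n = len(x)
    if n < 4:
        raise ValueError("at least four observations are required")

    # b[r] = (1/n) * sum_i [(i-1)(i-2)...(i-r) / ((n-1)(n-2)...(n-r))] * x_(i)
    b = []
    for r in range(4):
        total = 0.0
        for i, value in enumerate(x, start=1):
            weight = 1.0
            for k in range(1, r + 1):
                weight *= (i - k) / (n - k)
            total += weight * value
        b.append(total / n)

    # First four L-moments as linear combinations of the PWMs.
    l1 = b[0]
    l2 = 2 * b[1] - b[0]
    l3 = 6 * b[2] - 6 * b[1] + b[0]
    l4 = 20 * b[3] - 30 * b[2] + 12 * b[1] - b[0]

    return l1, l2 / l1, l3 / l2, l4 / l2


# Hypothetical sample; a symmetric sample has zero L-skewness.
l1, l_cv, l_skew, l_kurt = sample_l_moment_ratios([1, 2, 3, 4, 5])
print(l1, l_cv, l_skew, l_kurt)
```

Because L-moments are linear in the ordered observations, a single extreme flood value affects them less than it affects the conventional third and fourth moments.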

Performance Comparison of Machine Learning Algorithms for Network Traffic Security in Medical Equipment (의료기기 네트워크 트래픽 보안 관련 머신러닝 알고리즘 성능 비교)

  • Seung Hyoung Ko;Joon Ho Park;Da Woon Wang;Eun Seok Kang;Hyun Wook Han
    • Journal of Information Technology Services / v.22 no.5 / pp.99-108 / 2023
  • As hospitals become more computerized, security issues concerning the data generated by medical devices within hospitals are steadily increasing. Because hospital data contain a variety of personal information, attempts to attack them are made continuously. To protect data from external attacks, each hospital has formed an internal team that continuously monitors whether the computer network is secure. However, there are limits to how well humans can monitor, in real time, attacks that occur on hospital networks. Recently, artificial intelligence models have shown excellent performance in detecting outliers. In this paper, an experiment was conducted to verify how well an artificial intelligence model classifies normal and abnormal data in network traffic generated by medical devices. Among the several models used for outlier detection, Random Forest and Tabnet were used. Tabnet is a deep learning algorithm that takes structured data as input and classifies it. The two algorithms were trained on open network traffic data, and their classification accuracy was measured on test data. The Random Forest algorithm showed a classification accuracy of 93%, and Tabnet showed a classification accuracy of 99%. It is therefore expected that most outliers that may occur in a hospital network can be detected using a high-performing algorithm such as Tabnet.

A Study of Anomaly Detection for ICT Infrastructure using Conditional Multimodal Autoencoder (ICT 인프라 이상탐지를 위한 조건부 멀티모달 오토인코더에 관한 연구)

  • Shin, Byungjin;Lee, Jonghoon;Han, Sangjin;Park, Choong-Shik
    • Journal of Intelligence and Information Systems / v.27 no.3 / pp.57-73 / 2021
  • Maintaining ICT infrastructure and preventing failures through anomaly detection is becoming increasingly important. System monitoring data are multidimensional time series, and it is difficult to account for the characteristics of multidimensional data and of time series data at the same time. With multidimensional data, the correlation between variables must be considered. Existing probability-based, linear, and distance-based methods degrade because of the curse of dimensionality. In addition, time series data are typically preprocessed with sliding windows and time series decomposition for autocorrelation analysis. These techniques increase the dimension of the data, so they need to be supplemented. Anomaly detection is a long-standing research field; statistical methods and regression analysis were used in the early days, and machine learning and artificial neural networks are now being actively applied. Statistical methods are difficult to apply when the data are non-homogeneous and do not detect local outliers well. Regression-based methods learn a regression formula based on parametric statistics and detect anomalies by comparing predicted and actual values. Their performance drops when the model is not robust or when the data include noise or outliers, so they are restricted to training data free of noise and outliers. An autoencoder based on artificial neural networks is trained to produce output as similar as possible to its input. It has many advantages over existing probability and linear models, cluster analysis, and supervised learning: it can be applied to data that do not satisfy distributional or linearity assumptions, and it can be trained in an unsupervised manner without labeled data. However, autoencoders are limited in identifying local outliers in multidimensional data, and the dimension of the data increases greatly because of the characteristics of time series data. In this study, we propose a CMAE (Conditional Multimodal Autoencoder) that enhances anomaly detection performance by considering local outliers and time series characteristics. First, we applied a Multimodal Autoencoder (MAE) to overcome the limitations in identifying local outliers in multidimensional data. Multimodal models are commonly used to learn from different types of input, such as voice and image. The different modalities share the bottleneck of the autoencoder, through which their correlations are learned. In addition, a CAE (Conditional Autoencoder) was used to learn the characteristics of time series data effectively without increasing the dimension of the data. Conditional inputs are usually categorical variables, but in this study time was used as the condition in order to learn periodicity. The proposed CMAE model was verified by comparison with a Unimodal Autoencoder (UAE) and a Multimodal Autoencoder (MAE). The reconstruction performance for 41 variables was examined in the proposed model and the comparison models. Reconstruction performance differs by variable; the loss values for the Memory, Disk, and Network modalities are small in all three autoencoder models, indicating that reconstruction works well. The Process modality did not show a significant difference among the three models, and the CPU modality showed excellent performance in CMAE. ROC curves were prepared to evaluate anomaly detection performance in the proposed and comparison models, and AUC, accuracy, precision, recall, and F1-score were compared. On all indicators, performance ranked in the order CMAE, MAE, UAE. In particular, the recall of CMAE was 0.9828, which confirms that it detects almost all anomalies. The accuracy of the model also improved to 87.12%, and the F1-score was 0.8883, which is considered suitable for anomaly detection. In practical terms, the proposed model has an advantage beyond the performance improvement. Techniques such as time series decomposition and sliding windows require additional procedures to be managed, and the dimensional increase they cause can slow down inference. The proposed model is easy to apply in practice in terms of inference speed and model management.
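The evaluation indicators listed in the abstract are tied together by simple formulas, sketched below. The confusion-matrix counts are hypothetical and not taken from the paper. The last call takes the reported 0.9828 rate (read here as recall) and the reported F1-score of 0.8883 and solves the F1 definition for precision, a value the abstract itself does not state.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def implied_precision(recall, f1):
    """Solve F1 = 2PR / (P + R) for the precision P."""
    return f1 * recall / (2 * recall - f1)


# Hypothetical counts, used only to exercise the formulas.
m = classification_metrics(tp=90, fp=20, fn=10, tn=80)
print(m)

# Precision implied by the two figures quoted in the abstract.
p = implied_precision(recall=0.9828, f1=0.8883)
print(round(p, 4))
```

A high recall combined with a lower implied precision means the detector misses few anomalies at the cost of some false alarms, which is a common trade-off in monitoring.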

A Study on Atmospheric Data Anomaly Detection Algorithm based on Unsupervised Learning Using Adversarial Generative Neural Network (적대적 생성 신경망을 활용한 비지도 학습 기반의 대기 자료 이상 탐지 알고리즘 연구)

  • Yang, Ho-Jun;Lee, Seon-Woo;Lee, Mun-Hyung;Kim, Jong-Gu;Choi, Jung-Mu;Shin, Yu-mi;Lee, Seok-Chae;Kwon, Jang-Woo;Park, Ji-Hoon;Jung, Dong-Hee;Shin, Hye-Jung
    • Journal of Convergence for Information Technology / v.12 no.4 / pp.260-269 / 2022
  • In this paper, we propose an anomaly detection model using a deep neural network to automate the identification of outliers in national air pollution measurement network data, a task previously performed by experts. We generated training data by analyzing the missing values and outliers of weather data provided by the Institute of Environmental Research. Based on BeatGAN, an unsupervised learning model, we propose a new model that changes the kernel structure and adds a convolutional filter layer and a transposed convolutional filter layer to improve anomaly detection performance. In addition, we used the generative capability of the proposed model to implement a retraining algorithm that generates new data and uses them for training. With this algorithm, the proposed model achieved the highest performance compared with the original BeatGAN model and other unsupervised learning models such as iForest and One-Class SVM. This study suggests a way to improve the anomaly detection performance of the proposed model while avoiding overfitting, at no additional cost, in situations where training data are insufficient because of factors such as sensor abnormalities and inspections at actual industrial sites.

Design and Analysis of Lightweight Trust Mechanism for Accessing Data in MANETs

  • Kumar, Adarsh;Gopal, Krishna;Aggarwal, Alok
    • KSII Transactions on Internet and Information Systems (TIIS) / v.8 no.3 / pp.1119-1143 / 2014
  • Lightweight trust mechanisms built on lightweight cryptographic primitives have emerged as an important mechanism for resource-constrained, wireless-sensor-based mobile devices. In this work, outlier detection in lightweight Mobile Ad-hoc NETworks (MANETs) is extended to create a reliable trust cycle with an anomaly detection mechanism and minimum energy losses [1]. The system is then tested against outliers through detection ratios and anomaly scores before virtual programmable nodes are incorporated to increase efficiency. The security of the proposed system is verified with the ProVerif automated toolkit, and mathematical analysis shows that it is strong against bad-mouthing and on-off attacks. The performance of the proposed technique is analyzed over different MANET routing protocols with varying numbers of nodes; the system provides good throughput, with a maximum 20% increase in delay as the network grows to a maximum of 100 nodes. The system shows good scalability, resource optimization and security. Lightweight modeling and policy analysis with lightweight cryptographic primitives show that intruders can be detected in a few milliseconds without any conflicts in access rights.