• Title/Summary/Keyword: Conditional Outlier Detection


Identification of Incorrect Data Labels Using Conditional Outlier Detection

  • Hong, Charmgil
    • Journal of Korea Multimedia Society / v.23 no.8 / pp.915-926 / 2020
  • Outlier detection methods help identify unusual instances in data that may correspond to erroneous, exceptional, or surprising events or behaviors. This work studies conditional outlier detection, a special instance of the outlier detection problem, in the context of incorrect data label identification. Unlike conventional (unconditional) outlier detection methods, which seek abnormalities across all data attributes, conditional outlier detection assumes that data are given as pairs of input (condition) and output (response or label). Accordingly, the goal of conditional outlier detection is to identify incorrect or unusual output assignments, taking their input as the condition. As a solution to conditional outlier detection, this paper proposes the ratio-based outlier scoring (ROS) approach and its variant. The proposed solutions adopt conventional outlier scores and apply them to identify conditional outliers in data. Experiments on synthetic and real-world image datasets are conducted to demonstrate the benefits of the proposed approaches.
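The conditional setting described in this abstract can be illustrated with a minimal sketch. This is not the paper's ROS method, whose formula the abstract does not give. It is a simple nearest-neighbor stand-in that scores each label by how rarely it occurs among the labels of the nearest inputs. The function name `conditional_outlier_scores` and the parameter `k` are hypothetical.

```python
import numpy as np

def conditional_outlier_scores(X, y, k=5):
    """Score each (input, label) pair by label disagreement among the k nearest inputs.

    Illustrative stand-in only; this is NOT the ROS score from the paper.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Pairwise squared Euclidean distances between inputs (the condition).
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbor
    neighbors = np.argsort(d2, axis=1)[:, :k]
    # Fraction of neighbors sharing the label: a crude estimate of P(y | x).
    agreement = (y[neighbors] == y[:, None]).mean(axis=1)
    return 1.0 - agreement  # high score = label unusual given its input

# Two well-separated groups of inputs; the label at index 2 is flipped.
X = [[0.0], [0.1], [0.2], [0.3], [0.4], [0.5],
     [10.0], [10.1], [10.2], [10.3], [10.4], [10.5]]
y = [0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1]
scores = conditional_outlier_scores(X, y, k=3)
```

An unconditional detector would see nothing unusual about the input 0.2 or the label 1 on their own; only the pairing is suspicious, which is what the score reflects.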

OUTLIER DETECTION BASED ON A CHANGE OF LIKELIHOOD

  • Kim, Myung-Geun
    • Journal of applied mathematics & informatics / v.26 no.5_6 / pp.1133-1138 / 2008
  • A general method of detecting outliers based on a change of likelihood by using the influence function is suggested. It can be applied to all kinds of distributions that are specified by parameters. For the multivariate normal case, specific computations are made to get the corresponding conditional influence function. A numerical example is provided for illustration.
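As a rough illustration of scoring observations by a change of likelihood, the sketch below uses case deletion rather than the influence function derivation the abstract refers to, so it should be read as a related idea and not as the paper's method. For each observation, it measures how much the maximized multivariate normal log-likelihood of the remaining points changes when the estimates are refit without that observation. All function names are hypothetical.

```python
import numpy as np

def gaussian_loglik(X, mean, cov):
    """Log-likelihood of the rows of X under a multivariate normal N(mean, cov)."""
    n, p = X.shape
    diff = X - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)
    return -0.5 * (n * p * np.log(2 * np.pi) + n * logdet + maha.sum())

def mle(X):
    """Maximum likelihood estimates of the mean vector and covariance matrix."""
    return X.mean(axis=0), np.atleast_2d(np.cov(X, rowvar=False, bias=True))

def likelihood_changes(X):
    """For each i: loglik of the other points at their own MLE minus at the full-data MLE."""
    X = np.asarray(X, dtype=float)
    full_mean, full_cov = mle(X)
    changes = np.empty(len(X))
    for i in range(len(X)):
        rest = np.delete(X, i, axis=0)
        changes[i] = (gaussian_loglik(rest, *mle(rest))
                      - gaussian_loglik(rest, full_mean, full_cov))
    return changes

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(30, 2)), [[8.0, -8.0]]])  # last row is a planted outlier
changes = likelihood_changes(X)
```

Each change is non-negative by construction, and observations that distort the estimates most produce the largest values.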


Robust Estimation and Outlier Detection

  • Myung Geun Kim
    • Communications for Statistical Applications and Methods / v.1 no.1 / pp.33-40 / 1994
  • The conditional expectation of a random variable in a multivariate normal random vector is a multiple linear regression on its predecessors. Using this fact, the least median of squares estimation method developed for multiple linear regression is adapted to multivariate data to identify influential observations. The resulting method clearly detects outliers and avoids the masking effect.
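The fact this abstract relies on can be written out as follows. The notation is an assumption made here for illustration and is not taken from the paper: $X = (X_1, \dots, X_p)^\top \sim N_p(\mu, \Sigma)$, $X_{(k-1)} = (X_1, \dots, X_{k-1})^\top$ with mean $\mu_{(k-1)}$ and covariance $\Sigma_{k-1}$, and $\sigma_k = \operatorname{Cov}(X_{(k-1)}, X_k)$. The second line is the standard least median of squares criterion for a regression of $y_i$ on $x_i$.

```latex
\begin{align*}
  \mathrm{E}\!\left[ X_k \mid X_{(k-1)} = x \right]
    &= \mu_k + \sigma_k^\top \Sigma_{k-1}^{-1} \left( x - \mu_{(k-1)} \right), \\
  \hat{\beta}_{\mathrm{LMS}}
    &= \arg\min_{\beta} \; \operatorname*{med}_{i} \left( y_i - x_i^\top \beta \right)^2 .
\end{align*}
```

The first expression is linear in $x$, which is why each variable can be regressed on its predecessors and a robust regression criterion applied to the residuals.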


Online condition assessment of high-speed trains based on Bayesian forecasting approach and time series analysis

  • Zhang, Lin-Hao;Wang, You-Wu;Ni, Yi-Qing;Lai, Siu-Kai
    • Smart Structures and Systems / v.21 no.5 / pp.705-713 / 2018
  • High-speed rail (HSR) is in operation and under development in many countries worldwide. The explosive growth of HSR has posed great challenges for operation safety and ride comfort. Among various technological demands on high-speed trains, vibration is an inevitable problem caused by rail/wheel imperfections, vehicle dynamics, and aerodynamic instability. Ride comfort is a key factor in evaluating the operational performance of high-speed trains. In this study, online monitoring data have been acquired from an in-service high-speed train for condition assessment. The measured dynamic response signals at the floor level of a train cabin are processed by the Sperling operator, in which the ride comfort index sequence is used to identify the train's operation condition. In addition, a novel technique that incorporates salient features of Bayesian inference and time series analysis is proposed for outlier detection and change detection. The Bayesian forecasting approach enables the prediction of conditional probabilities. By integrating the Bayesian forecasting approach with time series analysis, one-step forecasting probability density functions (PDFs) can be obtained before proceeding to the next observation. Change detection is conducted by comparing the current model and the alternative model (whose mean value is shifted by a prescribed offset) to determine which one better fits the actual observation. When the comparison indicates that the alternative model performs better, a potential change is detected. If the current observation is a potential outlier or change, the Bayes factor and cumulative Bayes factor are derived for further identification. A significant change, if identified, implies that there is a great alteration in the train operation performance due to defects. In this study, two illustrative cases are provided to demonstrate the performance of the proposed method for condition assessment of high-speed trains.
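The model comparison described in this abstract can be sketched as follows, assuming normal one-step forecast densities (an assumption made here; the abstract does not state the density family). The Bayes factor compares the current model with an alternative whose mean is shifted by `offset`. The cumulative version uses a recursion common in Bayesian forecasting monitors, which restarts the running evidence whenever it favors the current model; whether the paper uses exactly this recursion is not stated. All names are hypothetical.

```python
import math

def log_normal_pdf(y, mean, var):
    """Log density of N(mean, var) evaluated at y."""
    return -0.5 * (math.log(2 * math.pi * var) + (y - mean) ** 2 / var)

def log_bayes_factor(y, forecast_mean, forecast_var, offset):
    """log[ p(y | current model) / p(y | alternative with mean shifted by offset) ]."""
    return (log_normal_pdf(y, forecast_mean, forecast_var)
            - log_normal_pdf(y, forecast_mean + offset, forecast_var))

def cumulative_log_bayes_factors(ys, means, variances, offset):
    """Running evidence l_t = h_t + min(0, l_{t-1}), where h_t is the log Bayes factor.

    Negative values accumulate while successive observations favor the alternative.
    """
    out, prev = [], 0.0
    for y, m, v in zip(ys, means, variances):
        prev = log_bayes_factor(y, m, v, offset) + min(0.0, prev)
        out.append(prev)
    return out

# One observation at the forecast mean, then two at the shifted mean.
trace = cumulative_log_bayes_factors(
    ys=[0.0, 2.0, 2.0], means=[0.0, 0.0, 0.0], variances=[1.0, 1.0, 1.0], offset=2.0
)
```

A monitor built on this would flag a potential change once the running value falls below a chosen negative threshold; the threshold itself is a design choice not given in the abstract.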

A Study of Anomaly Detection for ICT Infrastructure using Conditional Multimodal Autoencoder (ICT 인프라 이상탐지를 위한 조건부 멀티모달 오토인코더에 관한 연구)

  • Shin, Byungjin;Lee, Jonghoon;Han, Sangjin;Park, Choong-Shik
    • Journal of Intelligence and Information Systems / v.27 no.3 / pp.57-73 / 2021
  • Maintaining ICT infrastructure and preventing failures through anomaly detection is becoming increasingly important. System monitoring data are multidimensional time series, and it is difficult to account for the characteristics of multidimensional data and of time series data at the same time. For multidimensional data, the correlation between variables must be considered. Existing probability-based, linear, and distance-based methods degrade because of the curse of dimensionality. In addition, time series data are preprocessed with a sliding window and time series decomposition for autocorrelation analysis. These techniques increase the dimension of the data, so they need to be supplemented. Anomaly detection is a long-established research field; statistical methods and regression analysis were used in its early days, and machine learning and artificial neural networks are now being actively applied. Statistical methods are difficult to apply when data are non-homogeneous, and they do not detect local outliers well. Regression-based methods learn a regression formula based on parametric statistics and detect anomalies by comparing predicted values with actual values. Their performance drops when the model is not robust or when the data contain noise or outliers, which restricts them to training data free of noise and outliers. An autoencoder built from artificial neural networks is trained to produce output as similar as possible to its input. It has many advantages over existing probabilistic and linear models, cluster analysis, and supervised learning. It can be applied to data that do not satisfy distributional or linearity assumptions, and it can be trained in an unsupervised manner, without labeled data. However, autoencoders are limited in identifying local outliers in multidimensional data, and the characteristics of time series data greatly increase the dimension of the data. In this study, we propose a CMAE (Conditional Multimodal Autoencoder) that enhances anomaly detection performance by considering local outliers and time series characteristics. First, we applied a Multimodal Autoencoder (MAE) to address the limitations of local outlier identification in multidimensional data. Multimodal models are commonly used to learn from different types of input, such as voice and image. The different modalities share the autoencoder's bottleneck, through which their correlation is learned. In addition, a CAE (Conditional Autoencoder) was used to learn the characteristics of time series data effectively without increasing the dimension of the data. Conditional inputs are usually categorical variables, but in this study time was used as the condition in order to learn periodicity. The proposed CMAE model was verified by comparison with the Unimodal Autoencoder (UAE) and the Multimodal Autoencoder (MAE). The reconstruction performance of the autoencoders for 41 variables was examined in the proposed model and the comparison models. Reconstruction performance differs by variable; reconstruction works well for the Memory, Disk, and Network modalities in all three autoencoder models, as their loss values are small. The Process modality showed no significant difference among the three models, and the CPU modality showed excellent performance in CMAE. ROC curves were prepared to evaluate the anomaly detection performance of the proposed model and the comparison models, and AUC, accuracy, precision, recall, and F1-score were compared. On all indicators, performance ranked in the order CMAE, MAE, and AE. In particular, the recall of CMAE was 0.9828, which confirms that it detects almost all of the anomalies. The accuracy of the model also improved, to 87.12%, and the F1-score was 0.8883, which is considered suitable for anomaly detection. In practical terms, the proposed model has an advantage beyond the performance improvement. Techniques such as time series decomposition and sliding windows add procedures that must be managed, and the dimensional increase they cause can slow inference. The proposed model is easy to apply to practical tasks in terms of inference speed and model management.
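For reference, the accuracy, precision, recall, and F1-score compared in this abstract can be computed from binary predictions as in the sketch below. The function name and the toy labels are hypothetical and are not data from the study; anomalies are assumed to be coded as 1.

```python
def detection_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = anomaly, 0 = normal)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Toy example (not data from the study): 4 anomalies, 3 of them detected, 1 false alarm.
metrics = detection_metrics(
    y_true=[1, 1, 1, 0, 0, 0, 0, 1],
    y_pred=[1, 1, 0, 0, 0, 1, 0, 1],
)
```

Because F1 is the harmonic mean of precision and recall, a model with very high recall can still have a noticeably lower F1 when its precision is lower.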

The Assessing Comparative Study for Statistical Process Control of Software Reliability Model Based on polynomial hazard function (다항 위험함수에 근거한 NHPP 소프트웨어 신뢰모형에 관한 통계적 공정관리 접근방법 비교연구)

  • Kim, Hee-Cheul;Shin, Hyun-Cheul
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.8 no.5 / pp.345-353 / 2015
  • There are many software reliability models that are based on the times of occurrence of errors during software debugging. It is shown that parameter inference is possible for software reliability models based on finite failure models and non-homogeneous Poisson processes (NHPP). For someone making a decision to market software, the conditional failure rate is an important variable. Finite failure models are used in a wide variety of practical situations. Their use in characterization problems, outlier detection, linear estimation, the study of system reliability, life-testing, survival analysis, data compression, and many other fields is documented in many studies. Statistical process control (SPC) can monitor the forecasting of software failure and thereby contribute significantly to the improvement of software reliability. Control charts are widely used for software process control in the software industry. This paper proposes a control mechanism based on an NHPP, using the mean value function of the polynomial hazard function.

The Assessing Comparative Study for Statistical Process Control of Software Reliability Model Based on Musa-Okumo and Power-law Type (Musa-Okumoto와 Power-law형 NHPP 소프트웨어 신뢰모형에 관한 통계적 공정관리 접근방법 비교연구)

  • Kim, Hee-Cheul
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.8 no.6 / pp.483-490 / 2015
  • There are many software reliability models that are based on the times of occurrence of errors during software debugging. It is shown that likelihood inference is possible for software reliability models based on finite failure models and non-homogeneous Poisson processes (NHPP). For someone making a decision about when to market software, the conditional failure rate is an important variable. Infinite failure models are used in a wide variety of practical situations. Their use in characterization problems, outlier detection, linear estimation, the study of system reliability, life-testing, survival analysis, data compression, and many other fields is documented in many studies. Statistical process control (SPC) can monitor the forecasting of software failure and thereby contribute significantly to the improvement of software reliability. Control charts are widely used for software process control in the software industry. This paper proposes a control mechanism based on an NHPP, using the mean value functions of the Musa-Okumoto and power-law types.
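The mean value functions named in this abstract can be sketched in their commonly used parameterizations, which are assumed here because the abstract does not state them: m(t) = a ln(1 + b t) for the Musa-Okumoto type and m(t) = a t^b for the power-law type. The helper `prob_no_failure` uses the standard NHPP property that the number of failures in an interval is Poisson with mean m(t + x) - m(t). The paper's actual control limits are not reproduced, and all names and parameter values are hypothetical.

```python
import math

def musa_okumoto_mean(t, a, b):
    """Assumed Musa-Okumoto (logarithmic Poisson) form: m(t) = a * ln(1 + b * t)."""
    return a * math.log1p(b * t)

def power_law_mean(t, a, b):
    """Assumed power-law form: m(t) = a * t**b."""
    return a * t ** b

def prob_no_failure(mean_fn, t, x, **params):
    """P(no failure in (t, t + x]) for an NHPP: exp(-(m(t + x) - m(t)))."""
    return math.exp(-(mean_fn(t + x, **params) - mean_fn(t, **params)))

# Expected cumulative failures by time t = 2 under each model (illustrative parameters).
expected_power_law = power_law_mean(2.0, a=3.0, b=2.0)
expected_musa_okumoto = musa_okumoto_mean(2.0, a=3.0, b=2.0)
# Probability that a unit interval starting at t = 2 is failure-free.
survival = prob_no_failure(musa_okumoto_mean, 2.0, 1.0, a=3.0, b=2.0)
```

Both functions are unbounded as t grows, which is what distinguishes infinite failure models from finite failure models.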