• Title/Summary/Keyword: outlier identification

A procedure for simultaneous variable selection, variable transformation and outlier identification in linear regression (선형회귀에서 변수선택, 변수변환과 이상치 탐지의 동시적 수행을 위한 절차)

  • Seo, Han Son;Yoon, Min
    • The Korean Journal of Applied Statistics / v.33 no.1 / pp.1-10 / 2020
  • We propose a unified approach to variable selection, variable transformation, and outlier identification in the linear model. The procedure combines a sequential method for outlier detection with a least trimmed squares estimator for variable transformation, and it uses all-possible-subsets regression for model selection. Real data analyses and simulation results are provided to show the efficiency of the procedure in terms of correct variable selection and the fit of the estimated model.

Classifier Combination Based Source Identification for Cell Phone Images

  • Wang, Bo;Tan, Yue;Zhao, Meijuan;Guo, Yanqing;Kong, Xiangwei
    • KSII Transactions on Internet and Information Systems (TIIS) / v.9 no.12 / pp.5087-5102 / 2015
  • The rapid popularization of camera-equipped smartphones has led to a number of new legal and criminal problems involving multimedia such as digital images, making cell phone source identification an important branch of digital image forensics. This paper proposes a classifier-combination-based source identification strategy for cell phone images. To identify cell phone models that are outliers with respect to the training sets of the multi-class classifier, a one-class classifier is applied in sequence within the framework. Feature vectors including color filter array (CFA) interpolation coefficient estimates and multi-feature fusion are employed to verify the effectiveness of the classifier combination strategy. Experimental results demonstrate that, for different feature sets, our method achieves high source identification accuracy both for cell phones in the training sets and for the outliers.

RPCA-GMM for Speaker Identification (화자식별을 위한 강인한 주성분 분석 가우시안 혼합 모델)

  • 이윤정;서창우;강상기;이기용
    • The Journal of the Acoustical Society of Korea / v.22 no.7 / pp.519-527 / 2003
  • Speech is strongly affected by outliers introduced by unexpected events such as additive background noise, changes in a speaker's utterance pattern, and voice detection errors. Such outliers can severely degrade speaker recognition performance. In this paper, we propose a GMM based on robust principal component analysis (RPCA-GMM) using M-estimation to address both outliers and the high dimensionality of training feature vectors in speaker identification. First, a new feature vector of reduced dimension is obtained by robust PCA based on M-estimation. The robust PCA projects the original feature vector onto the reduced-dimensional linear subspace spanned by the leading eigenvectors of the covariance matrix of the feature vectors. Second, a GMM with diagonal covariance matrix is obtained from the transformed feature vectors. We performed speaker identification experiments to show the effectiveness of the proposed method, comparing RPCA-GMM with PCA and with the conventional GMM with diagonal covariance matrix. For every 2% increase in the proportion of outliers, the proposed method maintains almost the same speaker identification rate, degrading by only 0.03%, while the conventional GMM and PCA degrade by 0.65% and 0.55%, respectively. This means that our method is more robust to outliers.

Chemometric Tool of Chromatographic Pattern Recognition for the Analysis of Complex Mixtures

  • Park, Man-Ki;Park, Jeong-Hill;Cho, Jung-Hwan;Kim, Na-Young;Kang, Jong-Seong
    • Archives of Pharmacal Research / v.15 no.4 / pp.376-378 / 1992
  • A chemical tool was developed for the analysis of complex mixtures such as crude drugs by the method of pattern recognition. Pattern recognition was accomplished by a multiple reference peak identification method and three kinds of outlier statistics. This tool was tested on the analysis of synthetic mixtures.

Outlier Identification in Regression Analysis using Projection Pursuit

  • Kim, Hyojung;Park, Chongsun
    • Communications for Statistical Applications and Methods / v.7 no.3 / pp.633-641 / 2000
  • In this paper, we propose a method to identify multiple outliers in regression analysis, assuming only smoothness of the regression function. Our method uses a single-linkage clustering algorithm and Projection Pursuit Regression (PPR). It was compared with existing methods on several simulated and real examples and proved very useful in regression problems where the regression function is far from linear.

Efficient Speaker Identification based on Robust VQ-PCA (강인한 VQ-PCA에 기반한 효율적인 화자 식별)

  • Lee Ki-Yong
    • Journal of Internet Computing and Services / v.5 no.3 / pp.57-62 / 2004
  • In this paper, an efficient speaker identification method based on robust vector quantization-principal component analysis (VQ-PCA) is proposed to solve the problems of outliers and high dimensionality of training feature vectors in speaker identification. First, the proposed method partitions the data space into several disjoint regions by robust VQ based on M-estimation. Second, the robust PCA is obtained from the covariance matrix in each region. Finally, our method obtains the Gaussian mixture model (GMM) for each speaker from the transformed feature vectors, whose dimension is reduced by the robust PCA in each region. Compared to the conventional GMM with diagonal covariance matrix, the proposed method achieves the same performance faster and with less storage and, moreover, is robust to outliers.

An Integrated Fault Detection and Isolation Method for Sensors and Actuators of LEO Satellite (저궤도 인공위성의 센서 및 구동기 통합 고장검출 및 분리 기법)

  • Lim, Jun-Kyu;Lee, Jun-Han;Park, Chan-Gook
    • Journal of Institute of Control, Robotics and Systems / v.17 no.11 / pp.1117-1124 / 2011
  • An integrated fault detection and isolation method is proposed in this paper. The main objective is to develop a fault detection, isolation, and diagnosis algorithm based on the DKF (Decentralized Kalman Filter) and a bank of IMM (Interacting Multiple Model) filters using a penalty scalar for both partial and total faults; an outlier detection algorithm for preventing false alarms is also included. The proposed FDI (Fault Detection and Isolation) scheme is developed in four phases. In the first phase, the outlier detection filter is designed as a pre-filter to prevent false alarms. In the second phase, two local filters and a master filter are designed to detect sensor faults. In the third phase, the proposed FDI scheme checks sensor residuals to isolate sensor faults, and 11 EKF actuator fault models are designed to detect wherever actuator faults occur. In the last phase, four filters are designed to identify the fault type, which is either a total fault or a partial fault. The developed scheme can not only handle sensor and actuator faults but also prevent false alarms. An important feature of the proposed FDI scheme is that it decreases fault isolation time and performs not only fault detection and isolation but also fault type identification. To verify the performance of the proposed FDI algorithm, a simulator is also developed in the Matlab/Simulink environment.

Damaged cable detection with statistical analysis, clustering, and deep learning models

  • Son, Hyesook;Yoon, Chanyoung;Kim, Yejin;Jang, Yun;Tran, Linh Viet;Kim, Seung-Eock;Kim, Dong Joo;Park, Jongwoong
    • Smart Structures and Systems / v.29 no.1 / pp.17-28 / 2022
  • The cable component of cable-stayed bridges is gradually impacted by weather conditions, vehicle loads, and material corrosion. The stay cable is a critical load-carrying part that closely affects the operational stability of a cable-stayed bridge. Damaged cables may lead to bridge collapse because of their reduced tension capacity. Thus, it is necessary to develop structural health monitoring (SHM) techniques that accurately identify damaged cables. In this work, a combinational identification method of three efficient techniques, including statistical analysis, clustering, and neural network models, is proposed to detect the damaged cable in a cable-stayed bridge. The measured dataset from the bridge was initially preprocessed to remove the outlier channels. Then, the theory and application of each technique for damage detection were introduced. In general, the statistical approach extracts the parameters representing the damage within time series, the clustering approach identifies the outliers in the data signals as damaged members, and the deep learning approach uses the nonlinear data dependencies in SHM to train the model. The performance of these approaches in classifying the damaged cable was assessed, and the combinational identification method was obtained using a voting ensemble. Finally, the combination method was compared with an existing outlier detection algorithm, support vector machines (SVM). The results demonstrate that the proposed method is robust and provides higher accuracy for damaged cable detection in the cable-stayed bridge.
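
The voting ensemble mentioned in this abstract can be illustrated with a minimal sketch. It assumes each of the three detectors emits a 0/1 label per cable (1 meaning damaged) and that the ensemble takes a simple majority; the function name `vote_ensemble` and the sample labels are hypothetical, and the paper's actual voting rule may differ.

```python
from collections import Counter


def vote_ensemble(detector_outputs):
    """Combine per-cable labels from several detectors by majority vote.

    detector_outputs: list of equal-length label lists, one per detector
    (hypothetical layout; 1 = damaged, 0 = healthy).
    """
    n_cables = len(detector_outputs[0])
    combined = []
    for i in range(n_cables):
        # Count the votes cast for cable i and keep the most frequent label.
        votes = Counter(output[i] for output in detector_outputs)
        combined.append(votes.most_common(1)[0][0])
    return combined


# Illustrative labels only, not data from the paper.
statistical = [0, 1, 0, 0]
clustering = [0, 1, 1, 0]
network = [0, 1, 0, 1]
print(vote_ensemble([statistical, clustering, network]))
```

With an odd number of binary detectors, a majority always exists, so no tie-breaking rule is needed in this sketch.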

V-mask Type Criterion for Identification of Outliers In Logistic Regression

  • Kim Bu-Yong
    • Communications for Statistical Applications and Methods / v.12 no.3 / pp.625-634 / 2005
  • A procedure is proposed to identify multiple outliers in logistic regression. It detects the leverage points by means of hierarchical clustering of the robust distances based on the minimum covariance determinant estimator, and then it employs a V-mask type criterion on the scatter plot of robust residuals against robust distances to classify the observations into vertical outliers, bad leverage points, good leverage points, and regular points. The effectiveness of the proposed procedure is evaluated on classic and artificial data sets, and it is shown that the procedure deals very well with the masking and swamping effects.
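
The four-way classification described here can be sketched with the conventional rectangular cutoffs on a plot of robust residuals against robust distances. This is not the V-mask criterion of the paper, whose boundary shape is not given in the abstract. The function name and both cutoff values are assumptions chosen for illustration.

```python
def classify_point(robust_residual, robust_distance,
                   residual_cutoff=2.5, distance_cutoff=3.0):
    """Label one observation using rectangular cutoffs (assumed values).

    A simplified stand-in for the paper's V-mask type criterion.
    """
    large_residual = abs(robust_residual) > residual_cutoff
    high_leverage = robust_distance > distance_cutoff
    if high_leverage and large_residual:
        return "bad leverage point"
    if high_leverage:
        return "good leverage point"
    if large_residual:
        return "vertical outlier"
    return "regular point"


# Illustrative (residual, distance) pairs, not data from the paper.
for residual, distance in [(0.5, 1.0), (4.0, 1.0), (0.5, 5.0), (-4.0, 5.0)]:
    print(residual, distance, classify_point(residual, distance))
```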

The Identification Of Multiple Outliers

  • Park, Jin-Pyo
    • Journal of the Korean Data and Information Science Society / v.11 no.2 / pp.201-215 / 2000
  • The classical method for regression analysis is the least squares method. However, if the data contain significant outliers, the least squares estimator can break down. To remedy this problem, robust methods are an important complement to the least squares method. Robust methods down-weight or completely ignore the outliers. This is not always best, because the outliers can contain very important information about the population. If they can be detected, the outliers can be inspected further and appropriate action can be taken based on the results. In this paper, I propose a sequential outlier test to identify outliers. It is based on the nonrobust and robust estimates of scatter of the robust regression residuals and is applied in a forward procedure, removing the most extreme observation at each step until the test fails to detect outliers. Unlike other forward procedures, the present one is unaffected by swamping or masking effects because the statistic is based on robust regression residuals. I present the asymptotic distribution of the test statistic and apply the test to several real and simulated data sets, on which it performs fairly well.
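
The forward procedure described in this abstract can be sketched roughly as follows. The sketch assumes the residuals already come from a robust regression fit. It compares a nonrobust scale (the sample standard deviation) with a robust scale (the normal-consistent MAD), and it stops when their ratio falls below a fixed cutoff. The ratio statistic, the cutoff of 1.5, and the function names are illustrative assumptions. The paper's actual test statistic and its asymptotic critical values are not given in the abstract.

```python
import statistics


def mad_scale(values):
    """Median absolute deviation, scaled to be consistent at the normal."""
    med = statistics.median(values)
    return 1.4826 * statistics.median([abs(v - med) for v in values])


def sequential_outliers(residuals, cutoff=1.5, max_remove=None):
    """Return indices flagged as outliers, in order of removal.

    residuals: residuals from a robust fit (assumed to be given).
    cutoff: hypothetical threshold on the nonrobust/robust scale ratio.
    """
    remaining = list(enumerate(residuals))
    flagged = []
    if max_remove is None:
        max_remove = len(residuals) // 2
    while len(flagged) < max_remove and len(remaining) > 2:
        values = [r for _, r in remaining]
        robust = mad_scale(values)
        if robust == 0:
            break
        # The nonrobust scale is inflated by outliers; the robust one is not.
        if statistics.stdev(values) / robust <= cutoff:
            break
        med = statistics.median(values)
        # Remove the most extreme remaining observation and test again.
        worst = max(remaining, key=lambda pair: abs(pair[1] - med))
        remaining.remove(worst)
        flagged.append(worst[0])
    return flagged


# Illustrative residuals only; indices 5 and 8 are planted outliers.
residuals = [0.1, -0.2, 0.05, 0.3, -0.15, 8.0, 0.2, -0.1, -7.5, 0.0]
print(sequential_outliers(residuals))
```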
