• Title/Summary/Keyword: Robust Statistics


L-Estimation for the Parameter of the AR(1) Model (AR(1) 모형의 모수에 대한 L-추정법)

  • Han Sang Moon;Jung Byoung Cheal
    • The Korean Journal of Applied Statistics / v.18 no.1 / pp.43-56 / 2005
  • This study investigates robust estimation of the first-order autocorrelation coefficient in a time series that follows an AR(1) process contaminated by additive outliers (AO). We propose an L-type trimmed least squares estimation method based on the preliminary estimator (PE) that Ruppert and Carroll (1980) suggested for the multiple regression model. In addition, Mallows' weight function is used to down-weight outliers in the X direction, which yields the bounded-influence PE (BIPE) estimator. The mean squared error (MSE) performance of the various estimators of the autocorrelation coefficient is compared in Monte Carlo experiments. In the Monte Carlo study, the BIPE(LAD) estimator, which uses the generalized LAD as its preliminary estimator, performs well relative to the other estimators.
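
The trimming idea in this abstract can be illustrated with a minimal sketch. It assumes a zero-mean AR(1) model x_t = phi * x_{t-1} + e_t, uses a plain LAD fit as the preliminary estimator, and omits the Mallows-type weighting of the BIPE variant entirely. The function names and the default `alpha` are hypothetical and are not taken from the paper.

```python
def ls_ar1(pairs):
    """Least-squares slope for x_t = phi * x_{t-1} + e_t (no intercept)."""
    num = sum(prev * cur for prev, cur in pairs)
    den = sum(prev * prev for prev, _ in pairs)
    return num / den


def lad_ar1(pairs):
    """LAD slope without intercept: weighted median of cur/prev, weights |prev|."""
    ratios = sorted((cur / prev, abs(prev)) for prev, cur in pairs if prev != 0.0)
    if not ratios:
        raise ValueError("all lagged values are zero")
    half = sum(weight for _, weight in ratios) / 2.0
    acc = 0.0
    for ratio, weight in ratios:
        acc += weight
        if acc >= half:
            return ratio
    return ratios[-1][0]  # guard against rounding in the running sum


def trimmed_ls_ar1(x, alpha=0.1, preliminary=lad_ar1):
    """Trimmed least squares with a preliminary estimator (illustrative).

    1. Fit a preliminary estimate phi0.
    2. Rank the (x_{t-1}, x_t) pairs by their residual under phi0.
    3. Drop the int(alpha * n) smallest and largest residuals.
    4. Refit by ordinary least squares on the remaining pairs.
    """
    pairs = list(zip(x[:-1], x[1:]))
    phi0 = preliminary(pairs)
    ranked = sorted(pairs, key=lambda p: p[1] - phi0 * p[0])
    k = int(alpha * len(ranked))
    kept = ranked[k:len(ranked) - k] if k > 0 else ranked
    return ls_ar1(kept)
```

A single additive outlier affects two consecutive pairs, one with a large positive residual and one with a large negative residual, which is why trimming both tails is a natural choice in this setting.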

Comparison of the Cluster Validation Methods for High-dimensional (Gene Expression) Data (고차원 (유전자 발현) 자료에 대한 군집 타당성분석 기법의 성능 비교)

  • Jeong, Yun-Kyoung;Baek, Jang-Sun
    • The Korean Journal of Applied Statistics / v.20 no.1 / pp.167-181 / 2007
  • Many clustering algorithms and cluster validation techniques have been suggested for high-dimensional gene expression data, but the validation techniques themselves have seldom been evaluated. In this paper we compare various cluster validity indices on low-dimensional simulated data and real gene expression data. Among the internal measures, Dunn's index is the most effective and robust, the Silhouette index is next, and the Davies-Bouldin index ranks last. Among the external measures, the Jaccard index is much more effective than the Goodman-Kruskal index and the adjusted Rand index.
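
As an illustration of one internal measure named above, the following sketch computes Dunn's index in its common textbook form: the smallest distance between points in different clusters divided by the largest within-cluster diameter. It assumes Euclidean distance and is not necessarily the exact variant evaluated in the paper. The function name is hypothetical.

```python
from itertools import combinations
from math import dist


def dunn_index(points, labels):
    """Dunn's index: min between-cluster distance / max cluster diameter."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("Dunn's index needs at least two clusters")

    # Largest distance between two points that share a cluster.
    max_diameter = max(
        (dist(a, b)
         for members in clusters.values()
         for a, b in combinations(members, 2)),
        default=0.0,
    )
    # Smallest distance between two points from different clusters.
    min_separation = min(
        dist(a, b)
        for first, second in combinations(list(clusters.values()), 2)
        for a in first
        for b in second
    )
    if max_diameter == 0.0:
        return float("inf")  # every cluster is a single point or duplicates
    return min_separation / max_diameter
```

Larger values indicate compact, well-separated clusters, so a labelling that mixes distant points into one cluster scores lower than one that keeps them apart.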

A Discrete Feature Vector for Endpoint Detection of Speech with Hidden Markov Model (숨은마코프모형을 이용하는 음성 끝점 검출을 위한 이산 특징벡터)

  • Lee, Jei-Ky;Oh, Chang-Hyuck
    • The Korean Journal of Applied Statistics / v.21 no.6 / pp.959-967 / 2008
  • This paper suggests a discrete feature vector for detecting speech segments that is robust across noise levels and computationally inexpensive, and demonstrates these properties on real speech data. The suggested feature is a one-dimensional vector that represents the slope of short-term energies, discretized into three values to reduce the computational burden in the HMM. In experiments with speech data, the method using the suggested feature vector performed well even in noisy environments.
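
A three-valued energy-slope feature of the kind described above might look like the sketch below. The abstract does not give the frame length, the slope definition, or the quantization thresholds, so the non-overlapping frames, the log-energy difference, and the `threshold` value are all assumptions made for illustration.

```python
import math


def short_term_energies(signal, frame_len):
    """Sum of squared samples over consecutive non-overlapping frames."""
    return [
        sum(sample * sample for sample in signal[start:start + frame_len])
        for start in range(0, len(signal) - frame_len + 1, frame_len)
    ]


def discrete_slope_feature(signal, frame_len=4, threshold=0.5):
    """Map the change in log-energy between frames to -1, 0, or +1.

    The slope is taken as the difference of log-energies of adjacent
    frames (an assumption); +1 marks rising energy, -1 falling energy,
    and 0 an approximately flat stretch.
    """
    energies = short_term_energies(signal, frame_len)
    log_energies = [math.log(energy + 1e-12) for energy in energies]
    symbols = []
    for previous, current in zip(log_energies, log_energies[1:]):
        slope = current - previous
        if slope > threshold:
            symbols.append(1)
        elif slope < -threshold:
            symbols.append(-1)
        else:
            symbols.append(0)
    return symbols
```

With only three symbols, a discrete HMM needs a three-entry emission table per state, which is where the computational saving would come from.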

A Test on a Specific Set of Outlier Candidates in a Linear Model (선형모형에서 특정 이상치 후보군에 대한 검정)

  • Seo, Han Son;Yoon, Min
    • The Korean Journal of Applied Statistics / v.27 no.2 / pp.307-315 / 2014
  • An exact distribution of the test statistic for multiple outlier candidates does not generally exist; therefore, tests of individual outliers (or tests using simulated critical values) are usually conducted instead of tests for groups of outliers. This article concerns procedures for testing outlying observations. We suggest a method that can be applied to arbitrary observations or to multiple outlier candidates detected by an outlier detection method. A Monte Carlo study is used to compare the performance of the proposed method with that of others.

Hydrologic Response Estimation Using Mallows' $C_L$ Statistics (Mallows의 $C_L$ 통계량을 이용한 수문응답 추정)

  • Seong, Gi-Won;Sim, Myeong-Pil
    • Journal of Korea Water Resources Association / v.32 no.4 / pp.437-445 / 1999
  • This paper addresses hydrologic response estimation using a non-parametric ridge regression method. The method adopted in this work is based on minimizing the $C_L$ statistic, which is an estimate of the mean square prediction error. The effects of using the identity matrix and the Laplacian matrix as the weighting matrix were both considered. In addition, we evaluated methods for estimating the error variance of the impulse response. In analyses of synthetic and real data, good estimates were obtained when the Laplacian matrix was used as the weighting matrix and the bias-corrected estimate was used for the error variance. The method and procedure presented in this paper are expected to be robust and effective for separating the hydrologic response.


High-dimensional change point detection using MOSUM-based sparse projection (MOSUM 성근 프로젝션을 이용한 고차원 시계열의 변화점 추정)

  • Kim, Moonjung;Baek, Changryong
    • The Korean Journal of Applied Statistics / v.35 no.1 / pp.63-75 / 2022
  • This paper proposes a MOSUM-based sparse projection method for detecting change points in high-dimensional time series. Our method is inspired by Wang and Samworth (2018) but improves on it in two ways. First, it finds all change points at once, which minimizes sequential error. Second, it is localized, which makes it more robust to mean changes that offset each other. We also propose data-driven threshold selection using the block wild bootstrap. A comprehensive simulation study shows that our method performs reasonably well in finite samples. We also apply our method to the stock prices that make up the S&P 500 index and find four change points in the most recent six years.
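
To show what a MOSUM (moving sum) statistic measures, here is a minimal univariate sketch. It covers only the basic moving-sum contrast for a single series with unit noise variance assumed; the sparse projection step and the block wild bootstrap threshold from the paper are not attempted. The function names and the bandwidth `G` are illustrative.

```python
import math


def mosum(x, G):
    """MOSUM statistic for each candidate split point k.

    For k = G, ..., n - G the statistic contrasts the sum of the G
    observations from index k onward with the sum of the G observations
    before index k, scaled by sqrt(2 * G).
    """
    n = len(x)
    if n < 2 * G:
        raise ValueError("series must contain at least 2 * G observations")
    stats = {}
    for k in range(G, n - G + 1):
        left = sum(x[k - G:k])
        right = sum(x[k:k + G])
        stats[k] = abs(right - left) / math.sqrt(2 * G)
    return stats


def strongest_change_point(x, G):
    """Index k (start of the new regime) with the largest MOSUM value."""
    stats = mosum(x, G)
    return max(stats, key=stats.get)
```

Because each value depends only on the 2G observations around k, a shift elsewhere in the series cannot cancel it out, which is the sense in which a moving-sum statistic is localized.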

Detecting outliers in multivariate data and visualization-R scripts (다변량 자료에서 특이점 검출 및 시각화 - R 스크립트)

  • Kim, Sung-Soo
    • The Korean Journal of Applied Statistics / v.31 no.4 / pp.517-528 / 2018
  • We provide R scripts for detecting and visualizing outliers in multivariate data. Outliers are detected using three approaches: 1) robust Mahalanobis distance, 2) methods for high-dimensional data, and 3) density-based methods. Detected potential outliers are visualized using 1) multidimensional scaling (MDS) and a minimal spanning tree (MST) with k-means clustering, 2) MDS with fviz_cluster, and 3) principal component analysis (PCA) with fviz_cluster. As a real data set, we use MLB pitching data from 2013 and 2014, which include Ryu, Hyun-jin. The developed R scripts and an R package can be downloaded from "http://www.knou.ac.kr/~sskim/ddpoutlier.html".

Korean women wage analysis using selection models (표본 선택 모형을 이용한 국내 여성 임금 데이터 분석)

  • Jeong, Mi Ryang;Kim, Mijeong
    • Journal of the Korean Data and Information Science Society / v.28 no.5 / pp.1077-1085 / 2017
  • In this study, we identify the major factors that affect Korean women's wages by analysing data from the 2015 Korea Labor Panel Survey (KLIPS). In general, wage data are difficult to analyze because random sampling is infeasible. The Heckman sample selection model is the most widely used method for analysing data subject to sample selection. Heckman proposed two kinds of selection models: one estimated by maximum likelihood and the other the Heckman two-stage model. The Heckman two-stage model is known to be robust to the normality assumption on the bivariate error terms. Recently, Marchenko and Genton (2012) proposed the Heckman selection-t model, which generalizes the Heckman two-stage model, and concluded that the selection-t model is more robust to the error assumptions. We analysed the data with both models and compared the results.

A Robust Test for Location Parameters in Multivariate Data (다변량 자료에서 위치모수에 대한 로버스트 검정)

  • So, Sun-Ha;Lee, Dong-Hee;Jung, Byoung-Cheo
    • The Korean Journal of Applied Statistics / v.22 no.6 / pp.1355-1364 / 2009
  • This work proposes a robust test for location parameters in multivariate data based on the MVE and MCD estimators, which have the affine equivariance and high-breakdown properties. High-breakdown estimators generally suffer from low efficiency and, as a result, are usually used only in exploratory analysis. To obtain a test with both high efficiency and high power, we apply a one-step reweighting procedure to the high-breakdown estimators. A Monte Carlo study shows that the suggested method retains the nominal significance level and achieves higher power than Hotelling's $T^2$ test across various population distributions. In an example, a data set containing known outliers does not influence our proposal, whereas it renders Hotelling's $T^2$ test useless.

Robust Person Identification Using Optimal Reliability in Audio-Visual Information Fusion

  • Tariquzzaman, Md.;Kim, Jin-Young;Na, Seung-You;Choi, Seung-Ho
    • The Journal of the Acoustical Society of Korea / v.28 no.3E / pp.109-117 / 2009
  • Reliable identity recognition in real environments is a key issue in human-computer interaction (HCI). In this paper, we present a robust person identification system that uses a score-based optimal reliability measure of the audio and visual modalities. We propose an extension of the modified reliability function that introduces optimizing parameters for both the audio and visual modalities. To degrade the visual signals, we applied JPEG compression to the test images. In addition, to create a mismatch between the enrollment and test sessions, acoustic babble noise and artificial illumination were added to the test audio and visual signals, respectively. Local PCA was used on both modalities to reduce the dimension of the feature vector. We applied a swarm intelligence algorithm, particle swarm optimization, to optimize the parameters of the modified convection function. The overall person identification experiments were performed using the VidTIMIT database. Experimental results show that the proposed optimal reliability measures improved identification accuracy by 7.73% and 8.18% under different illumination directions on the visual signal and the corresponding babble noise on the audio signal, respectively, compared with the best classifier in the fusion system, while maintaining the modality reliability statistics in terms of performance; this verifies the consistency of the proposed extension.