• Title/Summary/Keyword: outliers

Outlier detection in dental research (Handling of outliers in dental research)

  • Kim, Ki-Yeol
    • The Journal of the Korean Dental Association / v.55 no.9 / pp.604-616 / 2017
  • In clinical dental research, errors occur despite careful study design and conduct. Data cleaning procedures aim to identify and correct these errors, or at least to minimize their influence on the study. Outliers are one such type of error. Outlier detection is a first step in the data analysis process, and outliers can seriously affect findings in dental research. This paper therefore introduces methods for detecting outliers and for examining their influence on statistical data analysis.
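The abstract does not name specific detection rules. As one common robust rule (an illustration chosen here, not necessarily one the paper covers), the modified z-score of Iglewicz and Hoaglin replaces the mean and standard deviation, which outliers themselves distort, with the median and the median absolute deviation (MAD); 3.5 is the cutoff they suggest. The function names and data below are hypothetical:

```python
from statistics import median


def modified_z_scores(values):
    """Modified z-score 0.6745 * (x - median) / MAD, a robust alternative to the
    ordinary z-score (whose mean and SD are themselves pulled by outliers)."""
    med = median(values)
    mad = median([abs(x - med) for x in values])  # median absolute deviation
    if mad == 0:
        raise ValueError("MAD is zero; the modified z-score is undefined")
    return [0.6745 * (x - med) / mad for x in values]


def flag_outliers(values, threshold=3.5):
    """Indices whose |modified z-score| exceeds the threshold."""
    return [i for i, z in enumerate(modified_z_scores(values)) if abs(z) > threshold]


# Hypothetical measurements (not from the paper) with one suspicious value.
measurements = [2.1, 2.4, 2.2, 2.3, 2.5, 2.2, 9.8, 2.4]
print(flag_outliers(measurements))  # [6]
```

Flagging only marks candidates; whether a flagged value is an error to correct or a genuine extreme observation is still a subject-matter decision.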

Jackknife Kernel Density Estimation Using Uniform Kernel Function in the Presence of k's Unidentified Outliers

  • Woo, Jung-Soo;Lee, Jang-Choon
    • Journal of the Korean Data and Information Science Society / v.6 no.1 / pp.85-96 / 1995
  • The purpose of this paper is to propose a kernel density estimator and a jackknife kernel density estimator in the presence of $k$ unidentified outliers, and to compare the small-sample performance of the proposed estimators in terms of mean integrated squared error (MISE).
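The abstract does not give the estimators' formulas. A minimal sketch, assuming the standard uniform-kernel estimator and, for the jackknife part, a generalized-jackknife combination of two bandwidths that cancels the leading $O(h^2)$ bias term; this is a common textbook form, not necessarily the authors' estimator, and their outlier model and MISE comparison are not reproduced. Function names and data are hypothetical:

```python
def uniform_kde(data, h):
    """Kernel density estimate with the uniform kernel K(u) = 1/2 for |u| <= 1:
    f_hat(x) = (1 / (n h)) * sum_i K((x - X_i) / h)."""
    n = len(data)

    def f_hat(x):
        inside = sum(1 for xi in data if abs(x - xi) <= h)  # |u| <= 1
        return inside / (2.0 * n * h)

    return f_hat


def generalized_jackknife_kde(data, h1, h2):
    """Combine bandwidths h1 < h2 so that the O(h^2) bias terms cancel.

    For a second-order kernel such as the uniform one, bias(f_h) ~ c * h^2,
    so with R = (h1 / h2)^2 the combination (f_h1 - R * f_h2) / (1 - R)
    removes the h^2 term. The result can be negative in places.
    """
    f1, f2 = uniform_kde(data, h1), uniform_kde(data, h2)
    r = (h1 / h2) ** 2

    def f_jack(x):
        return (f1(x) - r * f2(x)) / (1.0 - r)

    return f_jack


def midpoint_integral(f, lo, hi, step=0.001):
    """Midpoint-rule integral, used only to check that an estimate integrates to 1."""
    n_steps = int(round((hi - lo) / step))
    return sum(f(lo + (k + 0.5) * step) for k in range(n_steps)) * step


# Hypothetical sample; 6.0 plays the role of an unidentified outlier.
sample = [0.1, 0.4, 0.5, 0.9, 1.2, 1.3, 1.8, 6.0]
f_u = uniform_kde(sample, h=0.5)
f_j = generalized_jackknife_kde(sample, h1=0.5, h2=1.0)
print(f_u(0.5), midpoint_integral(f_u, -2, 8), midpoint_integral(f_j, -2, 8))
```

Both estimates integrate to one by construction; an MISE comparison like the paper's would repeat this over many simulated samples drawn from a known density.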

Diagnosis of Observations after Fit of Multivariate Skew t-Distribution: Identification of Outliers and Edge Observations from Asymmetric Data

  • Kim, Seung-Gu
    • The Korean Journal of Applied Statistics / v.25 no.6 / pp.1019-1026 / 2012
  • This paper presents a method for identifying outliers, as well as "edge observations" located in a boundary region constructed by a truncation variable, after fitting a multivariate skew $t$-distribution (MST) to asymmetric data. Detecting edge observations is important in data analysis because they provide information on a critical region of the observation space. The proposed method is applied to the Australian Institute of Sport (AIS) dataset, which is well known for its asymmetry.

Robust Singular Value Decomposition Based on Weighted Least Absolute Deviation Regression

  • Jung, Kang-Mo
    • Communications for Statistical Applications and Methods / v.17 no.6 / pp.803-810 / 2010
  • The singular value decomposition of a rectangular matrix is a basic tool for understanding the structure of data, particularly the relationship between row and column factors. However, the conventional singular value decomposition uses the least squares method and is not robust to outliers. We propose a simple robust singular value decomposition algorithm based on weighted least absolute deviation, which is not sensitive to leverage points. It is easy to implement, and its computation time is reasonably low. Numerical results reveal the data structure and identify outlying observations.
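The core idea of replacing least squares with least absolute deviation (LAD) can be sketched as alternating L1 regressions for a rank-one fit $x_{ij} \approx u_i v_j$. This is a minimal sketch under stated assumptions: it uses plain, unweighted LAD, so the paper's weighting scheme against leverage points is not reproduced, and all names and data are hypothetical. Each LAD slope through the origin reduces to a weighted median:

```python
def weighted_median(values, weights):
    """A minimiser m of sum_i w_i * |values_i - m| (lower weighted median)."""
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2.0
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= half:
            return value
    return pairs[-1][0]


def lad_slope_through_origin(y, x):
    """LAD fit of y ~ c * x with no intercept.

    sum |y_i - c x_i| = sum |x_i| * |y_i / x_i - c|, so the minimiser is the
    weighted median of y_i / x_i with weights |x_i| (terms with x_i = 0 are constant).
    """
    ratios = [yi / xi for yi, xi in zip(y, x) if xi != 0]
    weights = [abs(xi) for xi in x if xi != 0]
    return weighted_median(ratios, weights)


def robust_rank_one(matrix, iterations=20):
    """Alternating L1 regressions for a rank-one fit matrix[i][j] ~ u[i] * v[j]."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    v = [1.0] * n_cols  # hypothetical starting value
    for _ in range(iterations):
        u = [lad_slope_through_origin(matrix[i], v) for i in range(n_rows)]
        columns = [[matrix[i][j] for i in range(n_rows)] for j in range(n_cols)]
        v = [lad_slope_through_origin(col, u) for col in columns]
    return u, v


# Hypothetical rank-one data a_i * b_j with one grossly corrupted cell.
a, b = [1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0]
X = [[ai * bj for bj in b] for ai in a]
X[0][0] = 100.0
u, v = robust_rank_one(X)
fitted = [[ui * vj for vj in v] for ui in u]
print(fitted[0][0], X[0][0])
```

The fit recovers the clean value at the corrupted cell, and the large residual there is what marks it as outlying. Normalizing `u` and `v` to unit length gives the singular vectors, with their norms' product as the singular value; further components could be obtained by repeating on the residual matrix (an assumption of this sketch).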

A Comparison of Diagnostic Measures in Linear Regression (Proposal and Comparison of New Measures for Regression Diagnostics)

  • 최성운
    • Journal of Korean Society of Industrial and Systems Engineering / v.15 no.25 / pp.103-113 / 1992
  • This paper studies various diagnostic measures for detecting outliers and influential cases in linear regression. We review the most common diagnostic measures and show the interrelationships that exist among them. Based on PRESS (Predicted REsidual Sum of Squares), offered by Allen (1974) as a criterion for model selection, we propose three measures for detecting outliers and influential cases. Examples illustrate various diagnostic measures, including the proposed ones.
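For reference, the standard quantities the abstract builds on (leverage $h_{ii}$, the PRESS residual $e_i/(1-h_{ii})$, PRESS itself, and Cook's distance $D_i = e_i^2 h_{ii} / (p s^2 (1-h_{ii})^2)$) can be computed for simple linear regression as below. This is a minimal sketch with hypothetical names and data; the paper's three proposed PRESS-based measures are not given in the abstract and are not reproduced:

```python
def regression_diagnostics(x, y):
    """Leverage, PRESS residuals, PRESS and Cook's distance for simple linear
    regression y = b0 + b1 * x fitted by ordinary least squares."""
    n, p = len(x), 2  # p = number of coefficients (intercept and slope)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # Diagonal of the hat matrix for simple regression.
    leverage = [1.0 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    s2 = sum(e * e for e in resid) / (n - p)  # residual mean square
    # The PRESS residual e_i / (1 - h_ii) equals y_i minus the prediction
    # from the model refitted without case i.
    press_resid = [e / (1.0 - h) for e, h in zip(resid, leverage)]
    cooks = [e * e * h / (p * s2 * (1.0 - h) ** 2) for e, h in zip(resid, leverage)]
    return {
        "intercept": intercept,
        "slope": slope,
        "leverage": leverage,
        "press_residuals": press_resid,
        "press": sum(r * r for r in press_resid),
        "cooks_distance": cooks,
    }


# Hypothetical data (not from the paper); the last case has high leverage.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 10.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 14.0]
diag = regression_diagnostics(x, y)
print(round(diag["press"], 3), [round(d, 3) for d in diag["cooks_distance"]])
```

The closed form for PRESS residuals is what makes PRESS cheap: no refitting is needed, even though it measures leave-one-out prediction error.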

Outlier Detection and Replacement for Vertical Wind Speed in the Measurement of Actual Evapotranspiration (Detection and Replacement of Vertical Wind Speed Outliers in Measuring Actual Evapotranspiration)

  • Park, Chun Gun;Rim, Chang-Soo;Lim, Kwang-Suop;Chae, Hyo-Sok
    • KSCE Journal of Civil and Environmental Engineering Research / v.34 no.5 / pp.1455-1461 / 2014
  • In this study, using flux data measured in the Deokgokje reservoir watershed near Deokyu Mountain in May, June, and July 2011, a statistical analysis was conducted to detect and replace outliers in vertical wind speed when measuring evapotranspiration with the eddy covariance method. Outliers in vertical wind speed were detected with the interquartile range (IQR) rule of the boxplot, and the detected outliers were either deleted or replaced with the mean. Evapotranspiration was then compared before and after outlier replacement. The results showed a difference between evapotranspiration before and after outlier replacement, especially on rainy days. The study therefore concludes that outliers should be deleted or replaced when measuring evapotranspiration.
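The boxplot rule described above flags values outside $[Q_1 - 1.5\,\mathrm{IQR},\ Q_3 + 1.5\,\mathrm{IQR}]$. A minimal sketch of the delete-or-replace step follows; it assumes "the mean" means the mean of the non-outlying values (the abstract does not specify), uses Python's default quartile method (other conventions give slightly different fences), and ignores the time windowing a real eddy covariance pipeline would apply. Names and data are hypothetical:

```python
from statistics import mean, quantiles


def iqr_fences(values, k=1.5):
    """Boxplot fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4)  # default 'exclusive' quartile method
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def clean_series(values, k=1.5, strategy="replace"):
    """Delete outliers, or replace them with the mean of the non-outlying values."""
    lo, hi = iqr_fences(values, k)
    inliers = [v for v in values if lo <= v <= hi]
    if strategy == "delete":
        return inliers
    fill = mean(inliers)  # assumption: mean of the inliers, not of the raw series
    return [v if lo <= v <= hi else fill for v in values]


# Hypothetical vertical wind speeds in m/s (not the paper's data).
w = [0.12, -0.05, 0.08, 0.02, -0.10, 0.04, 2.50, -0.03, 0.06, 0.01]
print(clean_series(w, strategy="delete"))
print(clean_series(w))
```

Replacement keeps the series length (useful when downstream covariance calculations expect a fixed sampling grid), while deletion avoids inventing values.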

Robust Location Estimation based on TDOA and FDOA using Outlier Detection Algorithm (Location Estimation of a Moving Signal Source Based on TDOA and FDOA Using an Outlier Detection Algorithm)

  • Yoo, Hogeun;Lee, Jaehoon
    • Journal of Convergence for Information Technology / v.10 no.9 / pp.15-21 / 2020
  • This paper presents an outlier detection algorithm for estimating the location and velocity of a source with the two-step weighted least-squares method, using time difference of arrival (TDOA) and frequency difference of arrival (FDOA) data. Because outliers in the TDOA and FDOA data can reduce the accuracy of the estimated location and velocity of a moving source, it is important to detect and remove them. The paper presents a method for finding the minimum set of inlier data and a method for determining whether each TDOA and FDOA measurement is an inlier or an outlier. Numerical simulations show that removing outliers from the TDOA and FDOA data improves the accuracy of the estimated location and velocity.

Stratification Method Using κ-Spatial Medians Clustering (A Stratification Method Using the κ-Spatial Median Clustering Method)

  • Son, Soon-Chul;Jhun, Myoung-Shic
    • The Korean Journal of Applied Statistics / v.22 no.4 / pp.677-686 / 2009
  • Stratification of a population is widely used to improve the efficiency of estimation in sample surveys. However, it causes several problems when some variables contain outliers. To overcome these problems, Park and Yun (2008) proposed a rather subjective method that identifies outliers before applying $\kappa$-means clustering for stratification. In this study, we propose the $\kappa$-spatial medians clustering method, which is more robust than $\kappa$-means clustering and does not require finding outliers in advance. We investigate the characteristics of the proposed method through the case study used in Park and Yun (2008) and confirm its efficiency.
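The difference from $\kappa$-means is the centre update: a spatial (geometric) median, which minimizes the sum of Euclidean distances, replaces the mean. A minimal sketch, assuming Weiszfeld's iteration for the spatial median and a Lloyd-style assign/update loop; the paper's initialization and its use of the clusters as strata are not reproduced, and all names and data are hypothetical:

```python
import math


def spatial_median(points, iterations=500, tol=1e-10):
    """Spatial (geometric) median via Weiszfeld's iteration: the point that
    minimises the sum of Euclidean distances to the given points."""
    dim = len(points[0])
    m = [sum(p[d] for p in points) / len(points) for d in range(dim)]  # start at the mean
    for _ in range(iterations):
        num, den = [0.0] * dim, 0.0
        for p in points:
            dist = math.dist(p, m)
            if dist < tol:
                continue  # simple safeguard: skip a point that coincides with m
            num = [num[d] + p[d] / dist for d in range(dim)]
            den += 1.0 / dist
        if den == 0.0:
            break
        new_m = [num[d] / den for d in range(dim)]
        converged = math.dist(new_m, m) < tol
        m = new_m
        if converged:
            break
    return m


def k_spatial_medians(points, centers, iterations=50):
    """Lloyd-style clustering that uses spatial medians instead of means as centres."""
    labels = []
    for _ in range(iterations):
        labels = [min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        new_centers = []
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            new_centers.append(spatial_median(members) if members else list(centers[c]))
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


group_a = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
group_b = [(10.0, 10.0), (11.0, 10.0), (10.0, 11.0), (11.0, 11.0)]
outlier = [(60.0, 60.0)]  # kept in the data; no separate outlier-screening step
centers, labels = k_spatial_medians(group_a + group_b + outlier, [(0.0, 0.0), (10.0, 10.0)])
print(labels, [[round(c, 3) for c in ctr] for ctr in centers])
```

The outlier joins the second cluster, yet that centre moves only to about (10.79, 10.79), whereas the cluster mean would be (20.4, 20.4). This illustrates why no prior outlier search is needed.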

Evaluation of the Homogeneity of Korean Diagnosis Related Groups (Evaluation of the Homogeneity of the Korean Diagnosis Related Group Classification System)

  • Kim, Hyung Seon;Lee, Sun Hee;Nam, Chung Mo
    • Health Policy and Management / v.23 no.1 / pp.44-51 / 2013
  • Background: This study was designed to evaluate the homogeneity of the Korean diagnosis related group (KDRG) version 3.4 classification system. Methods: A total of 5,921,873 claims submitted to the Health Insurance Review and Assessment Service during 2010 were used. Both the coefficient of variation (CV) and the reduction in variance of cost were measured. The analysis was performed before and after trimming outliers at three levels: adjacent DRG (ADRG), aged ADRG (AADRG, split by age), and DRG (split by complication and comorbidity). Results: At the ADRG, AADRG, and DRG levels, 38.9%, 38.7%, and 30.0% of groups, respectively, had a CV > 100% in the untrimmed data, compared with 1.4%, 1.4%, and 1.9% in the trimmed data. Before trimming outliers, ADRGs explained 52.5% of the variability in resource use, AADRGs 53.1%, and DRGs 57.1%. The additional explanatory power from the age split and the complication and comorbidity (CC) split was 0.6 and 4.6 percentage points, respectively, both statistically significant. After trimming outliers, ADRGs explained 75.2% of the variability in resource use, AADRGs 75.6%, and DRGs 77.1%; the additional explanatory power was 0.4 and 2.0 percentage points, respectively, again statistically significant. Conclusion: After trimming outliers, KDRG showed high within-group homogeneity and good performance. However, some DRGs still had a CV > 100% after the age or CC split, and the ADRG level, rather than the age or CC split, contributed most to KDRG's performance. Efforts to improve the clinical homogeneity of KDRG, such as reviewing the hierarchical structure of the classification system and its classification variables, are therefore recommended.
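The two evaluation measures can be made concrete: CV is the standard deviation divided by the mean within a group, and reduction in variance (RIV) is $1 - SS_{\text{within}}/SS_{\text{total}}$, the share of cost variability explained by the grouping. A minimal sketch follows; the abstract does not state the study's trimming criterion, so the per-group IQR rule below is a hypothetical stand-in, population SD is an assumption, and all names and costs are illustrative:

```python
from statistics import mean, pstdev, quantiles


def cv(costs):
    """Coefficient of variation: standard deviation / mean (population SD here)."""
    return pstdev(costs) / mean(costs)


def trim_group(costs, k=1.5):
    """Hypothetical trimming rule: drop values outside the group's IQR fences."""
    q1, _, q3 = quantiles(costs, n=4)
    lo, hi = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [c for c in costs if lo <= c <= hi]


def reduction_in_variance(groups):
    """RIV = 1 - SS_within / SS_total: share of cost variability explained by grouping."""
    all_costs = [c for g in groups.values() for c in g]
    grand = mean(all_costs)
    ss_total = sum((c - grand) ** 2 for c in all_costs)
    ss_within = 0.0
    for g in groups.values():
        g_mean = mean(g)
        ss_within += sum((c - g_mean) ** 2 for c in g)
    return 1.0 - ss_within / ss_total


# Hypothetical costs for two groups (not the study's claims data).
raw = {"A": [100, 110, 90, 105, 95, 1000], "B": [300, 320, 280, 310, 290]}
trimmed = {name: trim_group(costs) for name, costs in raw.items()}
print({name: round(cv(c), 3) for name, c in raw.items()})
print(round(reduction_in_variance(raw), 3), round(reduction_in_variance(trimmed), 3))
```

A single extreme claim pushes group A's CV above 100% and drives RIV near zero; trimming it restores both. This mirrors, in miniature, the before/after pattern the abstract reports.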

Robust Response Transformation Using Outlier Detection in Regression Model (A Robust Variable Transformation Method Using Outlier Search in Regression Models)

  • Seo, Han-Son;Lee, Ga-Yoen;Yoon, Min
    • The Korean Journal of Applied Statistics / v.25 no.1 / pp.205-213 / 2012
  • Transforming the response variable is a general tool for adapting data to a linear regression model. However, it is well known that response transformations in linear regression are very sensitive to one or a few outliers. Many methods have been suggested to develop transformations that are not influenced by potential outliers. Recently, Cheng (2005) suggested using a trimmed likelihood estimator based on the idea of the least trimmed squares (LTS) estimator. However, that method requires presetting the number of outliers and is computationally demanding. This paper proposes a new method that addresses these problems and improves the robustness of the estimates. It uses the stepwise procedure suggested by Hadi and Simonoff (1993) to detect the outliers that determine the response transformation.
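To see what is being robustified, the sketch below estimates a response transformation by the usual profile-likelihood approach. It assumes the Box-Cox family, which the abstract does not name explicitly. With the geometric-mean-scaled transform, maximizing the likelihood over $\lambda$ is equivalent to minimizing the residual sum of squares. The Hadi-Simonoff detection step is not implemented: the hypothetical `exclude` argument simply accepts indices flagged by some separate procedure. Names and data are illustrative:

```python
import math


def boxcox_scaled(y, lam, gmean):
    """Box-Cox transform scaled by the geometric mean so RSS is comparable across lambda."""
    if abs(lam) < 1e-12:
        return [gmean * math.log(v) for v in y]
    return [(v ** lam - 1.0) / (lam * gmean ** (lam - 1.0)) for v in y]


def ols_rss(x, z):
    """Residual sum of squares of the simple linear regression of z on x."""
    n = len(x)
    x_bar, z_bar = sum(x) / n, sum(z) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z)) / sxx
    return sum((zi - z_bar - slope * (xi - x_bar)) ** 2 for xi, zi in zip(x, z))


def choose_lambda(x, y, grid, exclude=()):
    """Pick lambda from a grid by profile likelihood (equivalently, minimum scaled RSS).

    `exclude` holds indices flagged as outliers by a separate detection step;
    this sketch does not implement the Hadi-Simonoff procedure itself.
    """
    dropped = set(exclude)
    keep = [i for i in range(len(y)) if i not in dropped]
    xs, ys = [x[i] for i in keep], [y[i] for i in keep]
    gmean = math.exp(sum(math.log(v) for v in ys) / len(ys))
    return min(grid, key=lambda lam: ols_rss(xs, boxcox_scaled(ys, lam, gmean)))


# Hypothetical data: sqrt(y) is exactly linear in x, so lambda = 0.5 is ideal.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [(1.0 + 2.0 * xi) ** 2 for xi in x]
grid = [-1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
print(choose_lambda(x, y, grid))

y_bad = list(y)
y_bad[3] *= 4.0  # one contaminated response
print(choose_lambda(x, y_bad, grid), choose_lambda(x, y_bad, grid, exclude=[3]))
```

Comparing the estimate with and without the contaminated case shows the sensitivity the abstract describes. Excluding the flagged index recovers $\lambda = 0.5$.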