• Title/Summary/Keyword: outliers

Effects of Normalization and Aggregation Methods on the Volatility of Rankings and Rank Reversals (정규화 및 통합 방법이 순위의 변동성과 순위 역전에 미치는 영향)

  • Park, Youngsun
    • Journal of Korean Society for Quality Management
    • /
    • v.41 no.4
    • /
    • pp.709-724
    • /
    • 2013
  • Purpose: The purpose of this study is to examine five evaluation models constructed by different normalization and aggregation methods in terms of the volatility of rankings and rank reversals. We also explore how the volatility of rankings of the five models changes and how often the rank reversals occur when the outliers are removed. Methods: We used data published in the Complete University Guide 2014. Two universities with missing values were excluded from the data. The university rankings were derived by using the five models, and then each model's volatility of rankings was measured. The box-plot was used to detect outliers. Results: Model 1 has the lowest volatility among the five models whether or not the outliers are included. Model 5 has the lowest number of rank reversals. Model 3, which has been used by many institutions, appears to be in the middle among the five in terms of the volatility and the rank reversals. Conclusion: The university rankings vary from one evaluation model to another depending on what normalization and aggregation methods are used. No single model exhibits clear superiority over others in both the volatility and the rank reversal. The findings of this study are expected to provide a stepping stone toward a superior model which is both reliable and robust.
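
The box-plot rule mentioned in the Methods can be sketched as follows. This is a minimal illustration, not the study's own procedure: it assumes the conventional Tukey fences (1.5 times the interquartile range) and the "inclusive" quartile definition, neither of which the abstract specifies. The function name `boxplot_outliers` and the sample scores are hypothetical.

```python
from statistics import quantiles


def boxplot_outliers(values, k=1.5):
    """Return the values lying outside the box-plot fences.

    Assumption: fences are Q1 - k*IQR and Q3 + k*IQR with k = 1.5,
    the usual box-plot convention; the abstract does not state k.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical indicator scores for ten institutions.
scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(boxplot_outliers(scores))  # [100]
```

Rankings would then be recomputed with the flagged observations removed, so that the volatility with and without outliers can be compared.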

Multivariate Stratification under Consideration of Outliers (이상점을 고려한 다변량 층화)

  • Park, Jin-Woo;Yun, Seok-Hoon
    • The Korean Journal of Applied Statistics
    • /
    • v.21 no.3
    • /
    • pp.377-385
    • /
    • 2008
  • Most sample surveys conducted by statistical agencies are multipurpose surveys that inquire into several distinct items through a single sample. In a multipurpose sample design, the stratification tends to be very complex because stratification variables that are both multivariate and heterogeneous must be considered jointly. In this paper we point out an outlier effect in multivariate stratification based on the K-means clustering method and propose to handle outliers before the stratification step. We also show the empirical effect of stratification under consideration of outliers through a case study of the sample design for The Rural Living Indicators.

An Outlier Detection Method in Penalized Spline Regression Models (벌점 스플라인 회귀모형에서의 이상치 탐지방법)

  • Seo, Han Son;Song, Ji Eun;Yoon, Min
    • The Korean Journal of Applied Statistics
    • /
    • v.26 no.4
    • /
    • pp.687-696
    • /
    • 2013
  • The detection and examination of outliers are important parts of data analysis because some outliers in the data may have a detrimental effect on statistical analysis. Outlier detection methods have been discussed by many authors. In this article, we propose to apply Hadi and Simonoff's (1993) method to a penalized spline regression model to detect multiple outliers. Simulated and real data sets are used to illustrate the proposed procedure and to compare it with penalized spline regression and robust penalized spline regression.

Outlier Detection in Growth Curve Model Using Mean-Shift Model (평균이동모형을 이용한 성장곡선모형의 이상점 진단에 관한 연구)

  • Shim, Kyu-Bark
    • Journal of the Korean Data and Information Science Society
    • /
    • v.10 no.2
    • /
    • pp.369-385
    • /
    • 1999
  • This paper discusses the problem of detecting outliers in the growth curve model with an arbitrary covariance structure, known as an unstructured covariance matrix. To detect outliers in the growth curve model, the likelihood ratio test statistic under the mean-shift model is established and its distribution is derived. After detecting outliers in the growth curve model, we test for homogeneous and/or heterogeneous covariance matrices using the PSR Quasi-Bayes Criterion. For illustration, a numerical example compares the results before and after outlier deletion.

Regression diagnostics for response transformations in a partial linear model (부분선형모형에서 반응변수변환을 위한 회귀진단)

  • Seo, Han Son;Yoon, Min
    • Journal of the Korean Data and Information Science Society
    • /
    • v.24 no.1
    • /
    • pp.33-39
    • /
    • 2013
  • When transforming the response variable in partial linear models, outliers can adversely affect the estimation of the transformation parameter, just as in linear models. Solving this problem requires both estimating the transformation parameter and detecting outliers, but these steps are difficult to carry out because of the arbitrariness of the nonparametric function included in the partial linear model. In this study, we suggest procedures for transforming the response variable that are robust to outliers in partial linear models, based on estimation of the nonparametric function and on outlier detection methods such as a sequential test and maximum trimmed likelihood estimation. The proposed methods are verified, and their effectiveness is compared, through a simulation study and examples.

A sequential outlier detecting method using a clustering algorithm (군집 알고리즘을 이용한 순차적 이상치 탐지법)

  • Seo, Han Son;Yoon, Min
    • The Korean Journal of Applied Statistics
    • /
    • v.29 no.4
    • /
    • pp.699-706
    • /
    • 2016
  • Outlier detection methods that do not perform a test often fail to detect multiple outliers because they are structurally vulnerable to a masking effect or a swamping effect. This paper considers testing procedures that supplement a clustering-based method, which identifies the group containing a minority of the observations as outliers. One common step is to perform a kind of t-test on individual outlier candidates. This paper proposes a sequential procedure that searches for outliers by changing cutoff values on a cluster tree and performing a test on a set of outlier candidates. The proposed method is illustrated and compared with existing methods through an example and Monte Carlo studies.

Algorithm for the Robust Estimation in Logistic Regression (로지스틱회귀모형의 로버스트 추정을 위한 알고리즘)

  • Kim, Bu-Yong;Kahng, Myung-Wook;Choi, Mi-Ae
    • The Korean Journal of Applied Statistics
    • /
    • v.20 no.3
    • /
    • pp.551-559
    • /
    • 2007
  • Maximum likelihood estimation is not robust against outliers in logistic regression. We therefore propose an algorithm for robust estimation that identifies bad leverage points and vertical outliers by a V-mask type criterion and then dampens the effect of the outliers. Our main finding is that, by an appropriate selection of weights and factors, we can obtain logistic estimates with a high breakdown point. The proposed algorithm is evaluated by means of the correct classification rate on real-life and artificial data sets. The results indicate that the proposed algorithm is superior to maximum likelihood estimation in terms of classification.

Adaptive boosting in ensembles for outlier detection: Base learner selection and fusion via local domain competence

  • Bii, Joash Kiprotich;Rimiru, Richard;Mwangi, Ronald Waweru
    • ETRI Journal
    • /
    • v.42 no.6
    • /
    • pp.886-898
    • /
    • 2020
  • Unusual data patterns or outliers can be generated because of human errors, incorrect measurements, or malicious activities. Detecting outliers is a difficult task that requires complex ensembles. An ideal outlier detection ensemble should consider the strengths of individual base detectors while carefully combining their outputs to create a strong overall ensemble and achieve unbiased accuracy with minimal variance. Selecting and combining the outputs of dissimilar base learners is a challenging task. This paper proposes a model that utilizes heterogeneous base learners. It adaptively boosts the outcomes of preceding learners in the first phase by assigning weights and identifying high-performing learners based on their local domains, and then carefully fuses their outcomes in the second phase to improve overall accuracy. Ten benchmark datasets are used to train and test the proposed model. To investigate its accuracy in terms of separating outliers from inliers, the proposed model is tested and evaluated using accuracy metrics. The analyzed data are presented as crosstabs and percentages, followed by a descriptive method for synthesis and interpretation.

Robust Generalized Labeled Multi-Bernoulli Filter and Smoother for Multiple Target Tracking using Variational Bayesian

  • Li, Peng;Wang, Wenhui;Qiu, Junda;You, Congzhe;Shu, Zhenqiu
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.16 no.3
    • /
    • pp.908-928
    • /
    • 2022
  • Multiple target tracking mainly focuses on tracking an unknown number of targets in a complex environment with clutter and missed detections. The generalized labeled multi-Bernoulli (GLMB) filter has been shown to be an effective approach and has attracted extensive attention. However, in scenarios where the clutter rate is high or measurement outliers often occur, the performance of the GLMB filter declines significantly because the Gaussian-based likelihood function is sensitive to clutter. To solve this problem, this paper presents a robust GLMB filter and smoother to improve the tracking performance in scenarios with a high clutter rate, low detection probability, and measurement outliers. First, a Student-t distribution variational Bayesian (TDVB) filtering technique is employed to update the targets' states. Then, the likelihood weight in the tracking process is re-derived. Finally, a trajectory smoothing method is proposed to improve the overall tracking performance. The proposed method is compared with recent multiple target tracking filters, and the simulation results show that it can effectively improve tracking accuracy in scenarios with a high clutter rate, low detection rate, and measurement outliers. Code is published on GitHub.
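
The sensitivity of a Gaussian likelihood to outliers, and the reason a heavy-tailed Student-t likelihood helps, can be illustrated with a short sketch. This is not the TDVB filter from the paper; it only compares the two log-densities for a scalar measurement residual. The function names and the choice of 3 degrees of freedom are assumptions made for illustration.

```python
import math


def gaussian_logpdf(x, mu, sigma):
    """Log-density of a normal distribution N(mu, sigma^2)."""
    z = (x - mu) / sigma
    return -0.5 * math.log(2 * math.pi) - math.log(sigma) - 0.5 * z * z


def student_t_logpdf(x, mu, sigma, nu):
    """Log-density of a location-scale Student-t with nu degrees of freedom."""
    z = (x - mu) / sigma
    return (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
            - 0.5 * math.log(nu * math.pi) - math.log(sigma)
            - (nu + 1) / 2 * math.log1p(z * z / nu))


# Residuals measured in units of the measurement standard deviation.
for residual in (0.0, 2.0, 10.0):
    g = gaussian_logpdf(residual, 0.0, 1.0)
    t = student_t_logpdf(residual, 0.0, 1.0, nu=3)  # nu=3 is an assumption
    print(f"residual={residual:5.1f}  gaussian={g:8.2f}  student_t={t:8.2f}")
```

For a residual of 10 standard deviations, the Gaussian log-likelihood drops quadratically while the Student-t log-likelihood drops only logarithmically, so a single outlying measurement penalizes a hypothesis far less under the heavy-tailed model.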

Outliers and Level Shift Detection of the Mean-sea Level, Extreme Highest and Lowest Tide Level Data (평균 해수면 및 최극조위 자료의 이상자료 및 기준고도 변화(Level Shift) 진단)

  • Lee, Gi-Seop;Cho, Hong-Yeon
    • Journal of Korean Society of Coastal and Ocean Engineers
    • /
    • v.32 no.5
    • /
    • pp.322-330
    • /
    • 2020
  • Outliers in time series were modeled using the mean sea level (MSL) and extreme high and low tide level (EHL, ELL) data sets from the Busan and Mokpo stations. The time-series model is a seasonal ARIMA model that includes additive outlier (AO) and level shift (LS) components. The optimal model was selected based on the AIC value, and the model parameters were estimated using the 'tso' function (in the 'tsoutliers' package of R). The main results of the model application, i.e., the detected outliers and level shifts, are as follows. (1) Two AOs were detected in the Busan monthly EHL data, with magnitudes estimated at 65.5 cm (typhoon MAEMI) and 29.5 cm (typhoon SANBA), respectively. (2) One level shift in 1983 was detected in the Mokpo monthly MSL data, with an LS magnitude estimated at 21.2 cm, attributed to the construction of the Youngsan River tidal estuary barrier. The RMS errors are about 1.95 cm (MSL), 5.11 cm (EHL), and 6.50 cm (ELL) at the Busan station, and about 2.10 cm (MSL), 11.80 cm (EHL), and 9.14 cm (ELL) at the Mokpo station, respectively.
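
The difference between an additive outlier (a one-off pulse) and a level shift (a permanent step) can be sketched as below. This is a deliberately simplified illustration, not the seasonal ARIMA procedure of the 'tso' function: it assumes the time index of the event is already known and ignores seasonality and autocorrelation. The function names and the sample series are hypothetical.

```python
from statistics import mean


def additive_outlier_magnitude(series, t0):
    """Pulse size at index t0: deviation from the mean of all other points."""
    others = [x for i, x in enumerate(series) if i != t0]
    return series[t0] - mean(others)


def level_shift_magnitude(series, t0):
    """Step size starting at index t0: mean after the shift minus mean before."""
    return mean(series[t0:]) - mean(series[:t0])


# Hypothetical tide-level series in cm, not data from the paper.
pulse_series = [10.0, 10.0, 10.0, 15.0, 10.0]   # single spike at index 3
step_series = [10.0] * 5 + [12.0] * 5           # permanent shift from index 5

print(additive_outlier_magnitude(pulse_series, 3))  # 5.0
print(level_shift_magnitude(step_series, 5))        # 2.0
```

In a full analysis the event locations are unknown and are searched for jointly with the ARIMA parameters, which is the part this sketch leaves out.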