• Title/Summary/Keyword: Outliers Detection/removal

Impact of Outliers on the Statistical Measures of the Environmental Monitoring Data in Busan Coastal Sea (이상자료가 연안 환경자료의 통계 척도에 미치는 영향)

  • Cho, Hong-Yeon;Lee, Ki-Seop;Ahn, Soon-Mo
    • Ocean and Polar Research / v.38 no.2 / pp.149-159 / 2016
  • The statistical measures of coastal environmental data are used in a variety of statistical inferences, hypothesis tests, and data-driven models. If the measures are biased, the statistical estimates and models may also be biased, and the potential for bias is great when the data contain outliers, defined as extraordinarily large or small data values. This study aims to suggest more robust statistical measures as alternatives to the commonly used measures, and to assess the performance of these robust measures through a quantitative comparison with the typical measures of location, spread, and shape for environmental monitoring data in the Busan coastal sea. Outliers in the data were detected using Rosner's test. About 5-10% of the nutrient data were found to contain outliers based on Rosner's test. After removal (zero-weighting) of the outliers from the data sets, the relative change ratios of the mean and standard deviation between the conditions before and after outlier removal were 13% and 33%, respectively. The skewness and kurtosis decreased by magnitudes of 1.36 and 8.11, respectively. In contrast, the change ratios for the robust counterparts of the mean and standard deviation are 3.7-10.5%, and the variation magnitudes of the robust skewness and kurtosis are only about 2-4% of those of the non-robust measures. The robust measures can therefore be regarded as outlier-resistant statistical measures, given their relatively small changes between the conditions before and after outlier removal.
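
The contrast between non-robust and robust measures described in this abstract can be illustrated with a minimal sketch. It assumes the median and the median absolute deviation (MAD) as the robust counterparts of the mean and standard deviation; the abstract does not say which robust measures the authors used. The data values are hypothetical, and the outlier is dropped by hand rather than detected with Rosner's test.

```python
import statistics

def mad(values):
    """Median absolute deviation: a robust counterpart of the standard deviation."""
    med = statistics.median(values)
    return statistics.median([abs(v - med) for v in values])

def relative_change(before, after):
    """Relative change ratio (%) of a statistic before vs. after outlier removal."""
    return abs(before - after) / abs(before) * 100.0

# Hypothetical nutrient concentrations; the last value is an assumed outlier.
raw = [0.80, 1.00, 1.10, 0.90, 1.20, 1.00, 0.95, 1.05, 1.10, 9.50]
cleaned = raw[:-1]  # outlier removed by hand here, not by Rosner's test

changes = {
    "mean": relative_change(statistics.mean(raw), statistics.mean(cleaned)),
    "stdev": relative_change(statistics.stdev(raw), statistics.stdev(cleaned)),
    "median": relative_change(statistics.median(raw), statistics.median(cleaned)),
    "mad": relative_change(mad(raw), mad(cleaned)),
}

for name, pct in changes.items():
    print(f"{name:>6}: {pct:5.1f}% change after outlier removal")
```

On this toy data the median and MAD move far less than the mean and standard deviation, which is the qualitative behavior the abstract reports for its robust measures.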

Single Outlier Removal Technology for TWR based High Precision Localization (TWR 기반 고정밀 측위를 위한 단일 이상측정치 제거 기술)

  • Lee, Chang-Eun;Sung, Tae-Kyung
    • The Journal of Korea Robotics Society / v.12 no.3 / pp.350-355 / 2017
  • UWB (Ultra Wide Band) refers to a system with a bandwidth of over 500 MHz or a bandwidth of 20% of the center frequency. It is robust against channel fading and has a wide signal bandwidth. An IR-UWB based ranging system can achieve decimeter-level ranging accuracy. Furthermore, an IR-UWB system enables high-resolution acquisition through glass or cement. In recent years, IR-UWB based ranging chipsets have become cheap and popular, and it has become possible to implement positioning systems with an accuracy of several tens of centimeters. The system can be configured as a one-way ranging (OWR) positioning system for fast ranging or as a two-way ranging (TWR) positioning system for cheap and robust ranging. However, a ranging-based positioning system can serve only a limited number of terminals, because the communication procedure required for ranging takes time. Code multiplexing and channel multiplexing are used to overcome this problem, but they introduce measurement errors due to interference between channels and codes, multipath, and so on. Measurement filtering is used to reduce the measurement error, but more fundamentally, techniques for removing such erroneous measurements should be studied. In this paper, TWR based positioning is first analyzed from a stochastic point of view and the effects of outlier measurements are summarized. A positioning algorithm that analytically identifies and removes a single outlier is then summarized and extended to three dimensions. Simulations verify that the algorithm detects and removes single outliers.
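
The idea of removing a single outlier range before positioning can be sketched as follows. This is not the analytic algorithm of the paper, whose details the abstract does not give: it is a brute-force leave-one-out check in two dimensions, under the assumption that at most one range is corrupted. The anchor layout, the function names, and the Gauss-Newton solver are all illustrative choices.

```python
import math

def solve_position(anchors, ranges, guess, iterations=20):
    """2D least-squares position from ranges via Gauss-Newton iterations."""
    x, y = guess
    for _ in range(iterations):
        a11 = a12 = a22 = b1 = b2 = 0.0
        for (ax, ay), rng in zip(anchors, ranges):
            dx, dy = x - ax, y - ay
            dist = math.hypot(dx, dy) or 1e-9
            jx, jy = dx / dist, dy / dist      # gradient of the predicted range
            res = rng - dist                   # measured minus predicted range
            a11 += jx * jx; a12 += jx * jy; a22 += jy * jy
            b1 += jx * res; b2 += jy * res
        det = a11 * a22 - a12 * a12
        if abs(det) < 1e-12:                   # degenerate geometry: stop updating
            break
        x += (a22 * b1 - a12 * b2) / det       # solve the 2x2 normal equations
        y += (a11 * b2 - a12 * b1) / det
    return x, y

def residual_cost(pos, anchors, ranges):
    """Sum of squared range residuals at a candidate position."""
    return sum((r - math.hypot(pos[0] - ax, pos[1] - ay)) ** 2
               for (ax, ay), r in zip(anchors, ranges))

def remove_single_outlier(anchors, ranges):
    """Drop each range in turn; keep the subset that fits best."""
    best_cost, best_skip, best_pos = math.inf, None, None
    for skip in range(len(anchors)):
        kept_a = [a for i, a in enumerate(anchors) if i != skip]
        kept_r = [r for i, r in enumerate(ranges) if i != skip]
        guess = (sum(a[0] for a in kept_a) / len(kept_a),
                 sum(a[1] for a in kept_a) / len(kept_a))
        pos = solve_position(kept_a, kept_r, guess)
        cost = residual_cost(pos, kept_a, kept_r)
        if cost < best_cost:
            best_cost, best_skip, best_pos = cost, skip, pos
    return best_skip, best_pos

# Hypothetical setup: five anchors, true position (3, 4), range 2 biased by +3 m.
anchors = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0), (5.0, 15.0)]
ranges = [math.hypot(3.0 - ax, 4.0 - ay) for ax, ay in anchors]
ranges[2] += 3.0

outlier_index, position = remove_single_outlier(anchors, ranges)
print(outlier_index, position)
```

The leave-one-out search needs at least four anchors in 2D so that the remaining ranges are still redundant; with fewer, every subset fits exactly and no outlier can be identified.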

A Development of Preprocessing Models of Toll Collection System Data for Travel Time Estimation (통행시간 추정을 위한 TCS 데이터의 전처리 모형 개발)

  • Lee, Hyun-Seok;NamKoong, Seong J.
    • The Journal of The Korea Institute of Intelligent Transport Systems / v.8 no.5 / pp.1-11 / 2009
  • Toll Collection System (TCS) data reflect the characteristics of traffic conditions. However, TCS data contain outliers that do not represent the travel time of the pertinent section, and if these outliers are not eliminated, the estimated travel time may be distorted. Travel times can vary widely within the same section and time period, because the variation in travel time increases with the section distance, which makes it difficult to calculate a representative travel time. Accordingly, it is important to understand travel time characteristics in order to compute a representative travel time from TCS data. In this study, after analyzing the variation ratio of travel time according to link distance and level of congestion, an outlier elimination model and a smoothing model for TCS data were proposed. The results show that the proposed models can be used to estimate a reliable travel time for a long-distance path on which travel times vary for the same departure time, the intervals are large, and the representative travel time changes irregularly over a short period.
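
A two-stage preprocessing pipeline of the kind described here (outlier elimination followed by smoothing) might look like the sketch below. The abstract does not specify the models, so both stages are stand-ins: a filter that keeps travel times within an assumed ratio of the interval median, and a centered moving average. The parameter `max_ratio`, the window size, and the sample travel times are hypothetical.

```python
import statistics

def eliminate_outliers(travel_times, max_ratio=0.3):
    """Keep travel times within +/- max_ratio of the interval median."""
    med = statistics.median(travel_times)
    low, high = med * (1 - max_ratio), med * (1 + max_ratio)
    return [t for t in travel_times if low <= t <= high]

def smooth(values, window=3):
    """Centered moving average; the window is truncated at both ends."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed

# Hypothetical travel times (minutes) for three consecutive departure intervals.
intervals = [
    [30, 31, 29, 32, 95, 30],   # 95 could be a vehicle that stopped at a rest area
    [33, 35, 34, 12, 36],       # 12 could be a mismatched record
    [40, 41, 39, 42],
]

# Representative travel time per interval after outlier elimination.
representative = [statistics.median(eliminate_outliers(times)) for times in intervals]
print(representative)
print(smooth(representative))
```

In the study the allowed variation depends on link distance and congestion level; in this sketch that dependence would enter through `max_ratio`, which is held fixed here for simplicity.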

An Outlier Detection Algorithm and Data Integration Technique for Prediction of Hypertension (고혈압 예측을 위한 이상치 탐지 알고리즘 및 데이터 통합 기법)

  • Khongorzul Dashdondov;Mi-Hye Kim;Mi-Hwa Song
    • Proceedings of the Korea Information Processing Society Conference / 2023.05a / pp.417-419 / 2023
  • Hypertension is one of the leading causes of mortality worldwide. In recent years, the incidence of hypertension has increased dramatically, not only among the elderly but also among young people. Accordingly, machine-learning methods are increasingly used to diagnose the causes of hypertension. In this study, we improved hypertension prediction by applying Mahalanobis distance-based multivariate outlier removal to the KNHANES database from the Korean national health data and the COVID-19 dataset from Kaggle. The study was divided into two modules. First, the data preprocessing module merged the datasets and applied decision-tree classifier-based feature selection. The second module, the predictive analysis step, removes multivariate outliers from the experimental dataset using the Mahalanobis distance and then predicts hypertension. We compared the accuracy of each classification model. The best result was obtained by the proposed MAH_RF algorithm, with an accuracy of 82.66%. The proposed method can be used not only for hypertension but also for the detection of various other diseases, such as stroke and cardiovascular disease.
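
Mahalanobis distance-based multivariate outlier removal, the preprocessing step named in this abstract, can be sketched for the two-feature case as follows. The cutoff is an assumption: the squared distance is compared with a chi-square quantile, which for two degrees of freedom has the closed form -2 ln(1 - p). The feature pairs are hypothetical, and the classification step (the random forest in MAH_RF) is not shown.

```python
import math

def mahalanobis_sq_2d(points):
    """Squared Mahalanobis distance of each 2D point from the sample mean."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # Sample covariance matrix entries (n - 1 denominator).
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    det = sxx * syy - sxy * sxy
    distances = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # d^2 = [dx dy] S^-1 [dx dy]^T with the 2x2 inverse written out.
        distances.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return distances

def remove_multivariate_outliers(points, p=0.975):
    """Drop points whose squared distance exceeds the chi-square(2 df) quantile."""
    threshold = -2.0 * math.log(1.0 - p)   # exact only for two features
    distances = mahalanobis_sq_2d(points)
    return [pt for pt, d in zip(points, distances) if d <= threshold]

# Hypothetical (systolic pressure, BMI) pairs; the last record is an assumed outlier.
records = [
    (120, 22), (125, 24), (118, 21), (130, 25), (122, 23),
    (128, 26), (115, 20), (135, 27), (124, 22), (126, 25),
    (119, 23), (132, 26), (121, 21), (127, 24), (200, 15),
]

kept = remove_multivariate_outliers(records)
print(len(records) - len(kept), "record(s) removed")
```

Because the mean and covariance are themselves estimated from data that include the outliers, extreme points can mask one another in small samples; a robust covariance estimate is a common refinement when that matters.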

A NEW LANDSAT IMAGE CO-REGISTRATION AND OUTLIER REMOVAL TECHNIQUES

  • Kim, Jong-Hong;Heo, Joon;Sohn, Hong-Gyoo
    • Proceedings of the KSRS Conference / v.2 / pp.594-597 / 2006
  • Image co-registration is the process of overlaying two images of the same scene, one of which is the reference image, while the other (the sensed image) is geometrically transformed to match it. Numerous methods have been developed for automated image co-registration, which is known to be a time-consuming and/or computation-intensive procedure. To improve the efficiency and effectiveness of co-registering satellite imagery, this paper proposes pre-qualified area matching, which consists of feature extraction with a Laplacian filter and an area matching algorithm using the correlation coefficient. Moreover, to improve the accuracy of co-registration, outliers among the initial matching points should be removed. For this purpose, two outlier detection techniques, studentized residuals and a modified RANSAC algorithm, are used in this study. Three pairs of Landsat images were used for performance testing, and the results were compared and evaluated in terms of robustness and efficiency.
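
Outlier removal among initial matching points with RANSAC can be sketched as below. This is plain RANSAC, not the modified variant proposed in the paper, and it assumes a translation-only transform between the reference and sensed images so that a single matched pair defines a candidate model. The tolerance, iteration count, seed, and point coordinates are hypothetical.

```python
import random

def ransac_translation(ref_pts, sensed_pts, tol=1.0, iterations=50, seed=0):
    """Estimate a pure shift between matched points and flag inlier matches."""
    rng = random.Random(seed)
    n = len(ref_pts)
    best_inliers = []
    for _ in range(iterations):
        i = rng.randrange(n)                       # minimal sample: one match
        dx = sensed_pts[i][0] - ref_pts[i][0]
        dy = sensed_pts[i][1] - ref_pts[i][1]
        inliers = [
            k for k in range(n)
            if abs(sensed_pts[k][0] - ref_pts[k][0] - dx) <= tol
            and abs(sensed_pts[k][1] - ref_pts[k][1] - dy) <= tol
        ]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    # Refit the shift as the mean offset over the consensus set.
    m = len(best_inliers)
    shift = (
        sum(sensed_pts[k][0] - ref_pts[k][0] for k in best_inliers) / m,
        sum(sensed_pts[k][1] - ref_pts[k][1] for k in best_inliers) / m,
    )
    return shift, best_inliers

# Hypothetical matched points: true shift (5, -3), with matches 4 and 6 wrong.
ref = [(10, 10), (50, 20), (30, 80), (70, 60), (90, 15), (20, 45), (60, 90), (85, 70)]
sensed = [(x + 5, y - 3) for x, y in ref]
sensed[4] = (130, 40)
sensed[6] = (40, 60)

shift, inliers = ransac_translation(ref, sensed)
print(shift, inliers)
```

A real co-registration would fit an affine or polynomial transform, which needs more matched pairs per random sample; the consensus-counting loop stays the same.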

A New Landsat Image Co-Registration and Outlier Removal Techniques

  • Kim, Jong-Hong;Heo, Joon;Sohn, Hong-Gyoo
    • Korean Journal of Remote Sensing / v.22 no.5 / pp.439-443 / 2006
  • Image co-registration is the process of overlaying two images of the same scene, one of which is the reference image, while the other (the sensed image) is geometrically transformed to match it. Numerous methods have been developed for automated image co-registration, which is known to be a time-consuming and/or computation-intensive procedure. To improve the efficiency and effectiveness of co-registering satellite imagery, this paper proposes pre-qualified area matching, which consists of feature extraction with a Laplacian filter and an area matching algorithm using the correlation coefficient. Moreover, to improve the accuracy of co-registration, outliers among the initial matching points should be removed. For this purpose, two outlier detection techniques, studentized residuals and a modified RANSAC algorithm, are used in this study. Three pairs of Landsat images were used for performance testing, and the results were compared and evaluated in terms of robustness and efficiency.