• Title/Summary/Keyword: Outlier handling

Outlier Tests in Sample Surveys

  • Namkyung, Pyong;Lee, Joon Suk
    • Communications for Statistical Applications and Methods / v.7 no.2 / pp.447-456 / 2000
  • In this paper, we consider three methods for outlier identification in sample surveys. First, we study methods of handling and adjusting outliers in a normal population. Second, we study existing methods that use the mean, maximum, and minimum, and propose a test using the median, which reflects the characteristics of the data well regardless of the sampling distribution. Finally, we show through simulation that our median-based test works better than the Dixon test and the mean test.

A Suggestion to Establish Statistical Treatment Guideline for Aircraft Manufacturer (국산 복합재료 시험데이터 처리지침 수립을 위한 제언)

  • Suh, Jangwon
    • Journal of Aerospace System Engineering / v.8 no.4 / pp.39-43 / 2014
  • This paper examines the statistical procedures that should be performed with caution during composite material qualification and equivalency, and describes statistically important considerations for outlier detection and handling, data pooling through normalization, review of data distributions, and determination of design allowables for structural analysis. Based on these considerations, we propose the need for guidance on statistical processing for aircraft manufacturers who use composite material property databases.

The Effect of Outliers in Regression Analysis (회귀 분석에서 이상치가 미치는 영향)

  • Kim, Kwang-Soo;Bae, Young-Ju;Lee, Jin-Gue
    • Journal of Korean Society for Quality Management / v.24 no.2 / pp.158-171 / 1996
  • An outlier is an observation that appears to deviate extremely from the other collected data. Treating outliers is therefore important, because they can distort the meaning of the whole data set in analysis and reduce the accuracy and validity of fitted models. The aim of this paper is to present some ways of handling outliers in given data and to investigate how the analysis results differ before and after outlier rejection. Of the variety of methods that have been proposed, we select linear regression analysis and two linear programming techniques and compare their results.
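
The before-and-after comparison this abstract describes can be illustrated with a minimal sketch. It fits an ordinary least-squares line with and without a single extreme point. The data and the `fit_line` helper are hypothetical, and the paper's two linear programming techniques are not reproduced here.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical data: five points on y = 2x plus one extreme observation.
clean_x, clean_y = [1, 2, 3, 4, 5], [2, 4, 6, 8, 10]
outlier_x, outlier_y = 6, 40

slope_clean, _ = fit_line(clean_x, clean_y)
slope_all, _ = fit_line(clean_x + [outlier_x], clean_y + [outlier_y])

print("slope without outlier:", slope_clean)
print("slope with outlier:   ", slope_all)
```

In this toy data a single point is enough to pull the fitted slope well away from the trend of the remaining observations.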

Skew Normal Boxplot and Outliers

  • Huh, Myung-Hoe;Lee, Yong-Goo
    • Communications for Statistical Applications and Methods / v.19 no.4 / pp.591-595 / 2012
  • We frequently use Tukey's boxplot to identify outliers in a batch of observations of a continuous variable. In doing so, we implicitly assume that the underlying distribution belongs to the family of normal distributions. Such a practice of data handling is often superficial and improper, since in reality many variables manifest skewness. In this short paper, we build a modified boxplot and set the outlier identification procedure by assuming that the observations are generated from the skew normal distribution (Azzalini, 1985), which is an extension of the normal distribution. Statistical performance of the proposed procedure is examined with simulated datasets.
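
For reference, the standard Tukey rule that this abstract takes as its starting point can be sketched as follows. This is the ordinary symmetric rule with the conventional multiplier of 1.5, not the skew normal modification the paper proposes, whose fences the abstract does not specify. The sample and the function names are hypothetical.

```python
import statistics

def tukey_fences(values, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def flag_outliers(values, k=1.5):
    """Return the observations that fall outside the Tukey fences."""
    lower, upper = tukey_fences(values, k)
    return [v for v in values if v < lower or v > upper]

# Hypothetical right-skewed sample.
sample = [1, 2, 2, 3, 3, 3, 4, 4, 5, 6, 8, 12, 20]
print(tukey_fences(sample))   # (-1.5, 10.5)
print(flag_outliers(sample))  # [12, 20]
```

Because the fences are symmetric around the quartiles, every flagged point in this right-skewed sample lies in the long upper tail, which is the behavior the abstract criticizes.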

The Development of Biodegradable Fiber Tensile Tenacity and Elongation Prediction Model Considering Data Imbalance and Measurement Error (데이터 불균형과 측정 오차를 고려한 생분해성 섬유 인장 강신도 예측 모델 개발)

  • Se-Chan, Park;Deok-Yeop, Kim;Kang-Bok, Seo;Woo-Jin, Lee
    • KIPS Transactions on Software and Data Engineering / v.11 no.12 / pp.489-498 / 2022
  • Recently, the labor-intensive textile industry has been attempting to reduce process costs and optimize quality through artificial intelligence. However, data collection in the fiber spinning process is costly and lacks a systematic collection and processing system, so little data has accumulated. In addition, data imbalance arises because data are preferentially collected only for changes in specific variables according to the purpose of the spinning run, and even samples collected under the same spinning conditions differ because of differences in the environment in which physical properties are measured. If these data characteristics are not taken into account when training AI models, problems such as overfitting and performance degradation may occur. Therefore, in this paper, we propose an outlier handling technique and a data augmentation technique that consider the characteristics of spinning process data. By comparing them with existing outlier handling and data augmentation techniques, we show that the proposed techniques are more suitable for spinning process data. In addition, by training various models on the original data and on the data processed with the proposed methods, we show that the tensile tenacity and elongation prediction models using the proposed methods outperform those that do not.

Offline In-Hand 3D Modeling System Using Automatic Hand Removal and Improved Registration Method (자동 손 제거와 개선된 정합방법을 이용한 오프라인 인 핸드 3D 모델링 시스템)

  • Kang, Junseok;Yang, Hyeonseok;Lim, Hwasup;Ahn, Sang Chul
    • Journal of the HCI Society of Korea / v.12 no.3 / pp.13-23 / 2017
  • In this paper, we propose a new in-hand 3D modeling system that improves user convenience. Because traditional modeling systems are inconvenient to use, in-hand modeling systems, in which the object is handled by hand, have been studied. However, such systems require additional equipment or specific constraints to remove the hands for good modeling. We propose a contact state change detection algorithm for automatic hand removal and an improved ICP algorithm that enables outlier handling and additionally uses color for accurate registration. The proposed algorithms enable accurate modeling without additional equipment or constraints. Through experiments using real data, we show that the proposed system can accomplish accurate modeling under general conditions without any constraint.

Analysis of Outlier Effects on Spatial Indices

  • Kim Si-Wan;Kim Kyoung-Sook;Li Ki-Joune
    • Spatial Information Research / v.12 no.4 s.31 / pp.339-349 / 2004
  • Outliers in spatial databases influence the performance of spatial indexing methods, including the R-tree. They enlarge the size and overlapping area of MBRs in the R-tree, which are important factors in determining performance. In this paper, we analyze outlier effects on the R-tree through analytical and experimental work, and propose a method for properly handling outliers. Our experimental results show that our method improves performance by about 15 percent.
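
The MBR enlargement mentioned in this abstract can be illustrated with a minimal sketch that compares the area of a minimum bounding rectangle with and without one distant point. The points and helper names are hypothetical. No R-tree is built, and the paper's handling method is not reproduced.

```python
def mbr(points):
    """Minimum bounding rectangle as (x_min, y_min, x_max, y_max)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)

def area(rect):
    """Area of an axis-aligned rectangle."""
    x_min, y_min, x_max, y_max = rect
    return (x_max - x_min) * (y_max - y_min)

# Hypothetical tight cluster plus one distant point.
cluster = [(1, 1), (2, 3), (3, 2), (4, 4)]
outlier = (40, 50)

print(area(mbr(cluster)))              # 9
print(area(mbr(cluster + [outlier])))  # 1911
```

A larger rectangle is more likely to overlap its neighbors, so a query may have to visit more nodes.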

Space Time Data Analysis for Greenhouse Whitefly (온실가루이의 공간시계열 분석)

  • 박진모;신기일
    • The Korean Journal of Applied Statistics / v.17 no.3 / pp.403-418 / 2004
  • Recently, space-time models have been widely used in spatial data analysis. In this paper, we apply such a model to the analysis of greenhouse whitefly. To handle the time component, we used an ARMA model and an autoregressive error model, and for outliers we adapted Mugglestone's method. We compared the space-time models and a geostatistical model using MSE and MAPE.

Robust HDR Image Reconstruction via Outlier Handling (아웃라이어 처리를 통한 강인한 HDR 영상 복원 방법)

  • Cho, Ho-Jin;Lee, Seung-Yong
    • Proceedings of the Korean Information Science Society Conference / 2012.06c / pp.317-319 / 2012
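  • In this paper, we present a robust HDR image reconstruction method based on outlier handling. Existing methods do not account for the blur caused by long exposure times or the saturated pixels (outliers) caused by under- or over-exposure, both of which commonly occur in LDR images. The proposed method uses MAP (maximum a posteriori) estimation to model the HDR image reconstruction problem accurately, taking blur and outliers into account, and reconstructs sharp HDR images without quality degradation through blur estimation and outlier estimation based on the EM (expectation-maximization) algorithm. Experimental results show that the proposed method effectively reconstructs high-quality HDR images from LDR images containing blur and outliers, and that it yields better quality than recently developed methods.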

Diabetes prediction mechanism using machine learning model based on patient IQR outlier and correlation coefficient (환자 IQR 이상치와 상관계수 기반의 머신러닝 모델을 이용한 당뇨병 예측 메커니즘)

  • Jung, Juho;Lee, Naeun;Kim, Sumin;Seo, Gaeun;Oh, Hayoung
    • Journal of the Korea Institute of Information and Communication Engineering / v.25 no.10 / pp.1296-1301 / 2021
  • With the recent worldwide increase in diabetes incidence, research has been conducted on predicting diabetes with various machine learning and deep learning technologies. In this work, we present a model for predicting diabetes using machine learning techniques with data from a hospital in Frankfurt, Germany. We apply outlier handling based on the interquartile range (IQR) together with Pearson correlation, and compare diabetes prediction performance across Decision Tree, Random Forest, k-NN (k-nearest neighbors), SVM (support vector machine), Bayesian Network, and the ensemble techniques XGBoost, Voting, and Stacking. XGBoost showed the best performance across the various scenarios, with 97% accuracy. This study is therefore meaningful in that the model can be used to accurately predict and help prevent diabetes, which is prevalent in modern society.
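
The Pearson correlation step mentioned in this abstract can be sketched as below, ranking features by the absolute value of their correlation with the outcome. The feature names and values are made up for illustration and are not the hospital data. The abstract does not state how the authors combine the correlation with IQR-based outlier handling, so only the coefficient itself is shown.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)

# Hypothetical feature columns and binary outcome (not real patient data).
features = {
    "glucose": [85, 90, 120, 150, 160, 170],
    "age": [50, 23, 41, 35, 29, 44],
}
outcome = [0, 0, 0, 1, 1, 1]

# Rank features by the strength of their linear association with the outcome.
ranked = sorted(features, key=lambda name: abs(pearson(features[name], outcome)),
                reverse=True)
for name in ranked:
    print(name, round(pearson(features[name], outcome), 3))
```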