• Title/Summary/Keyword: regression outlier

On Confidence Intervals of High Breakdown Regression Estimators

  • Lee Dong-Hee;Park YouSung;Kim Kang-yong
    • 한국통계학회:학술대회논문집
    • /
    • 한국통계학회 2004년도 학술발표논문집
    • /
    • pp.205-210
    • /
    • 2004
  • The weighted self-tuning robust regression estimator (WSTE) has a high breakdown point for estimating regression parameters, like other well-known high breakdown estimators. In this paper, we propose a way to obtain standard quantities such as confidence intervals for the WSTE, and find it superior to the other high breakdown regression estimators when the sample is contaminated.

SOURCES OF HIGH LEVERAGE IN LINEAR REGRESSION MODEL

  • Kim, Myung-Geun
    • Journal of applied mathematics & informatics
    • /
    • Vol. 16, No. 1_2
    • /
    • pp.509-513
    • /
    • 2004
  • Some reasons for high leverage are analytically investigated by decomposing leverage into meaningful components. The results in this work can be used for remedial action as a next step of data analysis.
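As a point of reference for what "decomposing leverage" can mean, here is a minimal sketch for simple linear regression with an intercept, where the leverage of observation i splits into a constant part 1/n and a part driven by the distance of x_i from the mean. This is the standard textbook identity, not necessarily the decomposition derived in the paper; the function name `leverages` and the data are illustrative assumptions.

```python
def leverages(x):
    """Leverage h_ii for simple linear regression with an intercept.

    h_ii = 1/n + (x_i - mean)^2 / Sxx
    The first term is shared by all points; the second grows with the
    squared distance of x_i from the centre of the predictor values.
    """
    n = len(x)
    mean = sum(x) / n
    sxx = sum((xi - mean) ** 2 for xi in x)
    return [1.0 / n + (xi - mean) ** 2 / sxx for xi in x]


# Illustrative data: the last predictor value lies far from the rest.
x = [1.0, 2.0, 3.0, 4.0, 20.0]
h = leverages(x)
for xi, hi in zip(x, h):
    print(f"x = {xi:5.1f}  leverage = {hi:.3f}")
```

The leverages sum to 2, the number of fitted coefficients, so a single value close to 1 means that one observation dominates the fit.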

Robust Estimation and Outlier Detection

  • Myung Geun Kim
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 1, No. 1
    • /
    • pp.33-40
    • /
    • 1994
  • The conditional expectation of a random variable in a multivariate normal random vector is a multiple linear regression on its predecessors. Using this fact, the least median of squares estimation method developed for multiple linear regression is adapted to multivariate data to identify influential observations. The resulting method clearly detects outliers and avoids the masking effect.
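To illustrate the least median of squares (LMS) idea mentioned in this abstract, the sketch below fits a single-predictor line by trying every line through two data points and keeping the one with the smallest median squared residual. This elemental-subset search is a common approximation to LMS and is only an assumption here; it does not reproduce the paper's multivariate adaptation. The name `lms_line`, the data, and the cutoff of 1.0 for flagging points are all illustrative.

```python
from itertools import combinations
from statistics import median


def lms_line(x, y):
    """Approximate LMS fit: search all lines through pairs of points."""
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # vertical line, no finite slope
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        crit = median((yk - intercept - slope * xk) ** 2
                      for xk, yk in zip(x, y))
        if best is None or crit < best[0]:
            best = (crit, intercept, slope)
    crit, intercept, slope = best
    return intercept, slope, crit


# Illustrative data: six points on y = 1 + 2x and two gross outliers.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
y = [1.0, 3.0, 5.0, 7.0, 9.0, 11.0, 40.0, -30.0]

intercept, slope, crit = lms_line(x, y)
# Flag points whose absolute residual exceeds an arbitrary cutoff of 1.0.
outliers = [k for k, (xk, yk) in enumerate(zip(x, y))
            if abs(yk - intercept - slope * xk) > 1.0]
print(intercept, slope, outliers)
```

Because the criterion is a median rather than a sum, the two contaminated points do not pull the fitted line toward themselves, which is why residuals from such a fit can expose outliers that would otherwise mask one another.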

A Study on Applications of Regression Diagnostic Method to Technometrics, and the Statistical Quality Control

  • Kim, Soon-Kwi
    • 품질경영학회지
    • /
    • Vol. 21, No. 1
    • /
    • pp.55-64
    • /
    • 1993
  • This article is concerned with procedures for detecting one or more outliers or influential observations in a linear regression model. A test procedure based on recursive residuals is proposed and developed. The power of the test procedure to identify one or more outliers is investigated through simulation, as is its relation to the number and configuration of the outliers.
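For readers unfamiliar with recursive residuals, the following minimal sketch computes them for a single-predictor regression with an intercept: each observation is predicted from a least squares fit to the observations before it, and the prediction error is scaled by its standard error factor. Only the residuals are shown; the test procedure proposed in the article is not specified in the abstract and is not reproduced. The function names and data are illustrative assumptions.

```python
from math import sqrt


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; also returns mean(x) and Sxx."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - slope * mx, slope, mx, sxx


def recursive_residuals(x, y, start=2):
    """Scaled one-step-ahead prediction errors w_t for t = start .. n-1.

    w_t = (y_t - a - b*x_t) / sqrt(1 + 1/t + (x_t - mean)^2 / Sxx),
    where a, b, mean and Sxx come from the first t observations.
    The first `start` x values must not all be equal.
    """
    out = []
    for t in range(start, len(x)):
        a, b, mx, sxx = fit_line(x[:t], y[:t])
        scale = sqrt(1.0 + 1.0 / t + (x[t] - mx) ** 2 / sxx)
        out.append((y[t] - a - b * x[t]) / scale)
    return out


# Illustrative data: y = 3 + 2x exactly, except the last value is shifted by +10.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0, 21.0]
w = recursive_residuals(x, y)
print([round(v, 4) for v in w])
```

The recursive residuals stay near zero until the shifted observation enters, at which point a large value appears.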

A New Forest Fire Detection Algorithm using Outlier Detection Method on Regression Analysis between Surface temperature and NDVI

  • Huh, Yong;Byun, Young-Gi;Son, Jeong-Hoon;Yu, Ki-Yun;Kim, Yong-Il
    • 대한원격탐사학회:학술대회논문집
    • /
    • 대한원격탐사학회 2006년도 Proceedings of ISRS 2006 PORSEC Volume II
    • /
    • pp.574-577
    • /
    • 2006
  • In this paper, we develop a forest fire detection algorithm that uses a regression function between NDVI and land surface temperature. Previous detection algorithms use land surface temperature as the main factor for discriminating fire pixels from non-fire pixels. They assume that the surface temperatures of non-fire pixels are intrinsically similar and follow a Gaussian distribution, regardless of land surface type and condition, and they derive the temperature thresholds for detecting fire pixels from the statistical distribution of non-fire pixel temperatures using heuristic methods. Under this assumption, the temperature distribution of non-fire pixels becomes very broad and sometimes overlaps slightly with that of fire pixels, so small fires are sometimes missed (omission errors). To ease this problem, we separate non-fire pixels by land cover type using a clustering algorithm and, for each pixel under examination, calculate the residual between its observed temperature and the temperature estimated from the linear regression between surface temperature and NDVI. As a result, the algorithm can adapt the temperature threshold to land type and condition, and it shows improved detection accuracy.
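The residual-based test described in this abstract can be sketched roughly as follows for a single land cover cluster: fit a line of surface temperature on NDVI using non-fire pixels, then flag a candidate pixel when its temperature exceeds the regression estimate by more than k residual standard deviations. The clustering step is omitted, and the data values, the one-sided rule, the cutoff k = 3, and the names `fit_temperature_model` and `is_fire` are all assumptions made for illustration, not details taken from the paper.

```python
from math import sqrt


def fit_temperature_model(ndvi, temp):
    """Least squares fit temp = a + b*ndvi; returns (a, b, residual_sd)."""
    n = len(ndvi)
    mx = sum(ndvi) / n
    my = sum(temp) / n
    sxx = sum((v - mx) ** 2 for v in ndvi)
    slope = sum((v - mx) * (t - my) for v, t in zip(ndvi, temp)) / sxx
    intercept = my - slope * mx
    sse = sum((t - intercept - slope * v) ** 2 for v, t in zip(ndvi, temp))
    return intercept, slope, sqrt(sse / (n - 2))


def is_fire(pixel_ndvi, pixel_temp, model, k=3.0):
    """Flag a pixel whose temperature residual exceeds k residual SDs."""
    intercept, slope, sd = model
    residual = pixel_temp - (intercept + slope * pixel_ndvi)
    return residual > k * sd


# Made-up non-fire pixels from one land cover cluster (temperature in kelvin).
ndvi = [0.2, 0.3, 0.4, 0.5, 0.6, 0.7]
temp = [309.8, 308.3, 305.9, 304.2, 301.8, 300.1]
model = fit_temperature_model(ndvi, temp)

print(is_fire(0.5, 330.0, model))  # far above the fitted line
print(is_fire(0.5, 304.3, model))  # close to the fitted line
```

Because the threshold is expressed relative to the fitted line rather than as one fixed temperature, it moves with vegetation cover, which is the adaptation the abstract describes.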

A procedure for simultaneous variable selection, variable transformation and outlier identification in linear regression

  • 서한손;윤민
    • 응용통계연구
    • /
    • Vol. 33, No. 1
    • /
    • pp.1-10
    • /
    • 2020
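  • This study deals with a variable selection algorithm for linear regression models that accounts for outliers and variable transformation. The proposed method detects and removes potential outliers, applies least trimmed squares estimation to estimate the variable transformation, and finally selects variables by comparing all possible regression models. Real data analyses and simulation results are presented to demonstrate the efficiency of the method in terms of accurate variable selection and the goodness of fit of the estimated model.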

Machine learning application to seismic site classification prediction model using Horizontal-to-Vertical Spectral Ratio (HVSR) of strong-ground motions

  • Francis G. Phi;Bumsu Cho;Jungeun Kim;Hyungik Cho;Yun Wook Choo;Dookie Kim;Inhi Kim
    • Geomechanics and Engineering
    • /
    • Vol. 37, No. 6
    • /
    • pp.539-554
    • /
    • 2024
  • This study explores the development of a prediction model for seismic site classification by integrating machine learning techniques with horizontal-to-vertical spectral ratio (HVSR) methodologies. To improve model accuracy, the research employs outlier detection methods and the synthetic minority over-sampling technique (SMOTE) for data balancing, and evaluates seven machine learning models on seismic data from KiK-net. Notably, the light gradient boosting method (LGBM), gradient boosting, and decision tree models exhibit improved performance when coupled with SMOTE, while the multiple linear regression (MLR) and support vector machine (SVM) models show reduced efficacy. Outlier detection techniques significantly enhance accuracy, particularly for LGBM, gradient boosting, and voting boosting. The ensemble of LGBM with the isolation forest and SMOTE achieves the highest accuracy of 0.91, while LGBM with the local outlier factor yields the highest F1-score of 0.79. Consistently outperforming the other models, LGBM proves the most efficient for seismic site classification when supported by appropriate preprocessing procedures. These findings show the significance of outlier detection and data balancing for precise seismic soil classification prediction, and highlight the potential of machine learning for optimizing site classification accuracy.

Outlier prediction in sensor network data using periodic pattern

  • 김형일
    • 센서학회지
    • /
    • Vol. 15, No. 6
    • /
    • pp.433-441
    • /
    • 2006
  • Because of the low power and low rate of a sensor network, outliers occur frequently in its time series data. In this paper, we propose a periodic pattern analysis for sensor network time series data and use it to predict the outliers present in the data. A periodic pattern is the minimum period of time over which a trend in the data values appears continuously and repeatedly. Quantization and smoothing are applied to the time series in order to analyze the periodic pattern, and the fluctuation between adjacent values in the smoothed data is measured to reduce it to a simplified series. The periodic pattern is then extracted from the simplified series, and the time series is restructured by period to produce periodic pattern data. In the experiment, machine learning is applied to the periodic pattern data to predict outliers. The distinguishing feature of this analysis is that periods are determined not from the magnitude of the data values but from their fluctuation, so the analysis is robust to outliers. Restructuring the time series into periodic patterns also makes it possible to express the time attribute as a value within the period. The time attribute can thus be used even by general machine learning algorithms that cannot learn from time series data directly.

Performance Evaluation of Battery Remaining Time Estimation Methods According to Outlier Data Processing Policies in Mobile Devices

  • 탁성우
    • 한국정보통신학회논문지
    • /
    • Vol. 26, No. 7
    • /
    • pp.1078-1090
    • /
    • 2022
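  • Estimation of the remaining battery time of a mobile device is affected by the distributional characteristics of the usage-time data at each battery level. In particular, the presence of outlier data can distort the prediction performance of statistical regression methods. This paper therefore proposes a framework that detects and processes outlier data in order to improve the prediction performance of statistical regression methods. The proposed framework first detects the outlier data that affect the estimation of remaining battery time. Each detected outlier is replaced with a new value through a smoothing process, and the difference between the outlier and its replacement is then distributed over the individual data points. Finally, the individual data points are reinforced to improve prediction performance. A performance analysis of the proposed framework confirmed that the prediction of remaining battery time improved.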

ROBUST REGRESSION ESTIMATION BASED ON DATA PARTITIONING

  • Lee, Dong-Hee;Park, You-Sung
    • Journal of the Korean Statistical Society
    • /
    • Vol. 36, No. 2
    • /
    • pp.299-320
    • /
    • 2007
  • We introduce a high breakdown point estimator referred to as the data partitioning robust regression estimator (DPR). Since the DPR is obtained by partitioning the observations into a finite number of subsets, it avoids the computational problems of previous robust regression estimators. Empirical and extensive simulation studies show that the DPR is superior to the previous robust estimators, especially in large samples.