• Title/Summary/Keyword: regression outlier

Search Result 116, Processing Time 0.02 seconds

Some efficient ratio-type exponential estimators using the Robust regression's Huber M-estimation function

  • Vinay Kumar Yadav;Shakti Prasad
    • Communications for Statistical Applications and Methods
    • /
    • v.31 no.3
    • /
    • pp.291-308
    • /
    • 2024
  • The current article discusses ratio type exponential estimators for estimating the mean of a finite population in sample surveys. The estimators uses robust regression's Huber M-estimation function, and their bias as well as mean squared error expressions are derived. It was campared with Kadilar, Candan, and Cingi (Hacet J Math Stat, 36, 181-188, 2007) estimators. The circumstances under which the suggested estimators perform better than competing estimators are discussed. Five different population datasets with a well recognized outlier have been widely used in numerical and simulation-based research. These thorough studies seek to provide strong proof to back up our claims by carefully assessing and validating the theoretical results reported in our study. The estimators that have been proposed are intended to significantly improve both the efficiency and accuracy of estimating the mean of a finite population. As a result, the results that are obtained from statistical analyses will be more reliable and precise.

Fuzzy Theil regression Model (Theil방법을 이용한 퍼지회귀모형)

  • Yoon, Jin Hee;Lee, Woo-Joo;Choi, Seung-Hoe
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.23 no.4
    • /
    • pp.366-370
    • /
    • 2013
  • Regression Analysis is an analyzing method of regression model to explain the statistical relationship between explanatory variable and response variables. This paper introduce Theil's method to find a fuzzy regression model which explain the relationship between explanatory variable and response variables. Theil's method is a robust method which is not sensive to outliers. Theil's method use medians of rate of increment based on randomly chosen pairs of each components of ${\alpha}$-level sets of fuzzy data in order to estimate the coefficients of fuzzy regression model. We propose an example to show Theil's estimator is robust than the Least squares estimator.

L-Estimation for the Parameter of the AR(l) Model (AR(1) 모형의 모수에 대한 L-추정법)

  • Han Sang Moon;Jung Byoung Cheal
    • The Korean Journal of Applied Statistics
    • /
    • v.18 no.1
    • /
    • pp.43-56
    • /
    • 2005
  • In this study, a robust estimation method for the first-order autocorrelation coefficient in the time series model following AR(l) process with additive outlier(AO) is investigated. We propose the L-type trimmed least squares estimation method using the preliminary estimator (PE) suggested by Rupport and Carroll (1980) in multiple regression model. In addition, using Mallows' weight function in order to down-weight the outlier of X-axis, the bounded-influence PE (BIPE) estimator is obtained and the mean squared error (MSE) performance of various estimators for autocorrelation coefficient are compared using Monte Carlo experiments. From the results of Monte-Carlo study, the efficiency of BIPE(LAD) estimator using the generalized-LAD to preliminary estimator performs well relative to other estimators.

V-mask Type Criterion for Identification of Outliers In Logistic Regression

  • Kim Bu-Yong
    • Communications for Statistical Applications and Methods
    • /
    • v.12 no.3
    • /
    • pp.625-634
    • /
    • 2005
  • A procedure is proposed to identify multiple outliers in the logistic regression. It detects the leverage points by means of hierarchical clustering of the robust distances based on the minimum covariance determinant estimator, and then it employs a V-mask type criterion on the scatter plot of robust residuals against robust distances to classify the observations into vertical outliers, bad leverage points, good leverage points, and regular points. Effectiveness of the proposed procedure is evaluated on the basis of the classic and artificial data sets, and it is shown that the procedure deals very well with the masking and swamping effects.

An Outlier Data Analysis using Support Vector Regression (Support Vector Regression을 이용한 이상치 데이터분석)

  • Jun, Sung-Hae
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.18 no.6
    • /
    • pp.876-880
    • /
    • 2008
  • Outliers are the observations which are very larger or smaller than most observations in the given data set. These are shown by some sources. The result of the analysis with outliers may be depended on them. In general, we do data analysis after removing outliers. But, in data mining applications such as fraud detection and intrusion detection, outliers are included in training data because they have crucial information. In regression models, simple and multiple regression models need to eliminate outliers from given training data by standadized and studentized residuals to construct good model. In this paper, we use support vector regression(SVR) based on statistical teaming theory to analyze data with outliers in regression. We verify the improved performance of our work by the experiment using synthetic data sets.

Effect of outliers on the variable selection by the regularized regression

  • Jeong, Junho;Kim, Choongrak
    • Communications for Statistical Applications and Methods
    • /
    • v.25 no.2
    • /
    • pp.235-243
    • /
    • 2018
  • Many studies exist on the influence of one or few observations on estimators in a variety of statistical models under the "large n, small p" setup; however, diagnostic issues in the regression models have been rarely studied in a high dimensional setup. In the high dimensional data, the influence of observations is more serious because the sample size n is significantly less than the number variables p. Here, we investigate the influence of observations on the least absolute shrinkage and selection operator (LASSO) estimates, suggested by Tibshirani (Journal of the Royal Statistical Society, Series B, 73, 273-282, 1996), and the influence of observations on selected variables by the LASSO in the high dimensional setup. We also derived an analytic expression for the influence of the k observation on LASSO estimates in simple linear regression. Numerical studies based on artificial data and real data are done for illustration. Numerical results showed that the influence of observations on the LASSO estimates and the selected variables by the LASSO in the high dimensional setup is more severe than that in the usual "large n, small p" setup.

A Study on Developing the Acquisition Unit Cost Estimating Model of the Guided Weapon System (유도무기 획득단가 추정 모델 개발에 관한 연구)

  • Kim, Yonghyun;Lee, Yongbok;Jung, Wonil;Kim, Dongkyu;Kang, Sungjin
    • Journal of the Korea Institute of Military Science and Technology
    • /
    • v.15 no.5
    • /
    • pp.565-576
    • /
    • 2012
  • Cost estimates are necessary for government acquisition program to support decisions about funding, to develop annual budget requests and to validate resource requirements at key decision points. Many researches have been done about cost estimating technique recently. Parametric cost estimating models based on CERs(Cost Estimating Relationships) have been mainly used using regression method with historical data. However, there are many restrictions in developing Korean version CERs because the number of data points are too small. Specially, data collection and data management system are unstable in Korean defense environment, when developing CERs. In this research, we analyzed the historical data, and found some cost drivers in guided weapon system area. We developed the Acquisition Unit Cost CER using the regression to remove multicollinearity in the historical data. So we could overcome the restriction of the insufficient sample number. This research as a first attempt is meaningful in terms of obtaining our own Acquisition Unit Cost CER using historical cost and physical characteristic in Korean development environment.

Reversible Data Hiding Using a Piecewise Autoregressive Predictor Based on Two-stage Embedding

  • Lee, Byeong Yong;Hwang, Hee Joon;Kim, Hyoung Joong
    • Journal of Electrical Engineering and Technology
    • /
    • v.11 no.4
    • /
    • pp.974-986
    • /
    • 2016
  • Reversible image watermarking, a type of digital data hiding, is capable of recovering the original image and extracting the hidden message with precision. A number of reversible algorithms have been proposed to achieve a high embedding capacity and a low distortion. While numerous algorithms for the achievement of a favorable performance regarding a small embedding capacity exist, the main goal of this paper is the achievement of a more favorable performance regarding a larger embedding capacity and a lower distortion. This paper therefore proposes a reversible data hiding algorithm for which a novel piecewise 2D auto-regression (P2AR) predictor that is based on a rhombus-embedding scheme is used. In addition, a minimum description length (MDL) approach is applied to remove the outlier pixels from a training set so that the effect of a multiple linear regression can be maximized. The experiment results demonstrate that the performance of the proposed method is superior to those of previous methods.

Algorithm for the Robust Estimation in Logistic Regression (로지스틱회귀모형의 로버스트 추정을 위한 알고리즘)

  • Kim, Bu-Yong;Kahng, Myung-Wook;Choi, Mi-Ae
    • The Korean Journal of Applied Statistics
    • /
    • v.20 no.3
    • /
    • pp.551-559
    • /
    • 2007
  • The maximum likelihood estimation is not robust against outliers in the logistic regression. Thus we propose an algorithm for the robust estimation, which identifies the bad leverage points and vertical outliers by the V-mask type criterion, and then strives to dampen the effect of outliers. Our main finding is that, by an appropriate selection of weights and factors, we could obtain the logistic estimates with high breakdown point. The proposed algorithm is evaluated by means of the correct classification rate on the basis of real-life and artificial data sets. The results indicate that the proposed algorithm is superior to the maximum likelihood estimation in terms of the classification.

Big Data Management in Structured Storage Based on Fintech Models for IoMT using Machine Learning Techniques (기계학습법을 이용한 IoMT 핀테크 모델을 기반으로 한 구조화 스토리지에서의 빅데이터 관리 연구)

  • Kim, Kyung-Sil
    • Advanced Industrial SCIence
    • /
    • v.1 no.1
    • /
    • pp.7-15
    • /
    • 2022
  • To adopt the development in the medical scenario IoT developed towards the advancement with the processing of a large amount of medical data defined as an Internet of Medical Things (IoMT). The vast range of collected medical data is stored in the cloud in the structured manner to process the collected healthcare data. However, it is difficult to handle the huge volume of the healthcare data so it is necessary to develop an appropriate scheme for the healthcare structured data. In this paper, a machine learning mode for processing the structured heath care data collected from the IoMT is suggested. To process the vast range of healthcare data, this paper proposed an MTGPLSTM model for the processing of the medical data. The proposed model integrates the linear regression model for the processing of healthcare information. With the developed model outlier model is implemented based on the FinTech model for the evaluation and prediction of the COVID-19 healthcare dataset collected from the IoMT. The proposed MTGPLSTM model comprises of the regression model to predict and evaluate the planning scheme for the prevention of the infection spreading. The developed model performance is evaluated based on the consideration of the different classifiers such as LR, SVR, RFR, LSTM and the proposed MTGPLSTM model and the different size of data as 1GB, 2GB and 3GB is mainly concerned. The comparative analysis expressed that the proposed MTGPLSTM model achieves ~4% reduced MAPE and RMSE value for the worldwide data; in case of china minimal MAPE value of 0.97 is achieved which is ~ 6% minimal than the existing classifier leads.