• Title/Abstract/Keywords: Regression method

7,305 search results (processing time: 0.032 seconds)

Fast robust variable selection using VIF regression in large datasets

  • Seo, Han Son
    • The Korean Journal of Applied Statistics / Vol. 31, No. 4 / pp.463-473 / 2018
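  • This study deals with variable selection algorithms for large datasets under an assumed linear regression model. Several algorithms that emphasize speed and robustness have been proposed. Among them, VIF regression, which uses a streamwise regression approach, runs quickly and accurately. However, because VIF regression estimates the model by least squares, it is sensitive to outliers. To improve the robustness of variable selection, a robust measure based on weighted estimates has been proposed, as has robust VIF regression. This study proposes a fast and robust variable selection method that detects and removes potential outliers and then performs VIF regression. The proposed method is compared with other methods through simulations and data analysis.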
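The detect-then-fit idea in this abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: it uses a single predictor and plain least squares in place of streamwise VIF regression, and the screening rule (drop points whose residual lies more than `cutoff` scaled MADs from the median residual) is a hypothetical stand-in for the paper's outlier detector. The names `ols_line` and `screen_outliers` are illustrative only.

```python
from statistics import median

def ols_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def screen_outliers(xs, ys, cutoff=2.5):
    """Keep points whose preliminary-fit residual lies within `cutoff` scaled
    MADs of the median residual. This rule is an assumption for illustration."""
    b0, b1 = ols_line(xs, ys)
    residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    center = median(residuals)
    # 1.4826 makes the MAD comparable to a standard deviation under normal errors.
    scale = 1.4826 * median([abs(r - center) for r in residuals])
    kept = [(x, y) for x, y, r in zip(xs, ys, residuals)
            if abs(r - center) <= cutoff * scale]
    return [x for x, _ in kept], [y for _, y in kept]

# Toy data: y = 1 + 2x with one gross outlier at x = 9.
xs = list(range(10))
ys = [1 + 2 * x for x in xs]
ys[9] = 100

raw_fit = ols_line(xs, ys)                      # pulled toward the outlier
clean_fit = ols_line(*screen_outliers(xs, ys))  # refit after screening
```

On this toy data the raw least-squares slope is inflated by the single outlier, while the refit after screening recovers the line used to generate the clean points. Because the preliminary fit is itself non-robust, such a rule can miss clustered outliers.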

Robustness of model averaging methods for the violation of standard linear regression assumptions

  • Lee, Yongsu;Song, Juwon
    • Communications for Statistical Applications and Methods / Vol. 28, No. 2 / pp.189-204 / 2021
  • In regression analysis, a single best model is usually selected from among several candidate models. However, it is often useful to combine several candidate models to achieve better performance, especially from a prediction viewpoint. Model combining methods such as stacking and Bayesian model averaging (BMA) have been suggested from the perspective of averaging candidate models. When the candidate models include the true model, BMA is generally expected to perform better than stacking. On the other hand, when the candidate models do not include the true model, stacking is known to outperform BMA. Since stacking and BMA have different properties, it is difficult to determine which method is more appropriate in other situations. In particular, it is not easy to find research papers that compare stacking and BMA when regression model assumptions are violated. Therefore, in this paper, we compare the performance of model averaging methods and of a single best model in linear regression analysis when standard linear regression assumptions are violated. Simulations were conducted to compare model averaging methods with linear regression both when the data include outliers and when they do not. We also compared them when the data include errors from a non-normal distribution. The model averaging methods were applied to water pollution data, which have strong multicollinearity among variables. Simulation studies showed that the stacking method tends to perform better than BMA or standard linear regression analysis (including the stepwise selection method) in terms of risk (see (3.1)) or prediction error (see (3.2)) when typical linear regression assumptions are violated.
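As a rough illustration of how stacking combines candidate models, the sketch below computes the stacking weight for two models only. It assumes the out-of-sample (for example, cross-validated) predictions of each model are already available; `stacking_weight` and `stacked_prediction` are hypothetical names, and BMA, whose weights are posterior model probabilities, is not shown.

```python
def stacking_weight(y, pred_a, pred_b):
    """Weight w in [0, 1] minimizing sum (y - w*pred_a - (1 - w)*pred_b)^2."""
    diff = [a - b for a, b in zip(pred_a, pred_b)]
    denom = sum(d * d for d in diff)
    if denom == 0.0:
        return 0.5  # identical predictions: every weight gives the same fit
    w = sum((yi - b) * d for yi, b, d in zip(y, pred_b, diff)) / denom
    # Clipping the unconstrained minimizer enforces 0 <= w <= 1,
    # which is exact for a one-dimensional convex quadratic.
    return min(1.0, max(0.0, w))

def stacked_prediction(w, pred_a, pred_b):
    """Convex combination of the two models' predictions."""
    return [w * a + (1 - w) * b for a, b in zip(pred_a, pred_b)]

# Toy example: model A overshoots by a factor of two, model B predicts zero.
y = [1.0, 2.0, 3.0]
pred_a = [2.0, 4.0, 6.0]
pred_b = [0.0, 0.0, 0.0]

w = stacking_weight(y, pred_a, pred_b)
combined = stacked_prediction(w, pred_a, pred_b)
```

Here neither candidate is the true model, yet the equal-weight combination reproduces `y` exactly, which is the kind of situation in which stacking is said to have an advantage.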

On study for change point regression problems using a difference-based regression model

  • Park, Jong Suk;Park, Chun Gun;Lee, Kyeong Eun
    • Communications for Statistical Applications and Methods / Vol. 26, No. 6 / pp.539-556 / 2019
  • This paper derives a method for solving change point regression problems via a process that uses properties of a difference-based intercept estimator, first introduced by Park and Kim (Communications in Statistics - Theory Methods, 2019) for outlier detection in multiple linear regression models. We describe the statistical properties of the difference-based regression model in a piecewise simple linear regression model and then propose an efficient algorithm for change point detection. We illustrate the merits of the proposed method by comparing it with several existing methods in simulation studies and real data analysis. The methodology applies regardless of the form of the regression lines and of the number of change points.
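For orientation, the piecewise simple linear regression setting can be sketched with a generic baseline: an exhaustive search for a single change point that minimizes the total residual sum of squares of two separately fitted lines. This is not the difference-based estimator described in the abstract; the function names and the `min_size` parameter are assumptions made for this illustration.

```python
def sse_of_line(xs, ys):
    """Residual sum of squares of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))

def best_single_change_point(xs, ys, min_size=2):
    """Index k minimizing SSE(xs[:k]) + SSE(xs[k:]) over admissible splits.
    Assumes xs is sorted with distinct values."""
    candidates = range(min_size, len(xs) - min_size + 1)
    return min(
        candidates,
        key=lambda k: sse_of_line(xs[:k], ys[:k]) + sse_of_line(xs[k:], ys[k:]),
    )

# Toy data: y = x for x < 5, then y = 20 - 2x from x = 5 onward.
xs = list(range(10))
ys = [float(x) for x in xs[:5]] + [20.0 - 2.0 * x for x in xs[5:]]

k = best_single_change_point(xs, ys)  # index of the first point in the second regime
```

The search costs one pair of line fits per candidate split, so it scales poorly to several change points, which is one motivation for more efficient algorithms.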

Development and application of a hierarchical estimation method for anthropometric variables

  • Ryu, Taebeum;You, Heecheon
    • Journal of the Ergonomics Society of Korea / Vol. 22, No. 4 / pp.59-78 / 2003
  • Most regression models of anthropometric variables use stature and/or weight as regressors; however, these 'flat' regression models result in large errors for anthropometric variables that have low correlations with the regressors. To develop more accurate regression models for anthropometric variables, this study proposed a method for estimating anthropometric variables in a hierarchical manner based on the relationships among the variables, together with a process for developing and improving the corresponding regression models. By applying the proposed approach, a hierarchical estimation structure was constructed for 59 anthropometric variables selected for the occupant package design of a passenger car, and the corresponding regression models were developed with the 1988 US Army anthropometric survey data. The hierarchical regression models were compared with the corresponding flat regression models in terms of accuracy. As a result, the standard errors of the hierarchical regression models decreased by 28% (4.3mm) on average compared with those of the flat models.

Nonparametric Kernel Regression Function Estimation with Bootstrap Method

  • Kim, Dae-Hak
    • Journal of the Korean Statistical Society / Vol. 22, No. 2 / pp.361-368 / 1993
  • In recent years, kernel-type estimates have become abundant. In this paper, we propose a bandwidth selection method for fixed-design kernel regression based on a bootstrap procedure. Mathematical properties of the proposed bootstrap-based bandwidth selection method are discussed. The performance of the proposed method in the small-sample case is compared with that of the cross-validation method via a simulation study.
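To make the bandwidth selection problem concrete, the sketch below implements the cross-validation baseline mentioned in the abstract, not the proposed bootstrap method. It assumes a Nadaraya-Watson estimator with a Gaussian kernel on an equally spaced (fixed) design; the estimator choice, the candidate grid, and all function names are assumptions made for illustration.

```python
import math

def nw_estimate(x0, xs, ys, h, skip=None):
    """Nadaraya-Watson estimate at x0 with a Gaussian kernel of bandwidth h.
    The observation with index `skip` (if any) is left out."""
    num = 0.0
    den = 0.0
    for i, (x, y) in enumerate(zip(xs, ys)):
        if i == skip:
            continue
        weight = math.exp(-0.5 * ((x0 - x) / h) ** 2)
        num += weight * y
        den += weight
    return num / den

def loo_cv_score(xs, ys, h):
    """Mean squared leave-one-out prediction error for bandwidth h."""
    errors = [
        (y - nw_estimate(x, xs, ys, h, skip=i)) ** 2
        for i, (x, y) in enumerate(zip(xs, ys))
    ]
    return sum(errors) / len(errors)

def select_bandwidth(xs, ys, grid):
    """Bandwidth in `grid` with the smallest leave-one-out score."""
    return min(grid, key=lambda h: loo_cv_score(xs, ys, h))

# Fixed design on [0, 1] with a noiseless sine curve as the response.
xs = [i / 20 for i in range(21)]
ys = [math.sin(2 * math.pi * x) for x in xs]
grid = [0.05, 0.1, 0.2, 0.5]

best_h = select_bandwidth(xs, ys, grid)
```

A bootstrap-based selector would replace `loo_cv_score` with a criterion estimated from resampled data while keeping the same search over candidate bandwidths.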


A Logistic Regression for Random Noise Removal in Image Deblurring

  • Lee, Nam-Yong
    • Journal of Korea Multimedia Society / Vol. 20, No. 10 / pp.1671-1677 / 2017
  • In this paper, we propose a machine learning method for random noise removal in image deblurring. The proposed method uses logistic regression to select reliable data for the deblurring process while excluding data that appear to be corrupted by random noise. The proposed method uses commonly available images as training data. Simulation results show that the proposed method performs better than the reliable data selection method based on median filtering.

LEAST ABSOLUTE DEVIATION ESTIMATOR IN FUZZY REGRESSION

  • KIM KYUNG JOONG;KIM DONG HO;CHOI SEUNG HOE
    • Journal of applied mathematics & informatics / Vol. 18, No. 1_2 / pp.649-656 / 2005
  • In this paper, we consider a fuzzy least absolute deviation method for constructing a fuzzy linear regression model with fuzzy input and fuzzy output. We also consider two numerical examples to evaluate the effectiveness of the fuzzy least absolute deviation method and the fuzzy least squares method.

Censored varying coefficient regression model using Buckley-James method

  • Shim, Jooyong;Seok, Kyungha
    • Journal of the Korean Data and Information Science Society / Vol. 28, No. 5 / pp.1167-1177 / 2017
  • Censored regression using the pseudo-response variable proposed by Buckley and James has been one of the most well-known models. Recently, the varying coefficient regression model has received a great deal of attention as an important modeling tool. In this paper, we propose a censored varying coefficient regression model using the Buckley-James method to handle situations where the regression coefficients of the model are not constant but change as the smoothing variables change. By using the formulation of the least squares support vector machine (LS-SVM), the coefficient estimators of the proposed model can be easily obtained from simple linear equations. Furthermore, a generalized cross validation function can be easily derived. We evaluate the proposed method and demonstrate its adequacy using simulated and real data sets.

Penalized rank regression estimator with the smoothly clipped absolute deviation function

  • Park, Jong-Tae;Jung, Kang-Mo
    • Communications for Statistical Applications and Methods / Vol. 24, No. 6 / pp.673-683 / 2017
  • The least absolute shrinkage and selection operator (LASSO) has been a popular regression estimator with simultaneous variable selection. However, LASSO does not have the oracle property, and a robust version is needed in the case of heavy-tailed errors or serious outliers. We propose a robust penalized regression estimator that provides simultaneous variable selection and estimation. It is based on rank regression and a non-convex penalty function, the smoothly clipped absolute deviation (SCAD) function, which has the oracle property. The proposed method combines the robustness of rank regression with the oracle property of the SCAD penalty. We develop an efficient algorithm for computing the proposed estimator that includes a SCAD estimate based on the local linear approximation and the tuning parameter of the penalty function. Our estimate can be obtained by the least absolute deviation method. We chose the optimal tuning parameter using the Bayesian information criterion and the cross validation method. Numerical simulation shows that the proposed estimator is robust and effective for analyzing contaminated data.
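The SCAD penalty and the weights used by a local linear approximation can be written down in a few lines. The sketch below follows the standard SCAD definition with the conventional default a = 3.7; it shows only the penalty, not the rank regression estimator or the paper's algorithm, and the function names are illustrative.

```python
def scad_penalty(theta, lam, a=3.7):
    """SCAD penalty p_lam(|theta|); requires lam > 0 and a > 2."""
    t = abs(theta)
    if t <= lam:
        return lam * t  # behaves like the LASSO penalty near zero
    if t <= a * lam:
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    return lam * lam * (a + 1) / 2  # constant: large coefficients are not shrunk further

def scad_derivative(theta, lam, a=3.7):
    """Derivative of the SCAD penalty with respect to |theta|."""
    t = abs(theta)
    if t <= lam:
        return lam
    return max(a * lam - t, 0.0) / (a - 1)

def lla_weights(initial_coefs, lam, a=3.7):
    """Local linear approximation: the penalty is replaced by a weighted L1
    penalty whose weights are the SCAD derivative at an initial estimate."""
    return [scad_derivative(b, lam, a) for b in initial_coefs]

# Small coefficients keep the full L1 weight; large ones get weight zero.
weights = lla_weights([0.2, 1.5, 5.0], lam=1.0)
```

Because each local linear approximation step is a weighted L1 problem, it can be combined with an absolute-deviation-type loss, which is consistent with the abstract's remark that the estimate can be obtained by the least absolute deviation method.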

EVALUATION OF PARAMETER ESTIMATION METHODS FOR NONLINEAR TIME SERIES REGRESSION MODELS

  • Kim, Tae-Soo;Ahn, Jung-Ho
    • Journal of applied mathematics & informatics / Vol. 27, No. 1_2 / pp.315-326 / 2009
  • The unknown parameters in regression models are usually estimated by one of several existing methods: the least squares method, which is the most common, the least absolute deviation method, the regression quantile method, and the asymmetric least squares method. For nonlinear time series regression models that do not satisfy the general conditions, we compare these methods in two ways: 1) a theoretical comparison in the asymptotic sense and 2) an empirical comparison using Monte Carlo simulation for a small sample size.
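The four estimation methods named in this abstract differ only in the loss applied to the residuals. The sketch below writes out those losses and, as a deliberately simple stand-in for a nonlinear time series regression, fits a single location parameter by grid search. The function names, the toy data, and the grid are assumptions made for illustration.

```python
def squared_loss(u):
    """Least squares."""
    return u * u

def absolute_loss(u):
    """Least absolute deviation."""
    return abs(u)

def check_loss(u, tau):
    """Regression quantile (check) loss at level tau in (0, 1)."""
    return u * (tau - (1.0 if u < 0 else 0.0))

def asymmetric_squared_loss(u, tau):
    """Asymmetric least squares loss at level tau in (0, 1)."""
    return abs(tau - (1.0 if u < 0 else 0.0)) * u * u

def fit_location(ys, loss, grid):
    """Grid value c minimizing the total loss of the residuals y - c."""
    return min(grid, key=lambda c: sum(loss(y - c) for y in ys))

# One large observation shifts the least squares fit but not the LAD fit.
ys = [1, 2, 3, 4, 100]
grid = range(0, 101)

ls_fit = fit_location(ys, squared_loss, grid)    # the sample mean, 22
lad_fit = fit_location(ys, absolute_loss, grid)  # the sample median, 3
q75_fit = fit_location(ys, lambda u: check_loss(u, 0.75), grid)
```

With tau = 0.5 the check loss is half the absolute loss and the asymmetric squared loss is half the squared loss, so the quantile and asymmetric methods contain the two classical ones as special cases.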
