• Title/Abstract/Keywords: smoothly clipped absolute deviation penalty

Penalized rank regression estimator with the smoothly clipped absolute deviation function

  • Park, Jong-Tae; Jung, Kang-Mo
    • Communications for Statistical Applications and Methods / Vol. 24, No. 6 / pp. 673-683 / 2017
  • The least absolute shrinkage and selection operator (LASSO) has been a popular regression estimator with simultaneous variable selection. However, LASSO does not have the oracle property, and a robust version is needed when the errors are heavy-tailed or the data contain serious outliers. We propose a robust penalized regression estimator that performs variable selection and estimation simultaneously. It is based on rank regression and the smoothly clipped absolute deviation (SCAD) function, a non-convex penalty with the oracle property, so the proposed method combines the robustness of rank regression with the oracle property of the SCAD penalty. We develop an efficient algorithm to compute the proposed estimator, which obtains the SCAD estimate through the local linear approximation and selects the tuning parameter of the penalty function. Our estimate can be computed by the least absolute deviation method. We chose the optimal tuning parameter using the Bayesian information criterion and cross-validation. Numerical simulations show that the proposed estimator is robust and effective for analyzing contaminated data.
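The rank-regression abstract above combines a SCAD penalty, the local linear approximation (LLA) and a least absolute deviation (LAD) fit. The sketch below assumes the SCAD form of Fan and Li (2001) with a = 3.7 and Wilcoxon rank dispersion written as a sum over pairwise differences. It shows how one LLA step becomes a plain LAD problem; the function names and the `scale` constant are hypothetical illustrations, not the authors' code.

```python
from itertools import combinations


def scad_penalty(theta, lam, a=3.7):
    """SCAD penalty p_lambda(|theta|) as in Fan and Li (2001); a = 3.7 is their usual default."""
    t = abs(theta)
    if t <= lam:
        return lam * t
    if t <= a * lam:
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    return lam * lam * (a + 1) / 2  # constant beyond a*lam: large coefficients are not shrunk further


def scad_derivative(theta, lam, a=3.7):
    """p'_lambda(|theta|): equals lam near zero, decays linearly, and is 0 beyond a*lam."""
    t = abs(theta)
    if t <= lam:
        return lam
    return max(a * lam - t, 0.0) / (a - 1)


def lla_augmented_lad(X, y, beta_current, lam, scale=1.0, a=3.7):
    """Rewrite one LLA step of SCAD-penalized Wilcoxon rank regression as an ordinary LAD problem.

    Pairwise-difference rows reproduce the rank dispersion sum_{i<j} |(y_i - y_j) - (x_i - x_j)'beta|.
    One pseudo-row per coefficient k (design scale * w_k * e_k, response 0) reproduces the weighted
    L1 term scale * w_k * |beta_k|, with LLA weight w_k = p'_lambda(|beta_current_k|).
    `scale` is a hypothetical stand-in for whatever penalty scaling the paper uses.
    """
    p = len(beta_current)
    rows, resp = [], []
    for i, j in combinations(range(len(y)), 2):
        rows.append([X[i][k] - X[j][k] for k in range(p)])
        resp.append(y[i] - y[j])
    for k in range(p):
        w = scale * scad_derivative(beta_current[k], lam, a)
        rows.append([w if m == k else 0.0 for m in range(p)])
        resp.append(0.0)
    return rows, resp


def lad_loss(rows, resp, beta):
    """Sum of absolute residuals, the objective that any LAD (median regression) solver minimizes."""
    return sum(abs(r - sum(x * b for x, b in zip(row, beta))) for row, r in zip(rows, resp))


X = [[1.0, 0.5], [2.0, -1.0], [0.0, 3.0], [1.5, 1.5]]
y = [1.0, 2.5, -0.5, 1.2]
rows, resp = lla_augmented_lad(X, y, beta_current=[0.9, 0.05], lam=0.5)
print(len(rows))  # 6 pairwise rows + 2 pseudo-rows = 8
```

Passing `rows, resp` to any LAD solver gives the next iterate, and repeating with refreshed weights is the LLA iteration. A coefficient whose current magnitude exceeds aλ gets weight 0 and is left unpenalized, which is how SCAD avoids the bias LASSO introduces for large coefficients.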

Estimation and variable selection in censored regression model with smoothly clipped absolute deviation penalty

  • Shim, Jooyong; Bae, Jongsig; Seok, Kyungha
    • Journal of the Korean Data and Information Science Society / Vol. 27, No. 6 / pp. 1653-1660 / 2016
  • The smoothly clipped absolute deviation (SCAD) penalty is known to satisfy desirable properties for penalty functions, such as unbiasedness, sparsity, and continuity. In this paper, we deal with regression function estimation and variable selection based on a SCAD-penalized censored regression model. We use the local linear approximation and the iteratively reweighted least squares algorithm to optimize the SCAD-penalized log-likelihood function. The proposed method provides an efficient approach to variable selection and regression function estimation. A generalized cross-validation function is presented for model selection. Applications of the proposed method are illustrated with simulated and real examples.

Multiclass Support Vector Machines with SCAD

  • Jung, Kang-Mo
    • Communications for Statistical Applications and Methods / Vol. 19, No. 5 / pp. 655-662 / 2012
  • Classification is an important research field in pattern recognition with high-dimensional predictors. The support vector machine (SVM) is a penalized classifier and feature selector. The proposed method is based on the hinge loss function and the smoothly clipped absolute deviation (SCAD) penalty, a non-convex penalty function suggested by Fan and Li (2001). We developed an algorithm for the multiclass SVM with the SCAD penalty function using the local quadratic approximation. For multiclass problems, we compared the performance of SVMs with the $L_1$ and $L_2$ penalty functions against the proposed method.

Weighted Support Vector Machines with the SCAD Penalty

  • Jung, Kang-Mo
    • Communications for Statistical Applications and Methods / Vol. 20, No. 6 / pp. 481-490 / 2013
  • Classification is an important research area because data can now be obtained easily even when the number of predictors is huge. The support vector machine (SVM) is widely used to classify a subject into a predetermined group because it has a sound theoretical background and outperforms other methods in many applications. The SVM can be viewed as a penalized method that combines the hinge loss function with a penalty function. As an alternative to the $L_2$ penalty function, Fan and Li (2001) proposed the smoothly clipped absolute deviation (SCAD) penalty, which has good statistical properties. Despite their strengths, SVMs are not robust when the data contain outliers. We develop a robust SVM method that uses a weight function together with the SCAD penalty function, based on the local quadratic approximation. We compare the performance of the proposed SVM with SVMs using the $L_1$ and $L_2$ penalty functions.
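Both SVM entries above rely on the local quadratic approximation (LQA) of Fan and Li (2001), which replaces the non-convex penalty near a current nonzero estimate θ0 by a quadratic in θ, turning each step into a ridge-type (iteratively reweighted) problem. The sketch below is a minimal illustration assuming the SCAD form with a = 3.7; the function names and the `eps` drop threshold are hypothetical, and it shows only the penalty surrogate, not the SVM fitting of either paper.

```python
def scad_penalty(theta, lam, a=3.7):
    """SCAD penalty p_lambda(|theta|) (Fan and Li, 2001)."""
    t = abs(theta)
    if t <= lam:
        return lam * t
    if t <= a * lam:
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    return lam * lam * (a + 1) / 2


def scad_derivative(theta, lam, a=3.7):
    """p'_lambda(|theta|), nonnegative and nonincreasing in |theta|."""
    t = abs(theta)
    return lam if t <= lam else max(a * lam - t, 0.0) / (a - 1)


def lqa_surrogate(theta, theta0, lam, a=3.7):
    """Quadratic approximation of p_lambda(|theta|) around a nonzero theta0.

    p(|theta0|) + p'(|theta0|) / (2|theta0|) * (theta^2 - theta0^2): it touches the penalty at
    theta0 and, because SCAD is concave in theta^2, lies above it everywhere.
    """
    t0 = abs(theta0)
    return scad_penalty(theta0, lam, a) + scad_derivative(theta0, lam, a) / (2 * t0) * (theta * theta - t0 * t0)


def lqa_ridge_weights(beta0, lam, a=3.7, eps=1e-6):
    """Diagonal ridge weights p'(|b|)/|b| for one LQA/IRLS step.

    Coefficients with |b| < eps are returned as None ("set to zero and drop"), the usual practical
    rule for LQA; the value of eps is an assumption.
    """
    return [None if abs(b) < eps else scad_derivative(b, lam, a) / abs(b) for b in beta0]


print(lqa_ridge_weights([2.5, 0.3, 0.0], lam=0.5))
```

Large coefficients (beyond aλ) receive ridge weight 0, and coefficients that reach zero are dropped permanently, a known limitation of LQA that motivates LLA-based alternatives.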

Variable Selection with Nonconcave Penalty Function on Reduced-Rank Regression

  • Jung, Sang Yong; Park, Chongsun
    • Communications for Statistical Applications and Methods / Vol. 22, No. 1 / pp. 41-54 / 2015
  • In this article, we propose nonconcave penalties on a reduced-rank regression model to select variables and estimate coefficients simultaneously. We apply the HARD (hard thresholding) and SCAD (smoothly clipped absolute deviation) penalty functions, which are symmetric, singular at the origin, and bounded by a constant to reduce bias. In a simulation study and a real data analysis, the new method is compared with an existing variable selection method using the $L_1$ penalty and exhibits competitive performance in prediction and variable selection. Instead of using only one type of penalty function, we use two or three penalty functions simultaneously, taking advantage of several penalty types together to select relevant predictors and estimate coefficients, which improves the overall performance of model fitting.
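The reduced-rank entry above contrasts bounded penalties (HARD, SCAD) with the $L_1$ penalty. Their different behavior is easiest to see in the scalar case with an orthonormal design, where Fan and Li (2001) give closed-form rules: hard thresholding for the HARD penalty, soft thresholding for $L_1$, and the SCAD rule that minimizes ½(z − θ)² + p_λ(|θ|). This is a minimal sketch of that textbook scalar case with assumed defaults (a = 3.7), not the reduced-rank procedure of the paper.

```python
def soft_threshold(z, lam):
    """L1 (LASSO) rule: every surviving coefficient is shrunk by lam, so large effects stay biased."""
    return (1.0 if z > 0 else -1.0) * max(abs(z) - lam, 0.0)


def hard_threshold(z, lam):
    """Hard thresholding rule: keep z unchanged if |z| > lam, otherwise set it to zero."""
    return z if abs(z) > lam else 0.0


def scad_threshold(z, lam, a=3.7):
    """SCAD rule: soft thresholding near zero, linear interpolation, then no shrinkage past a*lam."""
    t = abs(z)
    s = 1.0 if z > 0 else -1.0
    if t <= 2 * lam:
        return s * max(t - lam, 0.0)
    if t <= a * lam:
        return ((a - 1) * z - s * a * lam) / (a - 2)
    return z  # continuous at |z| = 2*lam and |z| = a*lam


for z in [0.5, 1.5, 2.5, 3.0, 5.0]:
    print(z, soft_threshold(z, 1.0), hard_threshold(z, 1.0), scad_threshold(z, 1.0))
```

For z = 5.0 the SCAD and hard rules return 5.0 while soft thresholding returns 4.0, which is the bias reduction that bounded penalties are designed to deliver.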

H-likelihood approach for variable selection in gamma frailty models

  • Ha, Il-Do; Cho, Geon-Ho
    • Journal of the Korean Data and Information Science Society / Vol. 23, No. 1 / pp. 199-207 / 2012
  • Variable selection methods using penalized likelihood with a shrinkage penalty function have recently been studied widely in various statistical models, including generalized linear models and survival models. In particular, they select important variables and estimate the coefficients of covariates simultaneously. In this paper, we develop a penalized h-likelihood method for variable selection in gamma frailty models. For this we use the smoothly clipped absolute deviation (SCAD) penalty function, which has desirable variable selection properties. The proposed method is illustrated using a simulation study and a real data set.

Penalized variable selection for accelerated failure time models

  • Park, Eunyoung; Ha, Il Do
    • Communications for Statistical Applications and Methods / Vol. 25, No. 6 / pp. 591-604 / 2018
  • The accelerated failure time (AFT) model is a linear model for the log-transformed survival time and has been introduced as a useful alternative to the proportional hazards (PH) model. In this paper, we propose variable selection procedures for the fixed effects in a parametric AFT model using penalized likelihood approaches. We use three popular penalty functions: the least absolute shrinkage and selection operator (LASSO), the adaptive LASSO, and the smoothly clipped absolute deviation (SCAD). With these procedures we can select important variables and estimate the fixed effects at the same time. The performance of the proposed method is evaluated in simulation studies, including an investigation of the impact of misspecifying the assumed distribution. The proposed method is illustrated with a primary biliary cirrhosis (PBC) data set.

An efficient algorithm for the non-convex penalized multinomial logistic regression

  • Kwon, Sunghoon; Kim, Dongshin; Lee, Sangin
    • Communications for Statistical Applications and Methods / Vol. 27, No. 1 / pp. 129-140 / 2020
  • In this paper, we introduce an efficient algorithm for non-convex penalized multinomial logistic regression that can be applied uniformly to a class of non-convex penalties. The class includes most non-convex penalties, such as the smoothly clipped absolute deviation, minimax concave, and bridge penalties. The algorithm is based on the concave-convex procedure and a modified local quadratic approximation algorithm. However, the usual quadratic approximation can slow computation because the dimension of the Hessian matrix depends on the number of categories of the output variable. To address this, we use a uniform bound on the Hessian matrix in the quadratic approximation. The algorithm is available in the R package ncpen developed by the authors. Numerical studies using simulations and real data sets are provided for illustration.
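The multinomial-logistic entry above builds on the concave-convex procedure (CCCP): a non-convex penalty is split into λ|θ| plus a concave, differentiable remainder J(θ), and J is replaced by its tangent line at each iterate, so every step is a convex problem. A minimal scalar sketch, assuming squared-error loss, the minimax concave penalty (MCP) and γ = 3; the function names are hypothetical, and this is not the ncpen implementation, which also uses a quadratic approximation with a uniform Hessian bound.

```python
def soft_threshold(z, lam):
    """Minimizer of 0.5*(z - theta)**2 + lam*|theta|."""
    return (1.0 if z > 0 else -1.0) * max(abs(z) - lam, 0.0)


def mcp_concave_part_grad(theta, lam, gamma=3.0):
    """Derivative of J(theta) = MCP(theta) - lam*|theta|, the concave, differentiable remainder.

    MCP(theta) = lam*|theta| - theta**2 / (2*gamma) for |theta| <= gamma*lam, else gamma*lam**2 / 2.
    """
    if abs(theta) <= gamma * lam:
        return -theta / gamma
    return -lam if theta > 0 else lam


def cccp_scalar(z, lam, gamma=3.0, tol=1e-12, max_iter=10_000):
    """Minimize 0.5*(z - theta)**2 + MCP(theta) by CCCP.

    Keeping the convex part 0.5*(z - theta)**2 + lam*|theta| and linearizing J at the current
    iterate gives the update theta <- soft_threshold(z - J'(theta), lam).
    """
    theta = z  # start from the unpenalized solution
    for _ in range(max_iter):
        new = soft_threshold(z - mcp_concave_part_grad(theta, lam, gamma), lam)
        if abs(new - theta) < tol:
            return new
        theta = new
    return theta


for z in [0.3, 1.7, 2.9, 3.5, -2.2]:
    print(z, cccp_scalar(z, lam=1.0))
```

For γ > 1 the iterates converge to the closed-form MCP solution soft_threshold(z, λ)/(1 − 1/γ) when |z| ≤ γλ and z otherwise, which gives a quick check that the decomposition is set up correctly.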

Non-convex penalized estimation for the AR process

  • Na, Okyoung; Kwon, Sunghoon
    • Communications for Statistical Applications and Methods / Vol. 25, No. 5 / pp. 453-470 / 2018
  • We study how to distinguish the parameters of a sparse autoregressive (AR) process from zero using non-convex penalized estimation. We consider a class of non-convex penalties that includes the smoothly clipped absolute deviation and minimax concave penalties as special cases. We prove that the penalized estimators achieve standard theoretical properties, such as the weak and strong oracle properties, that have been established in the sparse linear regression framework. The results hold when the maximal order of the AR process increases to infinity and the minimal size of the true non-zero parameters decreases toward zero as the sample size increases. Further, we construct a practical method for selecting tuning parameters using the generalized information criterion, whose minimizer asymptotically recovers the best theoretical non-penalized estimator of the sparse AR process. Simulation studies confirm the theoretical results.

Joint penalization of components and predictors in mixture of regressions

  • Park, Chongsun; Mo, Eunbi
    • The Korean Journal of Applied Statistics / Vol. 32, No. 2 / pp. 199-211 / 2019
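  • When fitting a finite mixture of regressions model to regression data, it is very important to select an appropriate number of components, to select a meaningful set of predictors in each selected regression model, and at the same time to obtain regression coefficient estimates with small bias and variance. In this study, we propose a method that applies penalty functions to both the number of components and the regression coefficients in a mixture of linear regressions, so that the appropriate number of components and the predictors needed in each component's regression model are selected simultaneously. The component penalty applies the SCAD penalty function to the log values of the components; for the regression coefficients we used the MCP and adaptive LASSO (Adplasso) penalty functions in addition to SCAD, and compared the results on simulated and real data. The SCAD-SCAD and SCAD-MCP penalty combinations resolved the overfitting problem of the existing method of Luo et al. (2008), effectively selected the number of components and the regression coefficients, and gave regression coefficient estimates with little bias. The contribution of this study is a method that, for regression data with an unknown number of components, simultaneously selects an appropriate number of components and the predictors required in each component's regression model.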