• Title/Summary/Keyword: least absolute shrinkage and selection operator

Search Result 49, Processing Time 0.03 seconds

How to improve oil consumption forecast using google trends from online big data?: the structured regularization methods for large vector autoregressive model

  • Choi, Ji-Eun;Shin, Dong Wan
    • Communications for Statistical Applications and Methods
    • /
    • v.29 no.1
    • /
    • pp.41-51
    • /
    • 2022
  • We forecast the US oil consumption level taking advantage of google trends. The google trends are the search volumes of the specific search terms that people search on google. We focus on whether proper selection of google trend terms leads to an improvement in forecast performance for oil consumption. As the forecast models, we consider the least absolute shrinkage and selection operator (LASSO) regression and the structured regularization method for large vector autoregressive (VAR-L) model of Nicholson et al. (2017), which select automatically the google trend terms and the lags of the predictors. An out-of-sample forecast comparison reveals that reducing the high dimensional google trend data set to a low-dimensional data set by the LASSO and the VAR-L models produces better forecast performance for oil consumption compared to the frequently-used forecast models such as the autoregressive model, the autoregressive distributed lag model and the vector error correction model.

Weighted Least Absolute Deviation Lasso Estimator

  • Jung, Kang-Mo
    • Communications for Statistical Applications and Methods
    • /
    • v.18 no.6
    • /
    • pp.733-739
    • /
    • 2011
  • The linear absolute shrinkage and selection operator(Lasso) method improves the low prediction accuracy and poor interpretation of the ordinary least squares(OLS) estimate through the use of $L_1$ regularization on the regression coefficients. However, the Lasso is not robust to outliers, because the Lasso method minimizes the sum of squared residual errors. Even though the least absolute deviation(LAD) estimator is an alternative to the OLS estimate, it is sensitive to leverage points. We propose a robust Lasso estimator that is not sensitive to outliers, heavy-tailed errors or leverage points.

Cox proportional hazard model with L1 penalty

  • Hwang, Chang-Ha;Shim, Joo-Yong
    • Journal of the Korean Data and Information Science Society
    • /
    • v.22 no.3
    • /
    • pp.613-618
    • /
    • 2011
  • The proposed method is based on a penalized log partial likelihood of Cox proportional hazard model with L1-penalty. We use the iteratively reweighted least squares procedure to solve L1 penalized log partial likelihood function of Cox proportional hazard model. It provide the ecient computation including variable selection and leads to the generalized cross validation function for the model selection. Experimental results are then presented to indicate the performance of the proposed procedure.

Penalized rank regression estimator with the smoothly clipped absolute deviation function

  • Park, Jong-Tae;Jung, Kang-Mo
    • Communications for Statistical Applications and Methods
    • /
    • v.24 no.6
    • /
    • pp.673-683
    • /
    • 2017
  • The least absolute shrinkage and selection operator (LASSO) has been a popular regression estimator with simultaneous variable selection. However, LASSO does not have the oracle property and its robust version is needed in the case of heavy-tailed errors or serious outliers. We propose a robust penalized regression estimator which provide a simultaneous variable selection and estimator. It is based on the rank regression and the non-convex penalty function, the smoothly clipped absolute deviation (SCAD) function which has the oracle property. The proposed method combines the robustness of the rank regression and the oracle property of the SCAD penalty. We develop an efficient algorithm to compute the proposed estimator that includes a SCAD estimate based on the local linear approximation and the tuning parameter of the penalty function. Our estimate can be obtained by the least absolute deviation method. We used an optimal tuning parameter based on the Bayesian information criterion and the cross validation method. Numerical simulation shows that the proposed estimator is robust and effective to analyze contaminated data.

Comparison of model selection criteria in graphical LASSO (그래프 LASSO에서 모형선택기준의 비교)

  • Ahn, Hyeongseok;Park, Changyi
    • Journal of the Korean Data and Information Science Society
    • /
    • v.25 no.4
    • /
    • pp.881-891
    • /
    • 2014
  • Graphical models can be used as an intuitive tool for modeling a complex stochastic system with a large number of variables related each other because the conditional independence between random variables can be visualized as a network. Graphical least absolute shrinkage and selection operator (LASSO) is considered to be effective in avoiding overfitting in the estimation of Gaussian graphical models for high dimensional data. In this paper, we consider the model selection problem in graphical LASSO. Particularly, we compare various model selection criteria via simulations and analyze a real financial data set.

Penalized variable selection for accelerated failure time models

  • Park, Eunyoung;Ha, Il Do
    • Communications for Statistical Applications and Methods
    • /
    • v.25 no.6
    • /
    • pp.591-604
    • /
    • 2018
  • The accelerated failure time (AFT) model is a linear model under the log-transformation of survival time that has been introduced as a useful alternative to the proportional hazards (PH) model. In this paper we propose variable-selection procedures of fixed effects in a parametric AFT model using penalized likelihood approaches. We use three popular penalty functions, least absolute shrinkage and selection operator (LASSO), adaptive LASSO and smoothly clipped absolute deviation (SCAD). With these procedures we can select important variables and estimate the fixed effects at the same time. The performance of the proposed method is evaluated using simulation studies, including the investigation of impact of misspecifying the assumed distribution. The proposed method is illustrated with a primary biliary cirrhosis (PBC) data set.

Evaluating Variable Selection Techniques for Multivariate Linear Regression (다중선형회귀모형에서의 변수선택기법 평가)

  • Ryu, Nahyeon;Kim, Hyungseok;Kang, Pilsung
    • Journal of Korean Institute of Industrial Engineers
    • /
    • v.42 no.5
    • /
    • pp.314-326
    • /
    • 2016
  • The purpose of variable selection techniques is to select a subset of relevant variables for a particular learning algorithm in order to improve the accuracy of prediction model and improve the efficiency of the model. We conduct an empirical analysis to evaluate and compare seven well-known variable selection techniques for multiple linear regression model, which is one of the most commonly used regression model in practice. The variable selection techniques we apply are forward selection, backward elimination, stepwise selection, genetic algorithm (GA), ridge regression, lasso (Least Absolute Shrinkage and Selection Operator) and elastic net. Based on the experiment with 49 regression data sets, it is found that GA resulted in the lowest error rates while lasso most significantly reduces the number of variables. In terms of computational efficiency, forward/backward elimination and lasso requires less time than the other techniques.

Concave penalized linear discriminant analysis on high dimensions

  • Sunghoon Kwon;Hyebin Kim;Dongha Kim;Sangin Lee
    • Communications for Statistical Applications and Methods
    • /
    • v.31 no.4
    • /
    • pp.393-408
    • /
    • 2024
  • The sparse linear discriminant analysis can be incorporated into the penalized linear regression framework, but most studies have been limited to specific convex penalties, including the least absolute selection and shrinkage operator and its variants. Within this framework, concave penalties can serve as natural counterparts of the convex penalties. Implementing the concave penalized direction vector of discrimination appears to be straightforward, but developing its theoretical properties remains challenging. In this paper, we explore a class of concave penalties that covers the smoothly clipped absolute deviation and minimax concave penalties as examples. We prove that employing concave penalties guarantees an oracle property uniformly within this penalty class, even for high-dimensional samples. Here, the oracle property implies that an ideal direction vector of discrimination can be exactly recovered through concave penalized least squares estimation. Numerical studies confirm that the theoretical results hold with finite samples.

Influence of Two-Dimensional and Three-Dimensional Acquisitions of Radiomic Features for Prediction Accuracy

  • Ryohei Fukui;Ryutarou Matsuura;Katsuhiro Kida;Sachiko Goto
    • Progress in Medical Physics
    • /
    • v.34 no.3
    • /
    • pp.23-32
    • /
    • 2023
  • Purpose: In radiomics analysis, to evaluate features, and predict genetic characteristics and survival time, the pixel values of lesions depicted in computed tomography (CT) and magnetic resonance imaging (MRI) images are used. CT and MRI offer three-dimensional images, thus producing three-dimensional features (Features_3d) as output. However, in reports, the superiority between Features_3d and two-dimensional features (Features_2d) is distinct. In this study, we aimed to investigate whether a difference exists in the prediction accuracy of radiomics analysis of lung cancer using Features_2d and Features_3d. Methods: A total of 38 cases of large cell carcinoma (LCC) and 40 cases of squamous cell carcinoma (SCC) were selected for this study. Two- and three-dimensional lesion segmentations were performed. A total of 774 features were obtained. Using least absolute shrinkage and selection operator regression, seven Features_2d and six Features_3d were obtained. Results: Linear discriminant analysis revealed that the sensitivities of Features_2d and Features_3d to LCC were 86.8% and 89.5%, respectively. The coefficients of determination through multiple regression analysis and the areas under the receiver operating characteristic curve (AUC) were 0.68 and 0.70 and 0.93 and 0.94, respectively. The P-value of the estimated AUC was 0.87. Conclusions: No difference was found in the prediction accuracy for LCC and SCC between Features_2d and Features_3d.

An Application of the Clustering Threshold Gradient Descent Regularization Method for Selecting Genes in Predicting the Survival Time of Lung Carcinomas

  • Lee, Seung-Yeoun;Kim, Young-Chul
    • Genomics & Informatics
    • /
    • v.5 no.3
    • /
    • pp.95-101
    • /
    • 2007
  • In this paper, we consider the variable selection methods in the Cox model when a large number of gene expression levels are involved with survival time. Deciding which genes are associated with survival time has been a challenging problem because of the large number of genes and relatively small sample size (n<