• Title/Summary/Keyword: Lasso


Comparison of model selection criteria in graphical LASSO (그래프 LASSO에서 모형선택기준의 비교)

  • Ahn, Hyeongseok; Park, Changyi
    • Journal of the Korean Data and Information Science Society, v.25 no.4, pp.881-891, 2014
  • Graphical models can be used as an intuitive tool for modeling a complex stochastic system with a large number of interrelated variables, because the conditional independence structure between random variables can be visualized as a network. The graphical least absolute shrinkage and selection operator (LASSO) is considered effective in avoiding overfitting when estimating Gaussian graphical models for high-dimensional data. In this paper, we consider the model selection problem in the graphical LASSO. In particular, we compare various model selection criteria via simulations and analyze a real financial data set.
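
A minimal sketch of the kind of comparison described above, using scikit-learn's graphical lasso and a BIC-style criterion over a penalty grid. The data are synthetic placeholders and the BIC form below is an assumption; the specific criteria compared in the paper are not reproduced here.

```python
# Sketch: choose the graphical-lasso penalty by a BIC-style criterion.
import numpy as np
from sklearn.covariance import empirical_covariance, graphical_lasso

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))          # placeholder data (n=200, p=10)
n, p = X.shape
S = empirical_covariance(X)

best_alpha, best_bic = None, np.inf
for alpha in np.logspace(-2, 0, 20):        # penalty grid
    cov, prec = graphical_lasso(S, alpha=alpha)
    # Gaussian log-likelihood (up to a constant) and number of selected edges
    loglik = (n / 2) * (np.log(np.linalg.det(prec)) - np.trace(S @ prec))
    n_edges = (np.abs(prec[np.triu_indices(p, k=1)]) > 1e-8).sum()
    bic = -2 * loglik + np.log(n) * n_edges  # assumed BIC definition
    if bic < best_bic:
        best_alpha, best_bic = alpha, bic

print(f"BIC-selected alpha: {best_alpha:.3f}")
```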

Graphical method for evaluating the impact of influential observations in high-dimensional data (고차원 자료에서 영향점의 영향을 평가하기 위한 그래픽 방법)

  • Ahn, Sojin; Lee, Jae Eun; Jang, Dae-Heung
    • Journal of the Korean Data and Information Science Society, v.28 no.6, pp.1291-1300, 2017
  • In high-dimensional data, the number of variables is much larger than the number of observations, so the impact of influential observations on regression coefficient estimates can be very large. Jang and Anderson-Cook (2017) suggested the LASSO influence plot. In this paper, we propose the LASSO influence plot, the LASSO variable selection ranking plot, and the three-dimensional LASSO influence plot as graphical methods for evaluating the impact of influential observations in high-dimensional data. With two real high-dimensional data examples, we apply these graphical methods as regression diagnostic tools for finding influential observations, and show that influential observations can be identified with them.
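
The plot definitions in the paper are not reproduced here; the following sketch only illustrates the general leave-one-out idea behind an influence diagnostic for lasso coefficients, on synthetic data with an assumed penalty value.

```python
# Sketch: leave-one-out influence of each observation on lasso coefficients.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
n, p = 50, 200                               # n << p, as in high-dimensional data
X = rng.standard_normal((n, p))
y = X[:, :3] @ np.array([2.0, -1.5, 1.0]) + rng.standard_normal(n)

full = Lasso(alpha=0.1).fit(X, y)            # alpha=0.1 is an arbitrary choice
influence = np.empty(n)
for i in range(n):
    keep = np.arange(n) != i
    loo = Lasso(alpha=0.1).fit(X[keep], y[keep])
    influence[i] = np.linalg.norm(loo.coef_ - full.coef_)   # coefficient shift

plt.stem(influence)
plt.xlabel("observation index")
plt.ylabel("||coef(-i) - coef(full)||")
plt.title("Leave-one-out influence on lasso coefficients (sketch)")
plt.show()
```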

News Impact Curves of Volatility for Asymmetric GARCH via LASSO (LASSO를 이용한 비대칭 GARCH 모형의 변동성 커브)

  • Yoon, J.E.; Lee, J.W.; Hwang, S.Y.
    • The Korean Journal of Applied Statistics, v.27 no.1, pp.159-168, 2014
  • The news impact curve (NIC), originally proposed by Engle and Ng (1993), is a graphical representation of volatility for financial time series. The NIC is a simple but powerful tool for identifying the variability of a given time series; note, however, that it is suited to symmetric volatility. Recently, much attention has been paid to asymmetric volatility models, so an asymmetric version of the NIC would be useful in the field of financial time series. In this article, we propose to incorporate the LASSO in constructing asymmetric NICs based on asymmetric GARCH models. In particular, bilinear GARCH models are considered and illustrated via KOSDAQ data.
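
For orientation only, the sketch below plots a news impact curve for a GJR-type asymmetric GARCH model with assumed illustrative parameters; the paper's LASSO-based construction for bilinear GARCH is not reproduced here.

```python
# Sketch: news impact curve (NIC) for an asymmetric (GJR-type) GARCH model,
# holding lagged variance at (approximately) its unconditional level.
import numpy as np
import matplotlib.pyplot as plt

omega, alpha, gamma, beta = 0.05, 0.08, 0.10, 0.85   # assumed coefficients
sigma2_bar = omega / (1 - alpha - gamma / 2 - beta)  # approx. unconditional variance

eps = np.linspace(-5, 5, 401)                        # past shock (the "news")
nic = omega + beta * sigma2_bar + (alpha + gamma * (eps < 0)) * eps**2

plt.plot(eps, nic)
plt.xlabel("lagged shock")
plt.ylabel("conditional variance")
plt.title("Asymmetric news impact curve (sketch)")
plt.show()
```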

Comparison of Laplace and Double Pareto Penalty: LASSO and Elastic Net (라플라스와 이중 파레토 벌점의 비교: LASSO와 Elastic Net)

  • Kyung, Minjung
    • The Korean Journal of Applied Statistics, v.27 no.6, pp.975-989, 2014
  • The Lasso (Tibshirani, 1996) and the Elastic Net (Zou and Hastie, 2005) have been widely used in various fields for simultaneous variable selection and coefficient estimation. Bayesian methods using a conditional Laplace prior and a double Pareto prior have been discussed in the form of hierarchical specifications, and the full conditional posterior distributions under each prior have been derived. We compare the performance of the Bayesian lasso with the Laplace prior against that with the double Pareto prior using simulations. We also apply the proposed Bayesian hierarchical models to real data sets to predict the collapse of governments in Asia.
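
For reference, the correspondence between the lasso penalty and a conditional Laplace prior, already noted in Tibshirani (1996), can be written as below; the hierarchical and double Pareto specifications used in the paper are not reproduced here.

```latex
% Lasso as a posterior mode: i.i.d. conditional Laplace priors on the
% coefficients give a negative log-posterior equal to the lasso objective
% (up to constants), so its mode matches the lasso solution for a matched
% penalty level.
\[
  \pi(\beta_j \mid \sigma^2) \propto
    \exp\!\left(-\frac{\lambda\,|\beta_j|}{\sqrt{\sigma^2}}\right),
  \qquad
  -\log \pi(\beta \mid y, \sigma^2)
    = \frac{\|y - X\beta\|_2^2}{2\sigma^2}
      + \frac{\lambda}{\sqrt{\sigma^2}} \sum_{j=1}^{p} |\beta_j|
      + \text{const}.
\]
```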

Drought forecasting over South Korea based on the teleconnected global climate variables

  • Taesam Lee; Yejin Kong; Sejeong Lee; Taegyun Kim
    • Proceedings of the Korea Water Resources Association Conference, 2023.05a, pp.47-47, 2023
  • Drought occurs due to a lack of water resources over an extended period, and its intensity has been magnified globally by climate change. In recent years, drought over South Korea has also intensified, making prediction essential for water resource management and the water industry. Therefore, drought forecasting over South Korea was performed in the current study with the following procedure. First, accumulated spring precipitation (ASP) was derived from the 93 weather stations in South Korea, taking their median. Then, a correlation analysis was performed between ASP and Df4m, the differences between paired global winter mean sea level pressure (MSLP) values. The 37 Df4m variables with correlations above 0.55 were chosen and sorted into three regions. The selected Df4m variables in the same region showed high similarity, leading to a multicollinearity problem. To avoid this problem, the least absolute shrinkage and selection operator (LASSO), a model that performs variable selection and model fitting at once, was applied. The LASSO model selected 5 variables and showed good agreement between the predicted and observed values, with R2=0.72. Other models such as multiple linear regression and ElasticNet were also tested, but did not perform as well as LASSO. Therefore, the LASSO model can be an appropriate model to forecast spring drought over South Korea and can be used to manage water resources efficiently.
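
A minimal sketch of the selection-and-fitting step described above, assuming synthetic stand-ins for the 37 correlated Df4m predictors and the ASP response; the study's actual data and predictor definitions are not reproduced here.

```python
# Sketch: LASSO performs variable selection and fitting in one step on
# correlated predictors (synthetic stand-ins for the Df4m variables).
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
n_years, n_pred = 40, 37
base = rng.standard_normal((n_years, 3))                     # three correlated "regions"
Df4m = np.repeat(base, [13, 12, 12], axis=1) \
       + 0.1 * rng.standard_normal((n_years, n_pred))        # highly similar within region
ASP = 2.0 * base[:, 0] - 1.0 * base[:, 2] + rng.standard_normal(n_years)

X = StandardScaler().fit_transform(Df4m)
lasso = LassoCV(cv=5).fit(X, ASP)
selected = np.flatnonzero(lasso.coef_)
print("selected predictors:", selected, "| R^2 =", round(lasso.score(X, ASP), 2))
```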


Evaluating Variable Selection Techniques for Multivariate Linear Regression (다중선형회귀모형에서의 변수선택기법 평가)

  • Ryu, Nahyeon; Kim, Hyungseok; Kang, Pilsung
    • Journal of Korean Institute of Industrial Engineers, v.42 no.5, pp.314-326, 2016
  • The purpose of variable selection techniques is to select a subset of relevant variables for a particular learning algorithm in order to improve the accuracy and efficiency of the prediction model. We conduct an empirical analysis to evaluate and compare seven well-known variable selection techniques for the multiple linear regression model, one of the most commonly used regression models in practice. The techniques we apply are forward selection, backward elimination, stepwise selection, the genetic algorithm (GA), ridge regression, the lasso (least absolute shrinkage and selection operator), and the elastic net. Based on experiments with 49 regression data sets, we found that the GA resulted in the lowest error rates, while the lasso most significantly reduced the number of variables. In terms of computational efficiency, forward selection, backward elimination, and the lasso require less time than the other techniques.
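
A small sketch in the spirit of that comparison, showing how many variables three of the seven techniques retain on synthetic data; the paper's 49-data-set evaluation and error-rate comparison are not reproduced here.

```python
# Sketch: compare how many features lasso, elastic net, and forward selection keep.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression, LassoCV, ElasticNetCV

X, y = make_regression(n_samples=200, n_features=30, n_informative=5,
                       noise=10.0, random_state=0)

lasso = LassoCV(cv=5).fit(X, y)
enet = ElasticNetCV(cv=5).fit(X, y)
forward = SequentialFeatureSelector(LinearRegression(),
                                    direction="forward").fit(X, y)

print("lasso keeps      :", int(np.sum(lasso.coef_ != 0)))
print("elastic net keeps:", int(np.sum(enet.coef_ != 0)))
print("forward keeps    :", int(forward.get_support().sum()))
```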

Variable Selection Via Penalized Regression

  • Yoon, Young-Joo; Song, Moon-Sup
    • Communications for Statistical Applications and Methods, v.12 no.3, pp.615-624, 2005
  • In this paper, we review the variable-selection properties of the LASSO and SCAD in penalized regression. To address the weakness of SCAD at high noise levels, we propose a new penalty function called MSCAD, which relaxes the unbiasedness condition of SCAD. To compare MSCAD with the LASSO and SCAD, comparative studies are performed on simulated datasets and on a real dataset. The performances of the penalized regression methods are compared in terms of relative model error and the coefficient estimates. The results show that the performance of MSCAD lies between those of the LASSO and SCAD, as expected.
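
For reference, the two penalties being compared can be written as follows, with a > 2 (a = 3.7 is the value commonly suggested by Fan and Li, 2001); the MSCAD modification proposed in the paper is not reproduced here.

```latex
% LASSO penalty and SCAD penalty (Fan and Li, 2001), for t = |beta_j| >= 0.
\[
  p^{\text{lasso}}_{\lambda}(t) = \lambda t,
  \qquad
  p^{\text{SCAD}}_{\lambda}(t) =
  \begin{cases}
    \lambda t, & 0 \le t \le \lambda,\\[2pt]
    \dfrac{2a\lambda t - t^2 - \lambda^2}{2(a-1)}, & \lambda < t \le a\lambda,\\[2pt]
    \dfrac{(a+1)\lambda^2}{2}, & t > a\lambda,
  \end{cases}
  \qquad a > 2.
\]
```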

VARIABLE SELECTION VIA PENALIZED REGRESSION

  • Yoon, Young-Joo; Song, Moon-Sup
    • Proceedings of the Korean Statistical Society Conference, 2005.05a, pp.7-12, 2005
  • In this paper, we review the variable-selection properties of the LASSO and SCAD in penalized regression. To address the weakness of SCAD at high noise levels, we propose a new penalty function called MSCAD, which relaxes the unbiasedness condition of SCAD. To compare MSCAD with the LASSO and SCAD, comparative studies are performed on simulated datasets and on a real dataset. The performances of the penalized regression methods are compared in terms of relative model error and the coefficient estimates. The results show that the performance of MSCAD lies between those of the LASSO and SCAD, as expected.


A Target Selection Model for the Counseling Services in Long-Term Care Insurance (노인장기요양보험 이용지원 상담 대상자 선정모형 개발)

  • Han, Eun-Jeong; Kim, Dong-Geon
    • The Korean Journal of Applied Statistics, v.28 no.6, pp.1063-1073, 2015
  • In the long-term care insurance (LTCI) system, the National Health Insurance Service (NHIS) provides counseling services for beneficiaries and their family caregivers, which help them use LTC services appropriately. The purpose of this study was to develop a Target Selection Model for the counseling services based on the needs of beneficiaries and their family caregivers. To develop the models, we used a data set of 2,000 beneficiaries and family caregivers who used long-term care services at home in March 2013 and completed questionnaires. The Target Selection Model was built with various data-mining models, such as logistic regression, gradient boosting, the Lasso, decision trees, ensembles, and neural networks. The Lasso model was selected as the final model because of its stability, high performance, and availability. Our results may improve the satisfaction with, and the efficiency of, the NHIS counseling services.
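
A minimal sketch of an L1-penalized logistic regression as one candidate model for flagging counseling targets; the features, data, and threshold below are synthetic placeholders, not the NHIS survey variables used in the paper.

```python
# Sketch: L1-penalized logistic regression for flagging counseling targets.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=25, n_informative=6,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

model = LogisticRegressionCV(penalty="l1", solver="liblinear", cv=5).fit(X_tr, y_tr)
flagged = model.predict_proba(X_te)[:, 1] > 0.5        # assumed decision threshold
print("test accuracy:", round(model.score(X_te, y_te), 3),
      "| nonzero coefficients:", int(np.sum(model.coef_ != 0)),
      "| flagged targets:", int(flagged.sum()))
```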

Lasso Regression of RNA-Seq Data based on Bootstrapping for Robust Feature Selection (안정적 유전자 특징 선택을 위한 유전자 발현량 데이터의 부트스트랩 기반 Lasso 회귀 분석)

  • Jo, Jeonghee; Yoon, Sungroh
    • KIISE Transactions on Computing Practices, v.23 no.9, pp.557-563, 2017
  • When large-scale gene expression data are analyzed using lasso regression, the estimates of the regression coefficients may be unstable due to the highly correlated expression values of associated genes. This irregularity, in which the coefficients are shrunk by L1 regularization, makes variable selection difficult. To address this problem, we propose a regression procedure that repeatedly bootstraps the gene expression values prior to lasso regression. The genes selected with high frequency were used to build each regression model. Our experimental results show that several genes were consistently selected in all regression models, and we verified that these genes were not false positives. We also found that the sign distribution of the regression coefficients of the selected genes from each model was correlated with the real dependent variables.
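
A minimal sketch of the bootstrap-then-lasso idea described above, on synthetic data standing in for an expression matrix; the penalty value, number of bootstraps, and 80% selection-frequency threshold are assumptions, not the paper's settings.

```python
# Sketch: resample the data, refit the lasso, and keep features selected
# with high frequency across bootstrap replicates.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(3)
n, p = 60, 500                                   # samples x genes (synthetic)
X = rng.standard_normal((n, p))
y = X[:, :4] @ np.array([1.5, -2.0, 1.0, 0.5]) + rng.standard_normal(n)

n_boot, counts = 100, np.zeros(p)
for _ in range(n_boot):
    idx = rng.integers(0, n, size=n)             # bootstrap resample of rows
    coef = Lasso(alpha=0.1, max_iter=5000).fit(X[idx], y[idx]).coef_
    counts += (coef != 0)

stable = np.flatnonzero(counts / n_boot >= 0.8)  # assumed 80% threshold
print("features selected in >=80% of bootstraps:", stable)
```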