• Title/Summary/Keyword: Lasso model

Comparison of Laplace and Double Pareto Penalty: LASSO and Elastic Net (라플라스와 이중 파레토 벌점의 비교: LASSO와 Elastic Net)

  • Kyung, Minjung
    • The Korean Journal of Applied Statistics / v.27 no.6 / pp.975-989 / 2014
  • Lasso (Tibshirani, 1996) and Elastic Net (Zou and Hastie, 2005) have been widely used in various fields for simultaneous variable selection and coefficient estimation. Bayesian methods using conditional Laplace and double Pareto prior specifications are discussed in hierarchical form, and the full conditional posterior distributions under each prior are derived. We use simulations to compare the performance of the Bayesian lasso with a Laplace prior against that with a double Pareto prior. We also apply the proposed Bayesian hierarchical models to real data sets to predict the collapse of governments in Asia.
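
The connection between these penalties and the Laplace prior can be illustrated in one dimension. The following is a minimal sketch, not the paper's hierarchical sampler: it assumes a single coefficient with least-squares estimate `z`, for which the lasso estimate (the posterior mode under a Laplace prior) reduces to soft-thresholding, and adding a ridge term gives a naive elastic-net update. The function names and the scaling of the penalties are assumptions made for illustration.

```python
def soft_threshold(z: float, lam: float) -> float:
    """Minimizer over b of 0.5*(z - b)**2 + lam*abs(b): the 1-D lasso estimate."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0  # small estimates are set exactly to zero


def elastic_net_1d(z: float, lam1: float, lam2: float) -> float:
    """Minimizer over b of 0.5*(z - b)**2 + lam1*abs(b) + 0.5*lam2*b**2."""
    # The L1 part thresholds, the L2 part shrinks the survivor proportionally.
    return soft_threshold(z, lam1) / (1.0 + lam2)


if __name__ == "__main__":
    for z in (-3.0, -0.5, 0.5, 3.0):
        print(z, soft_threshold(z, 1.0), elastic_net_1d(z, 1.0, 1.0))
```

Estimates with magnitude below `lam` are mapped to exactly zero, which is what makes the lasso a variable-selection method as well as a shrinkage method.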

News Impact Curves of Volatility for Asymmetric GARCH via LASSO (LASSO를 이용한 비대칭 GARCH 모형의 변동성 커브)

  • Yoon, J.E.;Lee, J.W.;Hwang, S.Y.
    • The Korean Journal of Applied Statistics / v.27 no.1 / pp.159-168 / 2014
  • The news impact curve (NIC), originally proposed by Engle and Ng (1993), is a graphical representation of volatility for financial time series. The NIC is a simple but powerful tool for identifying the variability of a given time series. It is noted that the NIC is suited to symmetric volatility. Recently, a lot of attention has been paid to asymmetric volatility models, and therefore an asymmetric version of the NIC would be useful in the field of financial time series. In this article, we propose to incorporate LASSO in constructing asymmetric NICs based on asymmetric GARCH models. In particular, bilinear GARCH models are considered and illustrated via KOSDAQ data.
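
To make the symmetric/asymmetric distinction concrete, here is a hedged sketch of two news impact curves. It does not implement the bilinear GARCH model or the LASSO step of the article. Instead it assumes a standard GARCH(1,1) and, as a swapped-in asymmetric example, a GJR-type model with an extra coefficient `gamma` on negative shocks. The lagged variance is held fixed at `sigma2_bar`; all parameter values are hypothetical.

```python
def nic_garch(eps: float, omega: float, alpha: float, beta: float,
              sigma2_bar: float) -> float:
    """Symmetric NIC: next-period variance as a function of the shock eps."""
    return omega + beta * sigma2_bar + alpha * eps ** 2


def nic_gjr(eps: float, omega: float, alpha: float, gamma: float,
            beta: float, sigma2_bar: float) -> float:
    """Asymmetric NIC (GJR-type): negative shocks get the extra weight gamma."""
    leverage = gamma if eps < 0 else 0.0
    return omega + beta * sigma2_bar + (alpha + leverage) * eps ** 2


if __name__ == "__main__":
    params = dict(omega=0.1, alpha=0.1, beta=0.8, sigma2_bar=1.0)
    for eps in (-2.0, -1.0, 0.0, 1.0, 2.0):
        print(eps, nic_garch(eps, **params), nic_gjr(eps, gamma=0.05, **params))
```

The symmetric curve gives the same variance for shocks of equal size and opposite sign, while the asymmetric curve rises more steeply on the negative side.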

Robust estimation of sparse vector autoregressive models (희박 벡터 자기 회귀 모형의 로버스트 추정)

  • Kim, Dongyeong;Baek, Changryong
    • The Korean Journal of Applied Statistics / v.35 no.5 / pp.631-644 / 2022
  • This paper considers robust estimation of the sparse vector autoregressive model (sVAR), which is useful in high-dimensional time series analysis. First, we generalize the result of Xu et al. (2008) by showing that the adaptive lasso is robust in sVAR as well. However, the adaptive lasso in sVAR performs poorly as the number and size of outliers increase. Therefore, we propose new robust estimation methods for sVAR based on least absolute deviation (LAD) and Huber estimation. Our simulation results show that the proposed methods provide more accurate estimation, which in turn yields better forecasting performance when outliers exist. In addition, we applied the proposed methods to power usage data and confirmed that there are non-negligible outliers and that robust estimation taking such outliers into account improves forecasting.
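
The reason LAD and Huber estimation resist outliers can be seen from the loss functions alone. The sketch below compares the squared, absolute (LAD), and Huber losses for a single residual; it is an illustration only, not the paper's sVAR estimator. The default threshold `delta=1.345` is a conventional choice for standardized residuals and is an assumption here, not a value taken from the paper.

```python
def squared_loss(r: float) -> float:
    """Least-squares loss: grows quadratically, so outliers dominate the fit."""
    return 0.5 * r * r


def lad_loss(r: float) -> float:
    """Least absolute deviation loss: grows linearly in the residual."""
    return abs(r)


def huber_loss(r: float, delta: float = 1.345) -> float:
    """Huber loss: quadratic for |r| <= delta, linear beyond it."""
    a = abs(r)
    if a <= delta:
        return 0.5 * a * a
    return delta * (a - 0.5 * delta)


if __name__ == "__main__":
    for r in (0.5, 1.0, 5.0, 10.0):
        print(r, squared_loss(r), lad_loss(r), huber_loss(r))
```

For a large residual such as 10, the squared loss is 50 while the LAD and Huber losses grow only linearly, so a single outlier contributes far less to the objective.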

Machine Learning Prediction of Economic Effects of Busan's Strategic Industry through Ridge Regression and Lasso Regression (릿지 회귀와 라쏘 회귀 모형에 의한 부산 전략산업의 지역경제 효과에 대한 머신러닝 예측)

  • Yi, Chae-Deug
    • Journal of Korea Port Economic Association / v.37 no.1 / pp.197-215 / 2021
  • This paper analyzes machine learning predictions of the economic effects of Busan's strategic industries on employment and income, using Ridge and Lasso regression models with regularization terms. According to the Ridge and Lasso estimation models of employment, the intelligence information service industry (service platform, contents, and smart finance) and the global tourism industry (MICE and specialized tourism) are predicted to influence employment, in that order. However, the Ridge and Lasso regression models show that the future transportation machine industry does not significantly increase employment or income, since it is a primitive investment industry. The Ridge estimation models of income show that the intelligence information service industry and the global tourism industry are also predicted to influence income, in that order. According to the Lasso estimation models of income, four strategic industries (life care, smart maritime, intelligence machine, and clean tech) do not influence income. Furthermore, the future transportation machine industry may influence income negatively, since it is a primitive investment industry. Thus, we have to select appropriate economic objectives and priorities for industrial policies.

Lasso Regression of RNA-Seq Data based on Bootstrapping for Robust Feature Selection (안정적 유전자 특징 선택을 위한 유전자 발현량 데이터의 부트스트랩 기반 Lasso 회귀 분석)

  • Jo, Jeonghee;Yoon, Sungroh
    • KIISE Transactions on Computing Practices / v.23 no.9 / pp.557-563 / 2017
  • When large-scale gene expression data are analyzed using lasso regression, the estimation of regression coefficients may be unstable due to the highly correlated expression values of associated genes. This irregularity, in which the coefficients are reduced by L1 regularization, causes difficulty in variable selection. To address this problem, we propose a regression model that applies repeated bootstrapping of gene expression values prior to lasso regression. The genes selected with high frequency were used to build each regression model. Our experimental results show that several genes were consistently selected in all regression models, and we verified that these genes were not false positives. We also found that the sign distribution of the regression coefficients of the selected genes from each model was correlated with the real dependent variables.
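
The bootstrap-then-select idea described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it uses a small pure-Python coordinate-descent lasso without an intercept, toy Gaussian data in place of RNA-Seq values, and hypothetical names (`lasso_cd`, `selection_frequency`, `make_toy_data`) and settings (`lam`, `n_boot`).

```python
import random


def soft_threshold(z, lam):
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate descent for (1/(2n))*||y - X b||^2 + lam*||b||_1 (no intercept)."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    resid = list(y)  # residual y - X b, with b = 0 initially
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in X]
            norm = sum(v * v for v in col) / n
            # Covariance of feature j with the residual that excludes feature j.
            rho = sum(c * (r + c * b[j]) for c, r in zip(col, resid)) / n
            new_bj = soft_threshold(rho, lam) / norm
            delta = new_bj - b[j]
            if delta != 0.0:
                resid = [r - c * delta for c, r in zip(col, resid)]
                b[j] = new_bj
    return b


def selection_frequency(X, y, lam, n_boot=50, seed=0):
    """Fraction of bootstrap resamples in which each feature gets a nonzero coefficient."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample rows with replacement
        coef = lasso_cd([X[i] for i in idx], [y[i] for i in idx], lam)
        for j in range(p):
            if coef[j] != 0.0:
                counts[j] += 1
    return [c / n_boot for c in counts]


def make_toy_data(n=40, seed=1):
    """Toy data: only feature 0 drives the response; features 1 and 2 are noise."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(n)]
    y = [2.0 * row[0] + rng.gauss(0, 0.1) for row in X]
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    print(selection_frequency(X, y, lam=0.3))
```

Features with a selection frequency near 1 are stable candidates; a cutoff on this frequency is a design choice that the sketch leaves to the user.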

A Study on Domestic Drama Rating Prediction (국내 드라마 시청률 예측 및 영향요인 분석)

  • Kang, Suyeon;Jeon, Heejeong;Kim, Jihye;Song, Jongwoo
    • The Korean Journal of Applied Statistics / v.28 no.5 / pp.933-949 / 2015
  • Audience rating competition in the domestic drama market has increased recently due to the introduction of commercial broadcasting and the diversification of channels. There is now a need for thorough study and analysis of audience ratings. In particular, a drama's rating is an important measure that producers and advertisers use to estimate advertisement costs. In this paper, we study drama rating prediction models using various data mining techniques such as linear regression, LASSO regression, random forest, and gradient boosting. The analysis results show that initial drama ratings are affected by structural elements such as broadcasting station and broadcasting time. Average drama ratings are also influenced by earlier public opinion, such as the number of internet searches about the drama.

Bayesian analysis of latent factor regression model (내재된 인자회귀모형의 베이지안 분석법)

  • Kyung, Minjung
    • The Korean Journal of Applied Statistics / v.33 no.4 / pp.365-377 / 2020
  • We discuss latent factor regression, which constructs a common structure inherent among the explanatory variables to resolve multicollinearity and uses the resulting factors as regressors in a linear model of the response variable. Bayesian estimation with a LASSO prior with a large penalty parameter is used to construct a significant factor loading matrix of intrinsic interest among infinite latent structures. The estimated factor loading matrix, together with the other estimated parameters, can be inversely transformed into linear parameters for each explanatory variable and used as a prediction model for new observations. We apply the proposed method to the Product Service Management data of HBAT and observe that it constructs the same factors as general common factor analysis for a fixed number of factors. The MSE of the predicted values from the Bayesian latent factor regression model is also smaller than that of the common factor regression model.

Analysis of multi-center bladder cancer survival data using variable-selection method of multi-level frailty models (다수준 프레일티모형 변수선택법을 이용한 다기관 방광암 생존자료분석)

  • Kim, Bohyeon;Ha, Il Do;Lee, Donghwan
    • Journal of the Korean Data and Information Science Society / v.27 no.2 / pp.499-510 / 2016
  • It is very important to select relevant variables in regression models for survival analysis. In this paper, we introduce a penalized variable-selection procedure for multi-level frailty models based on the "frailtyHL" R package (Ha et al., 2012). The estimation procedure is based on the penalized hierarchical likelihood, and three penalty functions (LASSO, SCAD and HL) are considered. The proposed methods are illustrated with multi-country/multi-center bladder cancer survival data from the EORTC in Belgium. We compare the results of the three variable-selection methods and discuss their advantages and disadvantages. In particular, the data analysis showed that the SCAD and HL methods select important variables better than the LASSO method.

Variable Selection in Frailty Models using FrailtyHL R Package: Breast Cancer Survival Data (frailtyHL 통계패키지를 이용한 프레일티 모형의 변수선택: 유방암 생존자료)

  • Kim, Bohyeon;Ha, Il Do;Noh, Maengseok;Na, Myung Hwan;Song, Ho-Chun;Kim, Jahae
    • The Korean Journal of Applied Statistics / v.28 no.5 / pp.965-976 / 2015
  • Determining relevant variables for a regression model is important in regression analysis. Recently, variable selection methods using a penalized likelihood with various penalty functions (e.g. LASSO and SCAD) have been widely studied in simple statistical models such as linear models and generalized linear models. The advantage of these methods is that they select important variables and estimate regression coefficients simultaneously; that is, they delete insignificant variables by estimating their coefficients as zero. We study how to select proper variables based on the penalized hierarchical likelihood (HL) in semi-parametric frailty models that allow three penalty functions: LASSO, SCAD and HL. For the variable selection, we develop a new function in the "frailtyHL" R package. Our methods are illustrated with breast cancer survival data from the Medical Center at Chonnam National University in Korea. We compare the results from the three variable-selection methods and discuss their advantages and disadvantages.

Penalized variable selection in mean-variance accelerated failure time models (평균-분산 가속화 실패시간 모형에서 벌점화 변수선택)

  • Kwon, Ji Hoon;Ha, Il Do
    • The Korean Journal of Applied Statistics / v.34 no.3 / pp.411-425 / 2021
  • The accelerated failure time (AFT) model represents a linear relationship between the log-survival time and covariates. We are interested in inference on the effects of covariates on the variation of survival times in the AFT model. Thus, we need to model the variance as well as the mean of survival times. We call the resulting model the mean and variance AFT (MV-AFT) model. In this paper, we propose a variable selection procedure for the regression parameters of the mean and variance in the MV-AFT model using a penalized likelihood function. For the variable selection, we study four penalty functions: least absolute shrinkage and selection operator (LASSO), adaptive lasso (ALASSO), smoothly clipped absolute deviation (SCAD) and hierarchical likelihood (HL). With this procedure we can select important covariates and estimate the regression parameters at the same time. The performance of the proposed method is evaluated using simulation studies. The proposed method is illustrated with a clinical example dataset.
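
Three of the four penalties named above have simple closed forms, sketched here for a single coefficient. The HL penalty is omitted because its form is not given in the abstract. The SCAD default `a=3.7` is the value commonly used in the literature, and the adaptive-lasso weight `1/|beta_init|**gamma` (with `beta_init` a nonzero initial estimate) is the usual construction; both are assumptions rather than settings reported by the paper.

```python
def lasso_penalty(beta: float, lam: float) -> float:
    """LASSO penalty: grows linearly, so large coefficients keep being shrunk."""
    return lam * abs(beta)


def adaptive_lasso_penalty(beta: float, lam: float, beta_init: float,
                           gamma: float = 1.0) -> float:
    """Adaptive lasso: coefficients with large initial estimates are penalized less."""
    weight = 1.0 / abs(beta_init) ** gamma  # beta_init must be nonzero
    return lam * weight * abs(beta)


def scad_penalty(beta: float, lam: float, a: float = 3.7) -> float:
    """SCAD penalty: linear near zero, quadratic transition, then constant."""
    b = abs(beta)
    if b <= lam:
        return lam * b
    if b <= a * lam:
        return (2.0 * a * lam * b - b * b - lam * lam) / (2.0 * (a - 1.0))
    return lam * lam * (a + 1.0) / 2.0


if __name__ == "__main__":
    for beta in (0.5, 2.0, 10.0):
        print(beta, lasso_penalty(beta, 1.0), scad_penalty(beta, 1.0))
```

Because the SCAD penalty is flat beyond `a * lam`, large coefficients are not shrunk further, whereas the LASSO penalty keeps increasing with the coefficient size.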