• Title/Summary/Keyword: Variable selection


Variable Selection Via Penalized Regression

  • Yoon, Young-Joo;Song, Moon-Sup
    • Communications for Statistical Applications and Methods / v.12 no.3 / pp.615-624 / 2005
  • In this paper, we review the variable-selection properties of LASSO and SCAD in penalized regression. To address the weakness of SCAD at high noise levels, we propose a new penalty function called MSCAD, which relaxes the unbiasedness condition of SCAD. To compare MSCAD with LASSO and SCAD, comparative studies are performed on simulated datasets and on a real dataset. The performance of the penalized regression methods is compared in terms of relative model error and coefficient estimates. The experiments show that the performance of MSCAD lies between those of LASSO and SCAD, as expected.
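The two standard penalties the abstract compares can be sketched as follows (MSCAD is the paper's own proposal and is not reproduced here; the SCAD form below is the usual Fan-Li definition with the conventional default a = 3.7):

```python
import numpy as np

def lasso_penalty(beta, lam):
    """L1 (LASSO) penalty: lam * |beta| -- grows linearly without bound."""
    return lam * np.abs(beta)

def scad_penalty(beta, lam, a=3.7):
    """SCAD penalty of Fan and Li (2001): linear near zero, quadratic
    blending in the middle, then constant for large |beta|."""
    b = np.abs(beta)
    return np.where(
        b <= lam,
        lam * b,
        np.where(
            b <= a * lam,
            (2 * a * lam * b - b**2 - lam**2) / (2 * (a - 1)),
            lam**2 * (a + 1) / 2,
        ),
    )

# LASSO keeps penalizing large coefficients; SCAD levels off at
# lam^2 * (a + 1) / 2, leaving large coefficients nearly unbiased --
# the unbiasedness condition that the paper's MSCAD relaxes.
```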

Variable Selection in Linear Random Effects Models for Normal Data

  • Kim, Hea-Jung
    • Journal of the Korean Statistical Society / v.27 no.4 / pp.407-420 / 1998
  • This paper is concerned with selecting covariates to be included in linear random effects models designed to analyze clustered normal response data. Based on a Bayesian approach, it proposes and develops a procedure that uses probabilistic considerations for selecting promising subsets of covariates. The approach reformulates the linear random effects model as a hierarchical normal and point-mass mixture model by introducing a set of latent variables that identify subset choices. The hierarchical model is flexible enough to easily accommodate sign constraints on the regression coefficients. Utilizing a Gibbs sampler, the posterior probability of each subset of covariates is obtained; the most promising subset can then be identified as the one with the highest posterior probability. The procedure is illustrated through a simulation study.
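The idea of ranking covariate subsets by posterior probability can be illustrated with a brute-force sketch on a plain linear model, using BIC as a rough stand-in for the marginal likelihood (the paper itself uses a Gibbs sampler over a hierarchical point-mass mixture, which this does not reproduce):

```python
import itertools
import numpy as np

def subset_scores(X, y):
    """Enumerate covariate subsets, score each by BIC, and normalize
    exp(-BIC/2) into rough posterior-style weights (illustration only)."""
    n, p = X.shape
    scores = {}
    for k in range(p + 1):
        for S in itertools.combinations(range(p), k):
            Xs = np.column_stack([np.ones(n)] + [X[:, j] for j in S])
            beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
            rss = np.sum((y - Xs @ beta) ** 2)
            scores[S] = n * np.log(rss / n) + (len(S) + 1) * np.log(n)
    w = np.exp(-0.5 * (np.array(list(scores.values())) - min(scores.values())))
    w /= w.sum()
    return dict(zip(scores.keys(), w))

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 2.0 * X[:, 0] + rng.normal(size=200)   # only covariate 0 is active
post = subset_scores(X, y)
best = max(post, key=post.get)             # most promising subset
```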


Optimal Parameter Selection of Power System Stabilizer using Genetic Algorithm (유전 알고리즘을 이용한 전력시스템 안정화 장치의 최적 파라미터 선정)

  • Chung, Hyeng-Hwan;Wang, Yong-Peel;Chung, Dong-Il;Chung, Mun-Kyu
    • The Transactions of the Korean Institute of Electrical Engineers A / v.48 no.6 / pp.683-691 / 1999
  • This paper suggests a method for selecting optimal parameters of a power system stabilizer (PSS) that is robust against low-frequency oscillation in power systems, using a Real Variable Elitism Genetic Algorithm (RVEGA). Optimal parameters were selected for a PSS with one lead compensator and with two lead compensators. The frequency-response characteristics of the PSS, the system eigenvalue criterion, and the dynamic characteristics were examined under normal and heavy load, demonstrating the usefulness of RVEGA compared with Yu's compensator design theory.
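A real-coded elitist GA of the kind the abstract names can be sketched on a toy objective (the PSS-specific fitness based on eigenvalues and compensator structure is not reproduced; population size, mutation scale, and the quadratic test function are illustrative choices):

```python
import numpy as np

def rvega_minimize(f, bounds, pop=40, gens=60, elite=2, seed=0):
    """Toy real-coded elitist GA: the best `elite` individuals survive
    unchanged each generation; the rest come from blend crossover of
    parents drawn from the best half, plus Gaussian mutation."""
    rng = np.random.default_rng(seed)
    lo, hi = np.array(bounds).T
    P = rng.uniform(lo, hi, size=(pop, len(bounds)))
    for _ in range(gens):
        fit = np.array([f(x) for x in P])
        P = P[np.argsort(fit)]                       # sort best-first
        kids = []
        while len(kids) < pop - elite:
            i, j = rng.integers(0, pop // 2, size=2)  # parents from best half
            a = rng.uniform(size=len(bounds))
            child = a * P[i] + (1 - a) * P[j]         # blend crossover
            child += rng.normal(0, 0.05 * (hi - lo))  # Gaussian mutation
            kids.append(np.clip(child, lo, hi))
        P = np.vstack([P[:elite], np.array(kids)])
    fit = np.array([f(x) for x in P])
    return P[np.argmin(fit)]

# e.g. tune two hypothetical compensator gains on a toy quadratic objective
best = rvega_minimize(lambda x: (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2,
                      bounds=[(-5, 5), (-5, 5)])
```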


VARIABLE SELECTION VIA PENALIZED REGRESSION

  • Yoon, Young-Joo;Song, Moon-Sup
    • Proceedings of the Korean Statistical Society Conference / 2005.05a / pp.7-12 / 2005
  • In this paper, we review the variable-selection properties of LASSO and SCAD in penalized regression. To address the weakness of SCAD at high noise levels, we propose a new penalty function called MSCAD, which relaxes the unbiasedness condition of SCAD. To compare MSCAD with LASSO and SCAD, comparative studies are performed on simulated datasets and on a real dataset. The performance of the penalized regression methods is compared in terms of relative model error and coefficient estimates. The experiments show that the performance of MSCAD lies between those of LASSO and SCAD, as expected.


Tree-structured Clustering for Continuous Data (연속형 자료에 대한 나무형 군집화)

  • Huh Myung-Hoe;Yang Kyung-Sook
    • The Korean Journal of Applied Statistics / v.18 no.3 / pp.661-671 / 2005
  • The aim of this study is to propose a clustering method, called tree-structured clustering, that recursively partitions continuous multivariate data based on an overall $R^2$ criterion with a practical node-splitting decision rule. The method produces easily interpretable tree-type clustering rules with a variable selection function. In numerical examples (Fisher's iris data and a Telecom case), we note several differences between tree-structured clustering and K-means clustering.
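Recursive partitioning on an overall $R^2$ criterion can be sketched as below (the stopping rule here, a minimum $R^2$ for the best split, is a simplified stand-in for the paper's node-splitting decision rule):

```python
import numpy as np

def best_split(X):
    """Find the axis-aligned split maximizing overall R^2 =
    between-cluster SS / total SS over all variables and cut points."""
    mean = X.mean(axis=0)
    tss = np.sum((X - mean) ** 2)
    best = (None, None, 0.0)                      # (variable, cut, R^2)
    for j in range(X.shape[1]):
        for c in np.unique(X[:, j])[:-1]:
            left, right = X[X[:, j] <= c], X[X[:, j] > c]
            bss = (len(left) * np.sum((left.mean(axis=0) - mean) ** 2)
                   + len(right) * np.sum((right.mean(axis=0) - mean) ** 2))
            if bss / tss > best[2]:
                best = (j, c, bss / tss)
    return best

def tree_cluster(X, min_r2=0.5, depth=3):
    """Recursively split while the best split's R^2 exceeds min_r2."""
    j, c, r2 = best_split(X)
    if depth == 0 or j is None or r2 < min_r2:
        return [X]
    return (tree_cluster(X[X[:, j] <= c], min_r2, depth - 1)
            + tree_cluster(X[X[:, j] > c], min_r2, depth - 1))

rng = np.random.default_rng(0)
blob1 = rng.normal([0, 0], 1, size=(100, 2))
blob2 = rng.normal([6, 0], 1, size=(100, 2))
clusters = tree_cluster(np.vstack([blob1, blob2]), min_r2=0.5)
```

The resulting rule ("variable 0 at cut c") is directly interpretable, which is the property the abstract emphasizes over K-means.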

Variable Selection Theorem for the Analysis of Covariance Model (공분산분석 모형에서의 변수선택 정리)

  • Yoon, Sang-Hoo;Park, Jeong-Soo
    • Communications for Statistical Applications and Methods / v.15 no.3 / pp.333-342 / 2008
  • The variable selection theorem for the linear regression model is extended to the analysis of covariance model. Omitting some regression variables from the model reduces the variance of the estimators but introduces bias; an appropriate balance between a biased model and one with large variances is therefore recommended.
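The bias-variance tradeoff the abstract describes is easy to see in a small simulation (the coefficients and correlation below are illustrative, not from the paper):

```python
import numpy as np

# Omitting a weak, correlated covariate: the reduced estimator of
# beta1 is biased but has smaller variance than the full-model one.
rng = np.random.default_rng(1)
n, reps = 50, 2000
beta1, beta2 = 1.0, 0.3
full, reduced = [], []
for _ in range(reps):
    x1 = rng.normal(size=n)
    x2 = 0.8 * x1 + 0.6 * rng.normal(size=n)   # corr(x1, x2) = 0.8
    y = beta1 * x1 + beta2 * x2 + rng.normal(size=n)
    Xf = np.column_stack([x1, x2])
    full.append(np.linalg.lstsq(Xf, y, rcond=None)[0][0])
    reduced.append(np.sum(x1 * y) / np.sum(x1 * x1))   # regress y on x1 alone
full, reduced = np.array(full), np.array(reduced)
# full:    mean ~ 1.0 (unbiased), larger variance
# reduced: mean ~ 1.24 = beta1 + 0.8*beta2 (biased), smaller variance
```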

A method of selecting an active factor and its robustness against correlation in the data

  • Yamada, Shu;Harashima, Jun
    • International Journal of Quality Innovation / v.4 no.2 / pp.16-31 / 2003
  • Reducing variation in quality characteristics is a typical example of quality improvement. In such a case, we treat the quality characteristic as a response variable and need to find the active factors affecting the response from many candidate factors, since reducing the variation of the response is achieved by reducing the variation of the active factors. In this paper, we first derive a method of selecting an active factor by linear regression. It is well known that correlation between factors deteriorates the precision of estimators. We therefore examine the robustness of the selection method against correlation in the data set and derive a method for evaluating the deterioration brought about by the correlation. Some examples of the selection and evaluation methods are shown to demonstrate their practical usage.
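The deterioration of precision under correlated factors that the abstract studies is conventionally quantified by the variance inflation factor; a minimal sketch (the 0.9 correlation is an illustrative value):

```python
import numpy as np

def vif(X, j):
    """Variance inflation factor of column j: 1 / (1 - R^2), where R^2
    comes from regressing X[:, j] on the remaining columns."""
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(X)), others])
    coef, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
    resid = X[:, j] - A @ coef
    r2 = 1 - resid.var() / X[:, j].var()
    return 1.0 / (1.0 - r2)

rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
x2 = 0.9 * x1 + np.sqrt(1 - 0.81) * rng.normal(size=500)  # corr ~ 0.9
X = np.column_stack([x1, x2])
# vif(X, 0) is roughly 1 / (1 - 0.81) ~ 5.3: the coefficient's variance
# is inflated about fivefold by the correlation between factors.
```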

Penalized maximum likelihood estimation with symmetric log-concave errors and LASSO penalty

  • Park, Seo-Young;Kim, Sunyul;Seo, Byungtae
    • Communications for Statistical Applications and Methods / v.29 no.6 / pp.641-653 / 2022
  • Penalized least squares methods are important tools to simultaneously select variables and estimate parameters in linear regression. The penalized maximum likelihood can also be used for the same purpose, assuming that the error distribution falls in a certain parametric family of distributions. However, the use of a particular parametric family can suffer from a misspecification problem that undermines estimation accuracy. To give sufficient flexibility to the error distribution, we propose to use the symmetric log-concave error distribution with the LASSO penalty. A feasible algorithm to estimate both the nonparametric and parametric components of the proposed model is provided. Numerical studies are also presented, showing that the proposed method produces more efficient estimators than some existing methods with similar variable selection performance.
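For a fixed parametric special case of the symmetric log-concave family, the penalized MLE is easy to write down; the sketch below assumes Laplace(0, 1) errors (the paper estimates the error density nonparametrically, which this does not attempt):

```python
import numpy as np
from scipy.optimize import minimize

def penalized_mle_laplace(X, y, lam):
    """LASSO-penalized MLE under Laplace(0, 1) errors: the negative
    log-likelihood is sum |y - Xb| up to constants, so this is LAD
    regression with an L1 penalty (illustrative special case only)."""
    def objective(beta):
        resid = y - X @ beta
        return np.sum(np.abs(resid)) + lam * np.sum(np.abs(beta))
    return minimize(objective, np.zeros(X.shape[1]), method="Powell").x

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 2.0 * X[:, 0] + rng.laplace(size=200)   # only covariate 0 is active
beta_hat = penalized_mle_laplace(X, y, lam=1.0)
```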

Drought forecasting over South Korea based on the teleconnected global climate variables

  • Taesam Lee;Yejin Kong;Sejeong Lee;Taegyun Kim
    • Proceedings of the Korea Water Resources Association Conference / 2023.05a / pp.47-47 / 2023
  • Drought occurs when water resources are lacking over an extended period, and its intensity has been magnified globally by climate change. In recent years drought over South Korea has also intensified, making forecasting essential for water resource management and the water industry. Drought forecasting over South Korea was therefore performed in the current study with the following procedure. First, accumulated spring precipitation (ASP) was derived from the 93 weather stations in South Korea, taking their median. Correlation analysis was then carried out between ASP and Df4m, the differences of pairs of global winter mean sea-level pressure (MSLP) values. The 37 Df4m variables with correlations above 0.55 were chosen and sorted into three regions. The selected Df4m variables within the same region showed high similarity, leading to a multicollinearity problem. To avoid this, the least absolute shrinkage and selection operator (LASSO), a model that performs variable selection and model fitting at once, was applied. The LASSO model selected five variables and showed good agreement between predicted and observed values (R2=0.72). Other models, such as multiple linear regression and ElasticNet, were also tried but did not perform as well as LASSO. The LASSO model can therefore be an appropriate model for forecasting spring drought over South Korea and can be used to manage water resources efficiently.
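The study's setup, many highly correlated predictors grouped into a few regions and far fewer observations, is exactly where LASSO's joint selection-and-fitting helps; a toy stand-in (the group sizes, noise levels, and alpha below are illustrative, not the study's values):

```python
import numpy as np
from sklearn.linear_model import Lasso

# 37 predictors built from 3 latent "regional" signals, mimicking the
# 37 Df4m variables sorted into three highly similar regions.
rng = np.random.default_rng(0)
n, p = 40, 37
base = rng.normal(size=(n, 3))
X = np.repeat(base, [13, 12, 12], axis=1) + 0.3 * rng.normal(size=(n, p))
y = 1.5 * base[:, 0] - 1.0 * base[:, 1] + 0.5 * rng.normal(size=n)

# LASSO fits and selects at once, picking a sparse subset despite
# the multicollinearity that would destabilize plain least squares.
lasso = Lasso(alpha=0.1).fit(X, y)
n_selected = int(np.sum(lasso.coef_ != 0))
r2 = lasso.score(X, y)
```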


Comparison of Feature Selection Methods in Support Vector Machines (지지벡터기계의 변수 선택방법 비교)

  • Kim, Kwangsu;Park, Changyi
    • The Korean Journal of Applied Statistics / v.26 no.1 / pp.131-139 / 2013
  • Support vector machines (SVM) may perform poorly in the presence of noise variables, and it is difficult to identify the importance of each variable in the resulting classifier. Feature selection can improve both the interpretability and the accuracy of an SVM. Most existing studies concern feature selection in the linear SVM through penalty functions yielding sparse solutions. In practice, however, nonlinear kernels are usually adopted for classification accuracy, so feature selection is still desirable for nonlinear SVMs. In this paper, we compare the performance of nonlinear feature selection methods, component selection and smoothing operator (COSSO) and kernel iterative feature extraction (KNIFE), on simulated and real data sets.
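COSSO and KNIFE have no stock scikit-learn implementation; as a simpler point of comparison, recursive feature elimination on a linear-kernel SVM is a common baseline for SVM feature selection and can be sketched as:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.svm import SVC

# Synthetic data: 3 informative features followed by 7 noise features
# (shuffle=False keeps the informative ones in the first columns).
X, y = make_classification(n_samples=300, n_features=10, n_informative=3,
                           n_redundant=0, shuffle=False, random_state=0)

# RFE repeatedly drops the feature with the smallest |coefficient|
# in the fitted linear SVM until 3 features remain.
selector = RFE(SVC(kernel="linear"), n_features_to_select=3).fit(X, y)
selected = np.where(selector.support_)[0]
```

This only works with a linear kernel (it needs `coef_`); the appeal of COSSO- and KNIFE-style methods is precisely that they handle nonlinear kernels, where such coefficient-based elimination does not apply.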