• Title/Summary/Keyword: Bayesian variable selection

A Bayesian Variable Selection Method for Binary Response Probit Regression

  • Kim, Hea-Jung
    • Journal of the Korean Statistical Society / v.28 no.2 / pp.167-182 / 1999
  • This article is concerned with selecting subsets of predictor variables to include in a binary response probit regression model. Taking a Bayesian approach, it proposes and develops a procedure that uses probabilistic considerations to select promising subsets. The procedure reformulates the probit regression setup as a hierarchical normal mixture model by introducing a set of hyperparameters that identify subset choices. The posterior probability of each subset of predictor variables is obtained through the Gibbs sampler, which samples indirectly from the multinomial posterior distribution on the set of possible subset choices. The most promising subset of predictors can thus be identified as the one with the highest posterior probability. To highlight the merit of this procedure, a couple of illustrative numerical examples are given.

Hierarchical Bayesian Inference of Binomial Data with Nonresponse

  • Han, Geunshik;Nandram, Balgobin
    • Journal of the Korean Statistical Society / v.31 no.1 / pp.45-61 / 2002
  • We consider the problem of estimating binomial proportions in the presence of nonignorable nonresponse using the Bayesian selection approach. Inference is sampling based and Markov chain Monte Carlo (MCMC) methods are used to perform the computations. We apply our method to study doctor visits data from the Korean National Family Income and Expenditure Survey (NFIES). The ignorable and nonignorable models are compared to Stasny's method (1991) by measuring the variability from the Metropolis-Hastings (MH) sampler. The results show that both models work very well.

Classification of High Dimensionality Data through Feature Selection Using Markov Blanket

  • Lee, Junghye;Jun, Chi-Hyuck
    • Industrial Engineering and Management Systems / v.14 no.2 / pp.210-219 / 2015
  • A classification task requires an exponentially growing amount of computation time and number of observations as the variable dimensionality increases. Thus, reducing the dimensionality of the data is essential when the number of observations is limited. Often, dimensionality reduction or feature selection leads to better classification performance than using all of the features. In this paper, we study the possibility of using Markov blanket discovery algorithms as a new feature selection method. The Markov blanket of a target variable is the minimal set of variables that explains the target, based on the conditional independence relations among all the variables connected in a Bayesian network. We apply several Markov blanket discovery algorithms to some high-dimensional categorical and continuous data sets, and compare their classification performance with other feature selection methods using well-known classifiers.
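
As a rough illustration of the Markov blanket definition in the abstract above: when the Bayesian network (a DAG) is already known, the blanket of a target is its parents, its children, and the other parents of those children. The sketch below assumes the graph is given as a parent map. The discovery algorithms studied in the paper instead learn the blanket from data through conditional independence tests, which this sketch does not attempt. The function name and the toy graph are hypothetical.

```python
def markov_blanket(parents, target):
    """Markov blanket of `target` in a DAG given as {node: set_of_parents}."""
    # Parents of the target.
    blanket = set(parents.get(target, set()))
    # Children: every node that lists the target among its parents.
    children = {node for node, ps in parents.items() if target in ps}
    blanket |= children
    # Spouses: the other parents of each child.
    for child in children:
        blanket |= parents[child]
    blanket.discard(target)
    return blanket


# Hypothetical toy network: A -> T -> C <- B, and C -> D.
toy_dag = {
    "A": set(),
    "B": set(),
    "T": {"A"},
    "C": {"T", "B"},
    "D": {"C"},
}

print(sorted(markov_blanket(toy_dag, "T")))
```

In this toy graph, D is a descendant of T but lies outside the blanket, so a blanket-based feature selector would drop it once A, B, and C are kept.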

A comparison study of Bayesian variable selection methods for sparse covariance matrices (희박 공분산 행렬에 대한 베이지안 변수 선택 방법론 비교 연구)

  • Kim, Bongsu;Lee, Kyoungjae
    • The Korean Journal of Applied Statistics / v.35 no.2 / pp.285-298 / 2022
  • Continuous shrinkage priors, as well as spike and slab priors, have been widely employed for Bayesian inference about sparse regression coefficient vectors or covariance matrices. Continuous shrinkage priors provide computational advantages over spike and slab priors since their model space is substantially smaller. This is especially true in high-dimensional settings. However, variable selection based on continuous shrinkage priors is not straightforward because they do not give exactly zero values. Although a few variable selection approaches based on continuous shrinkage priors have been proposed, no substantial comparative investigations of their performance have been conducted. In this paper, we compare two variable selection methods: a credible interval method and the sequential 2-means algorithm (Li and Pati, 2017). Various simulation scenarios are used to demonstrate the practical performance of the methods. We conclude the paper by presenting some observations and conjectures based on the simulation findings.
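
One common form of the credible interval method mentioned above can be sketched as follows: a coefficient is selected when its equal-tailed posterior interval excludes zero. This is only a minimal sketch under stated assumptions, not the paper's implementation: the posterior draws are synthetic stand-ins, the 95% level is an assumed default, and the function names are hypothetical.

```python
import random


def empirical_quantile(values, q):
    """Empirical quantile with linear interpolation between order statistics."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    frac = pos - lo
    return ordered[lo] * (1 - frac) + ordered[hi] * frac


def select_by_credible_interval(draws_per_coef, level=0.95):
    """Select coefficients whose equal-tailed credible interval excludes zero."""
    alpha = 1.0 - level
    selected = []
    for draws in draws_per_coef:
        lower = empirical_quantile(draws, alpha / 2)
        upper = empirical_quantile(draws, 1 - alpha / 2)
        selected.append(lower > 0 or upper < 0)
    return selected


# Synthetic stand-ins for posterior draws of three coefficients (assumption):
# one clearly positive, one centered at zero, one clearly negative.
rng = random.Random(0)
draws_per_coef = [
    [rng.gauss(2.0, 0.3) for _ in range(4000)],
    [rng.gauss(0.0, 0.3) for _ in range(4000)],
    [rng.gauss(-1.5, 0.3) for _ in range(4000)],
]

selected = select_by_credible_interval(draws_per_coef)
print(selected)
```

Because continuous shrinkage priors never produce exact zeros, a thresholding rule of this kind is what turns the posterior draws into a selected subset.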

A Hierarchical Bayesian Model for Survey Data with Nonresponse

  • Han, Geunshik
    • Journal of the Korean Statistical Society / v.30 no.3 / pp.435-451 / 2001
  • We describe a hierarchical Bayesian model to analyze multinomial nonignorable nonresponse data. Using a Dirichlet and beta prior to model the cell probabilities, we develop a complete hierarchical Bayesian analysis for multinomial proportions without making any algebraic approximation. Inference is sampling based and Markov chain Monte Carlo methods are used to perform the computations. We apply our method to the data on body mass index (BMI) and show the model works reasonably well.

Penalized rank regression estimator with the smoothly clipped absolute deviation function

  • Park, Jong-Tae;Jung, Kang-Mo
    • Communications for Statistical Applications and Methods / v.24 no.6 / pp.673-683 / 2017
  • The least absolute shrinkage and selection operator (LASSO) has been a popular regression estimator with simultaneous variable selection. However, LASSO does not have the oracle property, and a robust version is needed in the case of heavy-tailed errors or serious outliers. We propose a robust penalized regression estimator that provides simultaneous variable selection and estimation. It is based on rank regression and a non-convex penalty function, the smoothly clipped absolute deviation (SCAD) function, which has the oracle property. The proposed method combines the robustness of rank regression with the oracle property of the SCAD penalty. We develop an efficient algorithm to compute the proposed estimator that includes a SCAD estimate based on the local linear approximation and the tuning parameter of the penalty function. Our estimate can be obtained by the least absolute deviation method. We use an optimal tuning parameter based on the Bayesian information criterion and the cross-validation method. Numerical simulation shows that the proposed estimator is robust and effective for analyzing contaminated data.
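
For readers unfamiliar with the SCAD penalty named above, here is a minimal sketch of the standard piecewise penalty function. It covers only the penalty itself, not the paper's rank regression estimator or its algorithm. The default `a=3.7` is the value commonly suggested in the SCAD literature and is an assumption here, as is the function name.

```python
def scad_penalty(t, lam, a=3.7):
    """Standard SCAD penalty evaluated at a single coefficient value t."""
    if lam <= 0 or a <= 2:
        raise ValueError("SCAD requires lam > 0 and a > 2")
    t = abs(t)
    if t <= lam:
        # Near zero the penalty is linear, like the LASSO penalty.
        return lam * t
    if t <= a * lam:
        # Quadratic transition that smoothly reduces the penalty slope.
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    # Large coefficients receive a constant penalty.
    return lam * lam * (a + 1) / 2


for value in (0.5, 2.0, 10.0):
    print(value, scad_penalty(value, lam=1.0))
```

The penalty matches LASSO for small coefficients but flattens out for large ones, which is what removes the bias on large coefficients that LASSO incurs.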

Bayesian Variable Selection in Linear Regression Models with Inequality Constraints on the Coefficients (제한조건이 있는 선형회귀 모형에서의 베이지안 변수선택)

  • 오만숙
    • The Korean Journal of Applied Statistics / v.15 no.1 / pp.73-84 / 2002
  • Linear regression models with inequality constraints on the coefficients are frequently used in economic models due to sign or order constraints on the coefficients. In this paper, we propose a Bayesian approach to selecting significant explanatory variables in linear regression models with inequality constraints on the coefficients. Bayesian variable selection requires computing the posterior probability of each candidate model. We propose a method that computes all the necessary posterior model probabilities simultaneously. Specifically, we obtain posterior samples from the most general model via the Gibbs sampling algorithm (Gelfand and Smith, 1990) and compute the posterior probabilities using the samples. A real example is given to illustrate the method.

Bayesian Typhoon Track Prediction Using Wind Vector Data

  • Han, Minkyu;Lee, Jaeyong
    • Communications for Statistical Applications and Methods / v.22 no.3 / pp.241-253 / 2015
  • In this paper we predict the track of typhoons using a Bayesian principal component regression model based on wind field data. Data are obtained at each time point, and we apply the Bayesian principal component regression model to predict the track at that time point. Within the regression model, we apply a variable selection prior and two kinds of prior distributions: normal and Laplace. We show prediction results based on the Bayesian Model Averaging (BMA) estimator and the Median Probability Model (MPM) estimator. We analyze 8 typhoons from 2006 using data obtained from the previous 6 years (2000-2005). We compare our prediction results with a moving-nest typhoon model (MTM) proposed by the Korea Meteorological Administration. We posit that it is possible to predict the track of a typhoon accurately using only a statistical model and without a dynamical model.
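
The two estimators named above can be sketched in general terms. Given posterior probabilities over candidate models, BMA averages the per-model predictions by those weights, while the MPM keeps every variable whose posterior inclusion probability is at least 0.5. The sketch below is generic and is not the paper's typhoon model: the variable names (`pc1`, `pc2`, `pc3`), probabilities, and predictions are made-up illustrative values, and the function names are hypothetical.

```python
def inclusion_probabilities(model_probs):
    """Sum posterior model probabilities over the models containing each variable."""
    incl = {}
    for model, prob in model_probs.items():
        for var in model:
            incl[var] = incl.get(var, 0.0) + prob
    return incl


def median_probability_model(model_probs, threshold=0.5):
    """Variables whose posterior inclusion probability reaches the threshold."""
    incl = inclusion_probabilities(model_probs)
    return {var for var, prob in incl.items() if prob >= threshold}


def bma_prediction(model_probs, model_preds):
    """Posterior-weighted average of per-model point predictions."""
    return sum(model_probs[m] * model_preds[m] for m in model_probs)


# Made-up posterior model probabilities (assumption, for illustration only).
model_probs = {
    frozenset({"pc1"}): 0.40,
    frozenset({"pc1", "pc2"}): 0.35,
    frozenset({"pc2", "pc3"}): 0.25,
}
# Made-up point predictions from each model.
model_preds = {
    frozenset({"pc1"}): 1.0,
    frozenset({"pc1", "pc2"}): 2.0,
    frozenset({"pc2", "pc3"}): 4.0,
}

print(sorted(median_probability_model(model_probs)))
print(bma_prediction(model_probs, model_preds))
```

In this toy example the single most probable model contains only `pc1`, yet the MPM also includes `pc2`, which shows that the median probability model need not coincide with the highest-probability model.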

Model selection via Bayesian information criterion for divide-and-conquer penalized quantile regression (베이즈 정보 기준을 활용한 분할-정복 벌점화 분위수 회귀)

  • Kang, Jongkyeong;Han, Seokwon;Bang, Sungwan
    • The Korean Journal of Applied Statistics / v.35 no.2 / pp.217-227 / 2022
  • Quantile regression is widely used in many fields because it provides an efficient tool for examining complex information latent in variables. However, modern large-scale and high-dimensional data make it very difficult to estimate the quantile regression model due to limitations in computation time and storage space. Divide-and-conquer is a technique that divides the entire data into several sub-datasets that are easy to compute on and then reconstructs the estimates for the entire data using only the summary statistics from each sub-dataset. In this paper, we study a variable selection method using the Bayesian information criterion by applying the divide-and-conquer technique to penalized quantile regression. When the number of sub-datasets is properly selected, the proposed method is efficient in terms of computational speed and provides variable selection results consistent with the classical quantile regression estimates calculated with the entire data. The advantages of the proposed method were confirmed through simulation data and real data analysis.