• Title/Summary/Keyword: Bayesian variable selection

Search Result 46, Processing Time 0.029 seconds

Introduction to variational Bayes for high-dimensional linear and logistic regression models (고차원 선형 및 로지스틱 회귀모형에 대한 변분 베이즈 방법 소개)

  • Jang, Insong;Lee, Kyoungjae
    • The Korean Journal of Applied Statistics
    • /
    • v.35 no.3
    • /
    • pp.445-455
    • /
    • 2022
  • In this paper, we introduce existing Bayesian methods for high-dimensional sparse regression models and compare their performance in various simulation scenarios. Especially, we focus on the variational Bayes approach proposed by Ray and Szabó (2021), which enables scalable and accurate Bayesian inference. Based on simulated data sets from sparse high-dimensional linear regression models, we compare the variational Bayes approach with other Bayesian and frequentist methods. To check the practical performance of the variational Bayes in logistic regression models, a real data analysis is conducted using leukemia data set.

Grid-based Gaussian process models for longitudinal genetic data

  • Chung, Wonil
    • Communications for Statistical Applications and Methods
    • /
    • v.29 no.1
    • /
    • pp.65-83
    • /
    • 2022
  • Although various statistical methods have been developed to map time-dependent genetic factors, most identified genetic variants can explain only a small portion of the estimated genetic variation in longitudinal traits. Gene-gene and gene-time/environment interactions are known to be important putative sources of the missing heritability. However, mapping epistatic gene-gene interactions is extremely difficult due to the very large parameter spaces for models containing such interactions. In this paper, we develop a Gaussian process (GP) based nonparametric Bayesian variable selection method for longitudinal data. It maps multiple genetic markers without restricting to pairwise interactions. Rather than modeling each main and interaction term explicitly, the GP model measures the importance of each marker, regardless of whether it is mostly due to a main effect or some interaction effect(s), via an unspecified function. To improve the flexibility of the GP model, we propose a novel grid-based method for the within-subject dependence structure. The proposed method can accurately approximate complex covariance structures. The dimension of the covariance matrix depends only on the number of fixed grid points although each subject may have different numbers of measurements at different time points. The deviance information criterion (DIC) and the Bayesian predictive information criterion (BPIC) are proposed for selecting an optimal number of grid points. To efficiently draw posterior samples, we combine a hybrid Monte Carlo method with a partially collapsed Gibbs (PCG) sampler. We apply the proposed GP model to a mouse dataset on age-related body weight.

Bayesian Inference for the Zero In ated Negative Binomial Regression Model (제로팽창 음이항 회귀모형에 대한 베이지안 추론)

  • Shim, Jung-Suk;Lee, Dong-Hee;Jun, Byoung-Cheol
    • The Korean Journal of Applied Statistics
    • /
    • v.24 no.5
    • /
    • pp.951-961
    • /
    • 2011
  • In this paper, we propose a Bayesian inference using the Markov Chain Monte Carlo(MCMC) method for the zero inflated negative binomial(ZINB) regression model. The proposed model allows the regression model for zero inflation probability as well as the regression model for the mean of the dependent variable. This extends the work of Jang et al. (2010) to the fully defiend ZINB regression model. In addition, we apply the proposed method to a real data example, and compare the efficiency with the zero inflated Poisson model using the DIC. Since the DIC of the ZINB is smaller than that of the ZIP, the ZINB model shows superior performance over the ZIP model in zero inflated count data with overdispersion.

KCYP data analysis using Bayesian multivariate linear model (베이지안 다변량 선형 모형을 이용한 청소년 패널 데이터 분석)

  • Insun, Lee;Keunbaik, Lee
    • The Korean Journal of Applied Statistics
    • /
    • v.35 no.6
    • /
    • pp.703-724
    • /
    • 2022
  • Although longitudinal studies mainly produce multivariate longitudinal data, most of existing statistical models analyze univariate longitudinal data and there is a limitation to explain complex correlations properly. Therefore, this paper describes various methods of modeling the covariance matrix to explain the complex correlations. Among them, modified Cholesky decomposition, modified Cholesky block decomposition, and hypersphere decomposition are reviewed. In this paper, we review these methods and analyze Korean children and youth panel (KCYP) data are analyzed using the Bayesian method. The KCYP data are multivariate longitudinal data that have response variables: School adaptation, academic achievement, and dependence on mobile phones. Assuming that the correlation structure and the innovation standard deviation structure are different, several models are compared. For the most suitable model, all explanatory variables are significant for school adaptation, and academic achievement and only household income appears as insignificant variables when cell phone dependence is a response variable.

Pliable regression spline estimator using auxiliary variables

  • Oh, Jae-Kwon;Jhong, Jae-Hwan
    • Communications for Statistical Applications and Methods
    • /
    • v.28 no.5
    • /
    • pp.537-551
    • /
    • 2021
  • We conducted a study on a regression spline estimator with a few pre-specified auxiliary variables. For the implementation of the proposed estimators, we adapted a coordinate descent algorithm. This was implemented by considering a structure of the sum of the residuals squared objective function determined by the B-spline and the auxiliary coefficients. We also considered an efficient stepwise knot selection algorithm based on the Bayesian information criterion. This was to adaptively select smoothly functioning estimator data. Numerical studies using both simulated and real data sets were conducted to illustrate the proposed method's performance. An R software package psav is available.

Model selection algorithm in Gaussian process regression for computer experiments

  • Lee, Youngsaeng;Park, Jeong-Soo
    • Communications for Statistical Applications and Methods
    • /
    • v.24 no.4
    • /
    • pp.383-396
    • /
    • 2017
  • The model in our approach assumes that computer responses are a realization of a Gaussian processes superimposed on a regression model called a Gaussian process regression model (GPRM). Selecting a subset of variables or building a good reduced model in classical regression is an important process to identify variables influential to responses and for further analysis such as prediction or classification. One reason to select some variables in the prediction aspect is to prevent the over-fitting or under-fitting to data. The same reasoning and approach can be applicable to GPRM. However, only a few works on the variable selection in GPRM were done. In this paper, we propose a new algorithm to build a good prediction model among some GPRMs. It is a post-work of the algorithm that includes the Welch method suggested by previous researchers. The proposed algorithms select some non-zero regression coefficients (${\beta}^{\prime}s$) using forward and backward methods along with the Lasso guided approach. During this process, the fixed were covariance parameters (${\theta}^{\prime}s$) that were pre-selected by the Welch algorithm. We illustrated the superiority of our proposed models over the Welch method and non-selection models using four test functions and one real data example. Future extensions are also discussed.

A comparison study of Bayesian high-dimensional linear regression models (베이지안 고차원 선형 회귀분석에서의 비교연구)

  • Shin, Ju-Won;Lee, Kyoungjae
    • The Korean Journal of Applied Statistics
    • /
    • v.34 no.3
    • /
    • pp.491-505
    • /
    • 2021
  • We consider linear regression models in high-dimensional settings (p ≫ n) and compare various classes of priors. The spike and slab prior is one of the most widely used priors for Bayesian regression models, but its model space is vast, resulting in a bad performance in finite samples. As an alternative, various continuous shrinkage priors, including the horseshoe prior and its variants, have been proposed. Although each of the above priors has been investigated separately, exhaustive comparative studies of their performance have been conducted very rarely. In this study, we compare the spike and slab prior, the horseshoe prior and its variants in various simulation settings. The performance of each method is demonstrated in terms of the regression coefficient estimation and variable selection. Finally, some remarks and suggestions are given based on comprehensive simulation studies.

Nearest-neighbor Rule based Prototype Selection Method and Performance Evaluation using Bias-Variance Analysis (최근접 이웃 규칙 기반 프로토타입 선택과 편의-분산을 이용한 성능 평가)

  • Shim, Se-Yong;Hwang, Doo-Sung
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.52 no.10
    • /
    • pp.73-81
    • /
    • 2015
  • The paper proposes a prototype selection method and evaluates the generalization performance of standard algorithms and prototype based classification learning. The proposed prototype classifier defines multidimensional spheres with variable radii within class areas and generates a small set of training data. The nearest-neighbor classifier uses the new training set for predicting the class of test data. By decomposing bias and variance of the mean expected error value, we compare the generalization errors of k-nearest neighbor, Bayesian classifier, prototype selection using fixed radius and the proposed prototype selection method. In experiments, the bias-variance changing trends of the proposed prototype classifier are similar to those of nearest neighbor classifiers with all training data and the prototype selection rates are under 27.0% on average.

Bayesian Variable Selection in the Proportional Hazard Model with Application to DNA Microarray Data

  • Lee, Kyeon-Eun;Mallick, Bani K.
    • Proceedings of the Korean Society for Bioinformatics Conference
    • /
    • 2005.09a
    • /
    • pp.357-360
    • /
    • 2005
  • In this paper we consider the well-known semiparametric proportional hazards (PH) models for survival analysis. These models are usually used with few covariates and many observations (subjects). But, for a typical setting of gene expression data from DNA microarray, we need to consider the case where the number of covariates p exceeds the number of samples n. For a given vector of response values which are times to event (death or censored times) and p gene expressions (covariates), we address the issue of how to reduce the dimension by selecting the significant genes. This approach enable us to estimate the survival curve when n < < p. In our approach, rather than fixing the number of selected genes, we will assign a prior distribution to this number. The approach creates additional flexibility by allowing the imposition of constraints, such as bounding the dimension via a prior, which in effect works as a penalty. To implement our methodology, we use a Markov Chain Monte Carlo (MCMC) method. We demonstrate the use of the methodology to diffuse large B-cell lymphoma (DLBCL) complementary DNA(cDNA) data.

  • PDF

Evolution Strategies Based Particle Filters for Simultaneous State and Parameter Estimation of Nonlinear Stochastic Models

  • Uosaki, K.;Hatanaka, T.
    • 제어로봇시스템학회:학술대회논문집
    • /
    • 2005.06a
    • /
    • pp.1765-1770
    • /
    • 2005
  • Recently, particle filters have attracted attentions for nonlinear state estimation. In this approaches, a posterior probability distribution of the state variable is evaluated based on observations in simulation using so-called importance sampling. We proposed a new filter, Evolution Strategies based particle (ESP) filter to circumvent degeneracy phenomena in the importance weights, which deteriorates the filter performance, and apply it to simultaneous state and parameter estimation of nonlinear state space models. Results of numerical simulation studies illustrate the applicability of this approach.

  • PDF