• Title/Summary/Keyword: Deviance information criterion (DIC)


Inferential Problems in Bayesian Logistic Regression Models (베이지안 로지스틱 회귀모형에서의 추론에 대한 연구)

  • Hwang, Jin-Soo;Kang, Sung-Chan
    • The Korean Journal of Applied Statistics / v.24 no.6 / pp.1149-1160 / 2011
  • Model selection and hypothesis testing in Bayesian inference are still debated among scholars. Bayes factors, traditionally used as a criterion in Bayesian hypothesis testing and model selection, are easy to understand but sometimes hard to compute. Other model selection criteria are also available, such as the DIC (Deviance Information Criterion) of Spiegelhalter et al. (2002) and Bayesian p-values for testing. In this paper, we briefly introduce the Bayesian hypothesis testing and model selection procedure. We also apply Bayesian inference to the Swiss banknote data by fitting a logistic regression model and computing several test statistics to see whether they provide consistent results.
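
    Note: the DIC referred to throughout these abstracts is usually computed from MCMC output as DIC = D̄ + p_D, where D(θ) = -2 log p(y | θ) is the deviance, D̄ is its posterior mean, and p_D = D̄ - D(θ̄) is the effective number of parameters. The sketch below is only an illustration under stated assumptions, not the method of any paper listed here: it assumes a Normal likelihood with known standard deviation `sigma`, and the function names `deviance` and `dic` are hypothetical.

```python
import math
from statistics import fmean


def deviance(y, mu, sigma=1.0):
    """D(mu) = -2 * log-likelihood of y under Normal(mu, sigma^2) (assumed model)."""
    n = len(y)
    sq = sum((yi - mu) ** 2 for yi in y)
    loglik = -0.5 * n * math.log(2 * math.pi * sigma ** 2) - sq / (2 * sigma ** 2)
    return -2.0 * loglik


def dic(y, mu_draws, sigma=1.0):
    """Return (DIC, p_D) computed from posterior draws of mu."""
    # Posterior mean of the deviance, averaged over the draws.
    d_bar = fmean([deviance(y, m, sigma) for m in mu_draws])
    # Deviance evaluated at the posterior mean of mu.
    d_hat = deviance(y, fmean(mu_draws), sigma)
    # Effective number of parameters.
    p_d = d_bar - d_hat
    return d_bar + p_d, p_d


# Toy data and toy "posterior draws" (made up for illustration only).
y = [0.0, 1.0, 2.0]
mu_draws = [0.5, 1.0, 1.5]
dic_value, p_d = dic(y, mu_draws)
```

    When candidate models are fitted to the same data, the one with the smaller DIC is preferred.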

Modeling pediatric tumor risks in Florida with conditional autoregressive structures and identifying hot-spots

  • Kim, Bit;Lim, Chae Young
    • Journal of the Korean Data and Information Science Society / v.27 no.5 / pp.1225-1239 / 2016
  • We investigate pediatric tumor incidence data collected by the Florida Association for Pediatric Tumor program using various models commonly used in disease mapping analysis. In particular, we consider Poisson normal models with various conditional autoregressive structures for spatial dependence, a zero-inflated component to capture excess zero counts, and a spatio-temporal model to capture spatial and temporal dependence together. We found that the intrinsic conditional autoregressive model provides the smallest Deviance Information Criterion (DIC) among the models when only spatial dependence is considered. Adding an autoregressive structure over time lowers the DIC relative to the model without a time-dependence component. We adopt weighted ranks squared error loss to identify high-risk regions, which gives results similar to those of other researchers who have worked on the same data set (e.g. Zhang et al., 2014; Wang and Rodriguez, 2014). Our results thus provide additional statistical support for the high-risk regions identified by those researchers.

Sensitivity analysis in Bayesian nonignorable selection model for binary responses

  • Choi, Seong Mi;Kim, Dal Ho
    • Journal of the Korean Data and Information Science Society / v.25 no.1 / pp.187-194 / 2014
  • We consider a Bayesian nonignorable selection model to accommodate selection bias. Markov chain Monte Carlo methods are known to be very useful for fitting the nonignorable selection model. However, sensitivity to prior assumptions on the parameters of the selection mechanism is a potential problem. To quantify this sensitivity, the deviance information criterion and the conditional predictive ordinate are used to compare goodness of fit under two different prior specifications. The 'MLE' prior turns out to give a better fit than the 'uniform' prior according to both goodness-of-fit measures.

Bayesian inference of longitudinal Markov binary regression models with t-link function (t-링크를 갖는 마코프 이항 회귀 모형을 이용한 인도네시아 어린이 종단 자료에 대한 베이지안 분석)

  • Sim, Bohyun;Chung, Younshik
    • The Korean Journal of Applied Statistics / v.33 no.1 / pp.47-59 / 2020
  • In this paper, we present the longitudinal Markov binary regression model with a t-link function when its transition order is known or unknown. Binary regression models typically use a logit or probit link. Here, the t-link function can be used in place of the probit model for more flexibility, since the t distribution approaches the normal distribution as the degrees of freedom go to infinity. A Markov regression model is considered because the data for each individual are longitudinal. We propose a Bayesian method to determine the transition order of the Markov regression model. In particular, we use the deviance information criterion (DIC) (Spiegelhalter et al., 2002) of the possible models to determine the transition order of the Markov binary regression model if the transition order is known; if it is unknown, we instead compute and compare their posterior probabilities. To overcome the complicated Bayesian computation, our proposed model is reconstructed using the ideas of Albert and Chib (1993), Kuo and Mallick (1998), and Erkanli et al. (2001). Our proposed method is applied to simulated data and to the real data examined by Sommer et al. (1984). Markov chain Monte Carlo methods are used to determine the optimal model, assuming that the transition order of the Markov regression model is known or unknown. Gelman and Rubin's method (1992) is also employed to check the convergence of the Metropolis-Hastings algorithm.

Bayesian Analysis of Binary Non-homogeneous Markov Chain with Two Different Time Dependent Structures

  • Sung, Min-Je
    • Management Science and Financial Engineering / v.12 no.2 / pp.19-35 / 2006
  • We use the hierarchical Bayesian approach to describe the transition probabilities of a binary nonhomogeneous Markov chain. The Markov chain is used for describing the transition behavior of emotionally disturbed children in a treatment program. The effects of covariates on transition probabilities are assessed using a logit link function. To describe the time evolution of transition probabilities, we consider two modeling strategies. The first strategy is based on the concept of exchangeability, whereas the second is based on a first-order Markov property. The deviance information criterion (DIC) is used to compare models with the two different time-dependent structures. The inferences are made using the Markov chain Monte Carlo technique. The developed methodology is applied to some real data.

Grid-based Gaussian process models for longitudinal genetic data

  • Chung, Wonil
    • Communications for Statistical Applications and Methods / v.29 no.1 / pp.65-83 / 2022
  • Although various statistical methods have been developed to map time-dependent genetic factors, most identified genetic variants can explain only a small portion of the estimated genetic variation in longitudinal traits. Gene-gene and gene-time/environment interactions are known to be important putative sources of the missing heritability. However, mapping epistatic gene-gene interactions is extremely difficult due to the very large parameter spaces for models containing such interactions. In this paper, we develop a Gaussian process (GP) based nonparametric Bayesian variable selection method for longitudinal data. It maps multiple genetic markers without restricting to pairwise interactions. Rather than modeling each main and interaction term explicitly, the GP model measures the importance of each marker, regardless of whether it is mostly due to a main effect or some interaction effect(s), via an unspecified function. To improve the flexibility of the GP model, we propose a novel grid-based method for the within-subject dependence structure. The proposed method can accurately approximate complex covariance structures. The dimension of the covariance matrix depends only on the number of fixed grid points although each subject may have different numbers of measurements at different time points. The deviance information criterion (DIC) and the Bayesian predictive information criterion (BPIC) are proposed for selecting an optimal number of grid points. To efficiently draw posterior samples, we combine a hybrid Monte Carlo method with a partially collapsed Gibbs (PCG) sampler. We apply the proposed GP model to a mouse dataset on age-related body weight.

Survival Analysis for White Non-Hispanic Female Breast Cancer Patients

  • Khan, Hafiz Mohammad Rafiqullah;Saxena, Anshul;Gabbidon, Kemesha;Stewart, Tiffanie Shauna-Jeanne;Bhatt, Chintan
    • Asian Pacific Journal of Cancer Prevention / v.15 no.9 / pp.4049-4054 / 2014
  • Background: Race and ethnicity are significant factors in predicting survival time of breast cancer patients. In this study, we applied advanced statistical methods to predict the survival of White non-Hispanic female breast cancer patients who were diagnosed between 1973 and 2009 in the United States (U.S.). Materials and Methods: Demographic data from the Surveillance Epidemiology and End Results (SEER) database were used for this study. Nine states were randomly selected from 12 U.S. cancer registries. A stratified random sampling method was used to select 2,000 female breast cancer patients from these nine states. We compared four types of advanced statistical probability models to identify the best-fit model for the White non-Hispanic female breast cancer survival data. Three model-building criteria were used to measure and compare the goodness of fit of the models: the Akaike Information Criterion (AIC), the Bayesian Information Criterion (BIC), and the Deviance Information Criterion (DIC). In addition, we used a novel Bayesian method and the Markov Chain Monte Carlo technique to determine the posterior density function of the parameters. After evaluating the model parameters, we selected the model having the lowest DIC value. Using this Bayesian method, we derived the predictive survival density for future survival time and its related inferences. Results: The analytical sample of White non-Hispanic women included 2,000 breast cancer cases from the SEER database (1973-2009). The majority of cases were married (55.2%), the mean age at diagnosis was 63.61 years (SD = 14.24), and the mean survival time was 84 months (SD = 35.01). After comparing the four statistical models, results suggested that the exponentiated Weibull model (DIC = 19818.220) was a better fit for White non-Hispanic females' breast cancer survival data. This model predicted the survival times (in months) for White non-Hispanic women after implementation of precise estimates of the model parameters. Conclusions: By using modern model-building criteria, we determined that the data best fit the exponentiated Weibull model. We incorporated precise estimates of the parameters into the predictive model and evaluated the survival inference for the White non-Hispanic female population. This method of analysis will assist researchers in making scientific and clinical conclusions when assessing survival time of breast cancer patients.

High Incidence of Breast Cancer in Light-Polluted Areas with Spatial Effects in Korea

  • Kim, Yun Jeong;Park, Man Sik;Lee, Eunil;Choi, Jae Wook
    • Asian Pacific Journal of Cancer Prevention / v.17 no.1 / pp.361-367 / 2016
  • We have reported a high prevalence of breast cancer in light-polluted areas in Korea. However, it is necessary to analyze the spatial effects of light-polluted areas on breast cancer because light pollution levels are correlated with a region's proximity to central urbanized areas in the studied cities. In this study, we applied a spatial regression method (an intrinsic conditional autoregressive [iCAR] model) to analyze the relationship between the incidence of breast cancer and artificial light at night (ALAN) levels in 25 regions including central city, urbanized, and rural areas. By Poisson regression analysis, there was a significant correlation between ALAN, alcohol consumption rates, and the incidence of breast cancer. We also found significant spatial effects between ALAN and the incidence of breast cancer, with a decrease in the deviance information criterion (DIC) from 374.3 to 348.6 and an increase in $R^2$ from 0.574 to 0.667. Therefore, spatial analysis (an iCAR model) is more appropriate for assessing ALAN effects on breast cancer. To our knowledge, this study is the first to show spatial effects of light pollution on breast cancer, despite the limitations of an ecological study. We suggest that a decrease in ALAN could reduce breast cancer more than expected because of spatial effects.

A development of stochastic simulation model based on vector autoregressive model (VAR) for groundwater and river water stages (벡터자기회귀(VAR) 모형을 이용한 지하수위와 하천수위의 추계학적 모의기법 개발)

  • Kwon, Yoon Jeong;Won, Chang-Hee;Choi, Byoung-Han;Kwon, Hyun-Han
    • Journal of Korea Water Resources Association / v.55 no.12 / pp.1137-1147 / 2022
  • River and groundwater stages are the main elements in the hydrologic cycle. They are spatially correlated and can be used to evaluate hydrological and agricultural drought. Stochastic simulation is often performed independently on hydrological variables that are spatiotemporally correlated. In this setting, interdependency across mutual variables may not be maintained. This study proposes the Bayesian vector autoregression model (VAR) to capture the interdependency between multiple variables over time. VAR models systematically consider the lagged stages of each variable and the lagged values of the other variables. Further, an autoregressive model (AR) was built and compared with the VAR model. It was confirmed that the VAR model was more effective in reproducing observed interdependency (or cross-correlation) between river and ground stages, while the AR generally underestimated that of the observed.

A Study on Characteristics and Predictions of Seasonal Chlorophyll-a using Bayseian Regression in Paldang Watershed (베이지안 추정을 이용한 팔당호 유역의 계절별 클로로필a 예측 및 오염특성 연구)

  • Kim, Mi-Ah;Shin, Yuna;Kim, Kyunghyun;Heo, Tae-Young;Yoo, Moonkyu;Lee, Su-Woong
    • Journal of Korean Society on Water Environment / v.29 no.6 / pp.832-841 / 2013
  • In recent years, eutrophication in Paldang Lake has become one of the major environmental problems in Korea, as it may threaten drinking water safety and human health. It is therefore important to understand the phenomenon and to predict the timing and magnitude of algal blooms so that adequate algal reduction measures can be applied. This study performed seasonal water quality assessment and chlorophyll-a prediction using Bayesian simple and multiple linear regression analysis. Bayesian regression analysis can be a useful tool for overcoming the limitations of conventional regression analysis, and it can account for uncertainty in prediction by using the posterior distribution. Generally, chlorophyll-a at the P2 (Paldang Dam 2) site showed high concentrations in spring, similar to the P4 (Paldang Dam 4) site. To develop the Bayesian model, we performed seasonal correlation analysis. As a result, chlorophyll-a at the P2 site had a high correlation with the P5 (Paldang Dam 5) site in spring (r = 0.786, p<0.05) and with P4 in winter (r = 0.843, p<0.05). Based on the DIC (Deviance Information Criterion) value, the critical explanatory variables of the best-fitting Bayesian linear regression model were selected as $PO_4-P$ (P2) and Chlorophyll-a (P5) in spring; $NH_3-N$ (P2), Chlorophyll-a (P4), and $NH_3-N$ (P4) in summer; DTP (P2), outflow (P2), TP (P3), and TP (P4) in fall; and COD (P2), Chl-a (P4), and COD (P4) in winter. The chlorophyll-a predictions showed relatively high $R^2$ and low RMSE values in summer and winter.