• Title/Abstract/Keywords: Bayesian model selection

Search results: 160 items (processing time 0.021 s)

Marginal Likelihoods for Bayesian Poisson Regression Models

  • Kim, Hyun-Joong;Balgobin Nandram;Kim, Seong-Jun;Choi, Il-Su;Ahn, Yun-Kee;Kim, Chul-Eung
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 11, No. 2
    • /
    • pp.381-397
    • /
    • 2004
  • The marginal likelihood has become an important tool for model selection in Bayesian analysis because it can be used to rank the models. We discuss the marginal likelihood for Poisson regression models that are potentially useful in small area estimation. Computation in these models is intensive and it requires an implementation of Markov chain Monte Carlo (MCMC) methods. Using importance sampling and multivariate density estimation, we demonstrate a computation of the marginal likelihood through an output analysis from an MCMC sampler.

Estimation of Optimal Mixture Number of GMM for Environmental Sounds Recognition

  • 한다정;박아론;백성준
    • 한국산학기술학회논문지
    • /
    • Vol. 13, No. 2
    • /
    • pp.817-821
    • /
    • 2012
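  • This paper addresses how to determine the optimal number of mixture components with the MDL (minimum description length) and BIC (Bayesian information criterion) model selection criteria when a GMM (Gaussian mixture model) is used for environmental sound recognition. In the experiments, 27,747 12th-order MFCC (mel-frequency cepstral coefficient) features were extracted from nine kinds of environmental sounds and classified with GMMs. MDL and BIC were applied to estimate the optimal number of mixture components for each environmental sound class, and the results were compared with those obtained using a fixed number of mixture components. According to the experimental results, applying the selection methods maintained nearly the same recognition performance as the fixed-number case while reducing computational complexity by 17.8% with BIC and 31.7% with MDL. This shows that applying BIC and MDL can effectively reduce computational complexity in GMM-based environmental sound recognition.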

Bayesian Change Point Analysis for a Sequence of Normal Observations: Application to the Winter Average Temperature in Seoul

  • 김경숙;손영숙
    • 응용통계연구
    • /
    • Vol. 17, No. 2
    • /
    • pp.281-301
    • /
    • 2004
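  • This paper considers the change point problem for a sequence of observations of a random variable following a univariate normal distribution. We determine whether a change point exists and, if it does, what type of change occurred: whether only the mean, only the variance, or both the mean and the variance changed after the change point. Bayesian model selection is used to choose the best model among the possible types of change models, and Gibbs sampling that includes the Metropolis-Hastings algorithm is used to estimate the parameters of the selected model. The methodology is examined through simulation and is also applied to winter average temperature data for the Seoul area.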

Identifying differentially expressed genes using the Polya urn scheme

  • Saraiva, Erlandson Ferreira;Suzuki, Adriano Kamimura;Milan, Luis Aparecido
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 24, No. 6
    • /
    • pp.627-640
    • /
    • 2017
  • A common interest in gene expression data analysis is to identify genes that present significant changes in expression levels among biological experimental conditions. In this paper, we develop a Bayesian approach to gene-by-gene comparison for the case of one control and more than one treatment condition. The proposed approach uses a Dirichlet process prior. The comparison is based on a model selection procedure that exploits the discreteness of the Dirichlet process and its representation via the Polya urn scheme. The posterior probabilities of the models considered are calculated using a Gibbs sampling algorithm. A numerical simulation study compares the performance of the proposed method with the usual approach of analysis of variance (ANOVA) followed by a Tukey test, in terms of true positive rate and false discovery rate. We find that the proposed method outperforms the methods based on ANOVA followed by a Tukey test. We also apply the methodologies to a publicly available data set on Plasmodium falciparum protein.

Introduction to variational Bayes for high-dimensional linear and logistic regression models

  • 장인송;이경재
    • 응용통계연구
    • /
    • Vol. 35, No. 3
    • /
    • pp.445-455
    • /
    • 2022
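  • This paper introduces existing Bayesian methods for high-dimensional sparse regression and compares their performance across various simulation settings. In particular, we focus on the variational Bayes method (Ray and Szabó, 2021), which enables scalable and accurate Bayesian inference. We carry out sparse high-dimensional linear regression on simulated data and compare the performance of the variational Bayes method with that of other Bayesian and frequentist methods. To check the practical performance of the variational Bayes method in logistic regression, we analyze real leukemia gene expression data.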

Genetic parameters for marbling and body score in Anglonubian goats using Bayesian inference via threshold and linear models

  • Figueiredo Filho, Luiz Antonio Silva;Sarmento, Jose Lindenberg Rocha;Campelo, Jose Elivalto Guimaraes;de Oliveira Almeida, Marcos Jacob;de Sousa, Antonio Junior;da Silva Santos, Natanael Pereira;da Silva Costa, Marcio;Torres, Tatiana Saraiva;Sena, Luciano Silva
    • Asian-Australasian Journal of Animal Sciences
    • /
    • Vol. 31, No. 9
    • /
    • pp.1407-1414
    • /
    • 2018
  • Objective: The aim of this study was to estimate (co)variance components and genetic parameters for categorical carcass traits using Bayesian inference via mixed linear and threshold animal models in Anglonubian goats. Methods: Data were obtained from Anglonubian goats reared in the Brazilian Mid-North region. The traits studied were body condition score, marbling in the rib eye, ribeye area, fat thickness of the sternum, hip height, leg perimeter, and body weight. The numerator relationship matrix contained information from 793 animals. Single- and two-trait analyses were performed to estimate (co)variance components and genetic parameters via linear and threshold animal models. For estimation of genetic parameters, chains with 2 and 4 million cycles were tested. A 1,000,000-cycle initial burn-in was used, with values taken every 250 cycles, for a total of 4,000 samples. Convergence was monitored using the Geweke criterion and the Monte Carlo error of the chain. Results: The threshold model fits categorical data best because it detects genetic variability more efficiently. In the two-trait analysis, the additional information and the correlations between traits increased the estimated (co)variance components and heritability relative to the single-trait analysis. Heritability estimates for the traits studied were of low to moderate magnitude. Conclusion: Direct selection on continuously distributed traits such as sternal fat thickness and hip height allows indirect selection for ribeye marbling.

Development of Snow Depth Frequency Analysis Model Based on A Generalized Mixture Distribution with Threshold

  • 김호준;김장경;권현한
    • 한국방재안전학회논문집
    • /
    • Vol. 13, No. 4
    • /
    • pp.25-36
    • /
    • 2020
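  • Climate change is increasing the frequency and intensity of various natural disasters, and in preparation the Ministry of the Interior and Safety announced a comprehensive natural disaster mitigation plan that also covers drought and heavy snow. Snowfall is strongly influenced by temperature and topography. Gangwon-do, which has much mountainous terrain, receives heavy snowfall and records large snow depths, whereas the southern region, where the winter average temperature is relatively high, records small snow depths. Some observations contain zeros because of no snowfall or missing records. Zeros in the data are statistically sensitive and make it difficult to select the optimal probability distribution and estimate its parameters. In this study, a mixture distribution was applied to the maximum fresh snow depth at the Changwon, Tongyeong, and Jinju stations to separate out the zeros, and the threshold that divides off near-zero values was treated as a parameter 𝛿 so that the model estimates the no-snow criterion automatically. The parameters of the mixture distribution model were estimated with a Bayesian approach, and the uncertainty of the estimated probability snow depths for each frequency was quantified. In a comparison with the Daegwallyeong site, the mixture distribution model of this study was judged to be well suited to sites with little snow.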

Evaluation of Related Risk Factors in Number of Musculoskeletal Disorders Among Carpet Weavers in Iran

  • Karimi, Nasim;Moghimbeigi, Abbas;Motamedzade, Majid;Roshanaei, Ghodratollah
    • Safety and Health at Work
    • /
    • Vol. 7, No. 4
    • /
    • pp.322-325
    • /
    • 2016
  • Background: Musculoskeletal disorders (MSDs) are a common problem among carpet weavers. This study was undertaken to identify personal and occupational factors affecting the number of MSDs among carpet weavers. Methods: A cross-sectional study was performed among 862 weavers in seven towns with regard to workhouse location in urban or rural regions. Data were collected using questionnaires covering personal, workplace, and tool information and the modified Nordic MSDs questionnaire. Statistical analysis was performed by applying Poisson and negative binomial mixed models using a full Bayesian hierarchical approach. The deviance information criterion was used for comparison between models and model selection. Results: The majority of weavers (72%) were female, and carpet weaving was the main job of 85.2% of workers. The negative binomial mixed model, which had the lowest deviance information criterion, was selected as the best model. The convergence criteria showed that the chains had converged. Based on the 95% Bayesian credible intervals, the main job and weaving type variables statistically affected the number of MSDs, but age, sex, weaving comb, work experience, and carpet weaving loom were not significant. Conclusion: According to the results of this study, occupational factors are associated with the number of MSDs developing among carpet weavers. Thus, using standard tools and decreasing hours of work per day can reduce the frequency of MSDs among carpet weavers.

Application of the Weibull-Poisson long-term survival model

  • Vigas, Valdemiro Piedade;Mazucheli, Josmar;Louzada, Francisco
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 24, No. 4
    • /
    • pp.325-337
    • /
    • 2017
  • In this paper, we propose a new four-parameter long-term lifetime distribution, set in a competing-risks scenario, with decreasing, increasing, and unimodal hazard rate functions, namely the Weibull-Poisson long-term distribution. This new distribution arises from a scenario of competing latent risks, in which the lifetime associated with a particular risk is not observable and only the minimum lifetime among all risks is observed in a long-term context. However, it can also be used in any other situation as long as it fits the data well. The Weibull-Poisson long-term distribution is presented as a particular case for the new exponential-Poisson long-term distribution and Weibull long-term distribution. The properties of the proposed distribution are discussed, including its probability density, survival, and hazard functions and explicit algebraic formulas for its order statistics. Assuming censored data, we consider the maximum likelihood approach for parameter estimation. Simulation studies with different parameter settings, sample sizes, and censoring percentages were performed to study the mean square error of the maximum likelihood estimates and to compare the performance of the proposed model with its particular cases. The Akaike information criterion, the Bayesian information criterion, and the likelihood ratio test were used for model selection. The relevance of the approach is illustrated on two real datasets, where the new model is compared with its particular cases to show its potential and competitiveness.

Statistical analysis of KNHANES data with measurement error models

  • Hwang, Jinseub
    • Journal of the Korean Data and Information Science Society
    • /
    • Vol. 26, No. 3
    • /
    • pp.773-779
    • /
    • 2015
  • We present a statistical analysis of the fifth-wave data of the Korea National Health and Nutrition Examination Survey based on linear regression models with measurement errors. The data are obtained from a national population-based complex survey. To demonstrate the applicability of measurement error models, results from the general linear regression model and the measurement error model are compared using two model selection criteria, the Akaike information criterion and the Bayesian information criterion. We use the simulation extrapolation algorithm for the measurement error model and the jackknife method to estimate standard errors.