• Title/Summary/Keyword: Bayesian sampling algorithm


Accelerating the EM Algorithm through Selective Sampling for Naive Bayes Text Classifier (나이브베이즈 문서분류시스템을 위한 선택적샘플링 기반 EM 가속 알고리즘)

  • Chang Jae-Young;Kim Han-Joon
    • The KIPS Transactions: Part D / v.13D no.3 s.106 / pp.369-376 / 2006
  • This paper presents a new method of significantly improving a conventional Bayesian statistical text classifier by incorporating an accelerated EM (Expectation Maximization) algorithm. The EM algorithm suffers from slow convergence and performance degradation in its iterative process, especially when real online textual documents do not follow EM's assumptions. In this study, we propose a new accelerated EM algorithm with uncertainty-based selective sampling, which is simple yet converges quickly and allows a more accurate classification model to be estimated for the Naive Bayes text classifier. Experiments using the popular Reuters-21578 document collection showed that the proposed algorithm effectively improves classification accuracy.

A Bayesian Sampling Algorithm for Evolving Random Hypergraph Models Representing Higher-Order Correlations (고차상관관계를 표현하는 랜덤 하이퍼그래프 모델 진화를 위한 베이지안 샘플링 알고리즘)

  • Lee, Si-Eun;Lee, In-Hee;Zhang, Byoung-Tak
    • Journal of KIISE: Software and Applications / v.36 no.3 / pp.208-216 / 2009
  • A number of estimation of distribution algorithms have been proposed that do not explicitly use the crossover and mutation of traditional genetic algorithms, but instead estimate the distribution of the population for more efficient search. However, because it is not easy to discover higher-order correlations among variables, in most cases lower-order correlations are estimated under various constraints. In this paper, we propose a new estimation of distribution algorithm that represents higher-order correlations of the data and finds the global optimum more efficiently. The proposed algorithm represents the higher-order correlations among variables by building a random hypergraph model composed of hyperedges consisting of variables that are expected to be correlated, and generates the next population by a Bayesian sampling algorithm. Experimental results show that the proposed algorithm can find the global optimum and outperforms the simple genetic algorithm and BOA (Bayesian Optimization Algorithm) on decomposable functions with deceptive building blocks.

A Study of Generalized Order Statistics Models in Nonhomogeneous Poisson Processes Using MCMC (MCMC를 이용한 비동질적 포아송과정에서 일반화 순서통계량 모형의 연구)

  • 최기헌;김희철
    • Communications for Statistical Applications and Methods / v.4 no.3 / pp.753-763 / 1997
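  • We apply MCMC, made practical by advances in computing, to nonhomogeneous Poisson processes. We consider the computational problem of determining the posterior distribution from conditional distributions in Bayesian inference. In particular, for general order statistics models whose distributions are double exponential, Gompertz, Rayleigh, gamma, and Gumbel, we present Bayesian computation and model selection using Gibbs sampling and the Metropolis algorithm.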


Bayesian Approach for Software Reliability Models (소프트웨어 신뢰모형에 대한 베이지안 접근)

  • Choi, Ki-Heon
    • Journal of the Korean Data and Information Science Society / v.10 no.1 / pp.119-133 / 1999
  • A Markov Chain Monte Carlo method is developed to compute the software reliability model. We consider the computational problem of determining the posterior distribution in Bayesian inference. Metropolis algorithms along with Gibbs sampling are proposed to perform the Bayesian inference of the mixed model with record value statistics. For model determination, we explored the prequential conditional predictive ordinate criterion, which selects the best model with the largest posterior likelihood among models using all possible subsets of the component intensity functions. To relax the monotonic intensity function assumptions. A numerical example with a simulated data set is given.


Bayesian logit models with auxiliary mixture sampling for analyzing diabetes diagnosis data (보조 혼합 샘플링을 이용한 베이지안 로지스틱 회귀모형 : 당뇨병 자료에 적용 및 분류에서의 성능 비교)

  • Rhee, Eun Hee;Hwang, Beom Seuk
    • The Korean Journal of Applied Statistics / v.35 no.1 / pp.131-146 / 2022
  • Logit models are commonly used for predicting and classifying categorical response variables. Most Bayesian approaches to logit models are implemented based on the Metropolis-Hastings algorithm. However, the algorithm suffers from slow convergence and from the difficulty of choosing an adequate proposal distribution. Therefore, we use the auxiliary mixture sampler proposed by Frühwirth-Schnatter and Frühwirth (2007) to estimate logit models. This method introduces two sequences of auxiliary latent variables so that logit models satisfy normality and linearity. As a result, the logit model can be easily implemented by Gibbs sampling. We applied the proposed method to diabetes data from the Community Health Survey (2020) of the Korea Disease Control and Prevention Agency and compared its performance with that of the Metropolis-Hastings algorithm. In addition, we showed that the logit model using auxiliary mixture sampling achieves classification performance comparable to that of machine learning models.
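
For readers unfamiliar with the baseline this abstract compares against, the following is a minimal sketch of a random-walk Metropolis-Hastings sampler for a Bayesian logit model. It is not the authors' implementation and not the auxiliary mixture sampler: the normal prior, the step size, the synthetic data, and all function names are assumptions chosen for illustration.

```python
import numpy as np

def log_posterior(beta, X, y, prior_sd=10.0):
    """Log posterior (up to a constant) of a logit model.

    Assumes independent N(0, prior_sd^2) priors on the coefficients.
    """
    eta = X @ beta
    # Bernoulli log-likelihood, using logaddexp for numerical stability.
    log_lik = np.sum(y * eta - np.logaddexp(0.0, eta))
    log_prior = -0.5 * np.sum(beta ** 2) / prior_sd ** 2
    return log_lik + log_prior

def random_walk_mh(X, y, n_iter=5000, step=0.2, seed=0):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal."""
    rng = np.random.default_rng(seed)
    beta = np.zeros(X.shape[1])
    current = log_posterior(beta, X, y)
    draws = np.empty((n_iter, X.shape[1]))
    accepted = 0
    for t in range(n_iter):
        proposal = beta + step * rng.standard_normal(beta.shape)
        candidate = log_posterior(proposal, X, y)
        # Symmetric proposal, so the acceptance ratio is the posterior ratio.
        if np.log(rng.uniform()) < candidate - current:
            beta, current = proposal, candidate
            accepted += 1
        draws[t] = beta
    return draws, accepted / n_iter

# Synthetic data (hypothetical, for illustration only).
data_rng = np.random.default_rng(1)
X = np.column_stack([np.ones(200), data_rng.standard_normal(200)])
true_beta = np.array([-0.5, 1.5])
prob = 1.0 / (1.0 + np.exp(-X @ true_beta))
y = (data_rng.uniform(size=200) < prob).astype(float)

draws, acc_rate = random_walk_mh(X, y)
posterior_mean = draws[1000:].mean(axis=0)  # discard the first 1000 draws as burn-in
```

The `step` parameter is the proposal scale. Results are sensitive to this choice, which is the tuning difficulty the abstract refers to.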

On the Bayesian Statistical Inference (베이지안 통계 추론)

  • Lee, Ho-Suk
    • Proceedings of the Korean Information Science Society Conference / 2007.06c / pp.263-266 / 2007
  • This paper discusses Bayesian statistical inference, covering Bayesian inference, MCMC (Markov Chain Monte Carlo) integration, MCMC methods, the Metropolis-Hastings algorithm, Gibbs sampling, maximum likelihood estimation, the Expectation Maximization algorithm, missing data processing, and BMA (Bayesian Model Averaging). Bayesian statistical inference is used to process large amounts of data in biology, medicine, bioengineering, science and engineering, and general data analysis and processing, and provides an important method for drawing optimal inference results. Lastly, this paper discusses principal component analysis (PCA), which is also used for data analysis and inference.

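
As a concrete illustration of Gibbs sampling, one of the methods surveyed in the entry above, here is a minimal textbook sketch that samples a standard bivariate normal with correlation `rho` by alternating draws from the two full conditionals. The target distribution and all names are assumptions for illustration and are not taken from the paper.

```python
import numpy as np

def gibbs_bivariate_normal(rho, n_iter=5000, seed=0):
    """Gibbs sampler for a standard bivariate normal with correlation rho.

    Uses the full conditionals x | y ~ N(rho * y, 1 - rho^2) and
    y | x ~ N(rho * x, 1 - rho^2).
    """
    rng = np.random.default_rng(seed)
    cond_sd = np.sqrt(1.0 - rho ** 2)
    x, y = 0.0, 0.0
    draws = np.empty((n_iter, 2))
    for t in range(n_iter):
        x = rng.normal(rho * y, cond_sd)  # update x given the current y
        y = rng.normal(rho * x, cond_sd)  # update y given the new x
        draws[t] = (x, y)
    return draws

draws = gibbs_bivariate_normal(rho=0.8)
# Sample correlation after discarding the first 500 draws as burn-in.
sample_corr = np.corrcoef(draws[500:].T)[0, 1]
```

Each update needs only a draw from a known conditional distribution, so no proposal tuning or accept/reject step is involved.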

Bayesian Multiple Change-Point for Small Data (소량자료를 위한 베이지안 다중 변환점 모형)

  • Cheon, Soo-Young;Yu, Wenxing
    • Communications for Statistical Applications and Methods / v.19 no.2 / pp.237-246 / 2012
  • Bayesian methods have recently been used to identify multiple change-points. However, studies for small data are limited. This paper proposes a Bayesian noncentral t distribution change-point model for small data and applies the Metropolis-Hastings-within-Gibbs sampling algorithm to the proposed model. Numerical results from simulated and real data show the performance of the new model in terms of the quality of the resulting estimates of the number and positions of change-points for small data.

Bayesian Inference for Mixture Failure Model of Rayleigh and Erlang Pattern (RAYLEIGH와 ERLANG 추세를 가진 혼합 고장모형에 대한 베이지안 추론에 관한 연구)

  • 김희철;이승주
    • The Korean Journal of Applied Statistics / v.13 no.2 / pp.505-514 / 2000
  • A Markov Chain Monte Carlo method with data augmentation is developed to compute the features of the posterior distribution. For each observed failure epoch, we introduce a mixture failure model of Rayleigh and Erlang(2) pattern. This data augmentation approach facilitates specification of the transitional measure in the Markov chain. Gibbs steps are proposed to perform the Bayesian inference of such models. For model determination, we explored the sum of relative error criterion, which selects the best model. A numerical example with a simulated data set is given.


Bayesian Change Point Analysis for a Sequence of Normal Observations: Application to the Winter Average Temperature in Seoul (정규확률변수 관측치열에 대한 베이지안 변화점 분석 : 서울지역 겨울철 평균기온 자료에의 적용)

  • 김경숙;손영숙
    • The Korean Journal of Applied Statistics / v.17 no.2 / pp.281-301 / 2004
  • In this paper we consider the change point problem in a sequence of univariate normal observations. We want to know whether a change point exists and, if one does, to identify its type: a mean change, a variance change, or a change in both the mean and the variance. The intrinsic Bayes factors of Berger and Pericchi (1996, 1998) are used to find the optimal type of change model. Gibbs sampling, including the Metropolis-Hastings algorithm, is used to estimate all the parameters in the change model. These methods are checked via simulation and applied to the winter average temperature data in Seoul.

The Bayesian Approach of Software Optimal Release Time Based on Log Poisson Execution Time Model (포아송 실행시간 모형에 의존한 소프트웨어 최적방출시기에 대한 베이지안 접근 방법에 대한 연구)

  • Kim, Hee-Cheul;Shin, Hyun-Cheul
    • Journal of the Korea Society of Computer and Information / v.14 no.7 / pp.1-8 / 2009
  • In this paper, we study the decision problem of optimal release policies: when to stop testing a software system in the development phase and transfer it to the user. The optimal software release policy that minimizes the total average software cost of development and maintenance, under the constraint of satisfying a software reliability requirement, is generally accepted. Bayesian parametric inference for the model using the log Poisson execution time employs Markov chain tools (Gibbs sampling and the Metropolis algorithm). A numerical example using the T1 data illustrates estimating the optimal software release time from maximum likelihood estimation and Bayesian parametric estimation.