• Title/Summary/Keyword: Markov Chain Monte Carlo algorithm


Gas dynamics and star formation in dwarf galaxies: the case of DDO 210

  • Oh, Se-Heon; Zheng, Yun; Wang, Jing
    • The Bulletin of The Korean Astronomical Society, v.44 no.2, pp.75.4-75.4, 2019
  • We present a quantitative analysis of the relationship between the gas dynamics and the star formation history of DDO 210, an irregular dwarf galaxy in the local Universe. We perform profile analysis of a high-resolution neutral hydrogen (HI) data cube of the galaxy taken as part of the large Very Large Array (VLA) survey LITTLE THINGS, using a newly developed algorithm based on a Bayesian Markov Chain Monte Carlo (MCMC) technique. The complex HI structure and kinematics of the galaxy are decomposed in a quantitative way into multiple kinematic components: 1) bulk motions that most likely follow the underlying circular rotation of the disk, 2) non-circular motions deviating from the bulk motions, and 3) kinematically cold and warm components with narrower and wider velocity dispersions, respectively. The decomposed kinematic components are then spatially correlated with the distribution of stellar populations obtained from the color-magnitude diagram (CMD) fitting method. The cold and warm gas components show negative and positive correlations between their velocity dispersions and the surface star formation rates of populations with ages of < 40 Myr and 100-400 Myr, respectively. The cold gas is most likely associated with the young stellar populations, and the stellar feedback of these young populations could then influence the warm gas. The age difference between the populations that show the correlations indicates the time delay of the stellar feedback.
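
As a rough, hedged illustration of the kind of profile decomposition described in this abstract (not the authors' LITTLE THINGS pipeline), the sketch below fits a two-Gaussian "cold plus warm" model to a synthetic line-of-sight velocity profile with a simple random-walk Metropolis sampler; all variable names, the noise level, and the prior constraint that the cold component is narrower than the warm one are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic HI velocity profile: a narrow "cold" plus a broad "warm" Gaussian component.
v = np.linspace(-60, 60, 121)                       # velocity axis [km/s]

def two_gauss(p, v):
    a1, mu1, s1, a2, mu2, s2 = p
    return (a1 * np.exp(-0.5 * ((v - mu1) / s1) ** 2)
            + a2 * np.exp(-0.5 * ((v - mu2) / s2) ** 2))

true_p = np.array([1.0, -5.0, 6.0, 0.4, 0.0, 20.0])
flux = two_gauss(true_p, v) + rng.normal(0, 0.03, v.size)

def log_post(p):
    a1, mu1, s1, a2, mu2, s2 = p
    if min(a1, a2, s1, s2) <= 0 or s1 > s2:         # cold component forced narrower than warm
        return -np.inf
    resid = flux - two_gauss(p, v)
    return -0.5 * np.sum((resid / 0.03) ** 2)       # Gaussian likelihood, flat priors otherwise

# Random-walk Metropolis sampling of the six profile parameters.
p = np.array([0.8, 0.0, 5.0, 0.3, 0.0, 15.0])
lp = log_post(p)
step = np.array([0.05, 0.5, 0.5, 0.05, 0.5, 1.0])
chain = []
for it in range(20000):
    prop = p + rng.normal(0, step)
    lp_prop = log_post(prop)
    if np.log(rng.uniform()) < lp_prop - lp:        # Metropolis accept/reject
        p, lp = prop, lp_prop
    chain.append(p.copy())

post = np.array(chain[5000:])                       # discard burn-in
print("posterior means:", post.mean(axis=0))
```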


Bayesian logit models with auxiliary mixture sampling for analyzing diabetes diagnosis data (보조 혼합 샘플링을 이용한 베이지안 로지스틱 회귀모형 : 당뇨병 자료에 적용 및 분류에서의 성능 비교)

  • Rhee, Eun Hee; Hwang, Beom Seuk
    • The Korean Journal of Applied Statistics, v.35 no.1, pp.131-146, 2022
  • Logit models are commonly used for predicting and classifying categorical response variables. Most Bayesian approaches to logit models are implemented with the Metropolis-Hastings algorithm. However, that algorithm converges slowly and makes it difficult to ensure an adequate proposal distribution. We therefore use the auxiliary mixture sampler proposed by Frühwirth-Schnatter and Frühwirth (2007) to estimate logit models. This method introduces two sequences of auxiliary latent variables so that the logit model satisfies normality and linearity; as a result, the model can be implemented easily by Gibbs sampling. We applied the proposed method to diabetes data from the Community Health Survey (2020) of the Korea Disease Control and Prevention Agency and compared its performance with that of the Metropolis-Hastings algorithm. In addition, we showed that the logit model using auxiliary mixture sampling has classification performance comparable to that of machine learning models.
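
The auxiliary mixture sampler of Frühwirth-Schnatter and Frühwirth (2007) relies on latent utilities and a discrete normal-mixture approximation that is too long to reproduce here; the hedged sketch below only illustrates the random-walk Metropolis-Hastings baseline for a Bayesian logit model that the paper compares against, using invented data, a N(0, 10²) prior, and an arbitrary step size.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy binary-outcome data standing in for the diabetes survey (purely illustrative).
n, d = 500, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, d - 1))])
beta_true = np.array([-0.5, 1.0, -0.8])
y = rng.binomial(1, 1 / (1 + np.exp(-X @ beta_true)))

def log_post(beta):
    eta = X @ beta
    # Bernoulli-logit log-likelihood plus a N(0, 10^2) prior on each coefficient.
    loglik = np.sum(y * eta - np.log1p(np.exp(eta)))
    logprior = -0.5 * np.sum(beta ** 2) / 100.0
    return loglik + logprior

beta = np.zeros(d)
lp = log_post(beta)
chain = []
for it in range(20000):
    prop = beta + rng.normal(0, 0.1, d)             # random-walk proposal
    lp_prop = log_post(prop)
    if np.log(rng.uniform()) < lp_prop - lp:
        beta, lp = prop, lp_prop
    chain.append(beta.copy())

print("posterior means:", np.mean(chain[5000:], axis=0))
```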

Non-Simultaneous Sampling Deactivation during the Parameter Approximation of a Topic Model

  • Jeong, Young-Seob; Jin, Sou-Young; Choi, Ho-Jin
    • KSII Transactions on Internet and Information Systems (TIIS), v.7 no.1, pp.81-98, 2013
  • Since Probabilistic Latent Semantic Analysis (PLSA) and Latent Dirichlet Allocation (LDA) were introduced, many revised or extended topic models have appeared. Due to the intractable likelihood of these models, training any topic model requires an approximation algorithm such as variational approximation, Laplace approximation, or Markov chain Monte Carlo (MCMC). Although these approximation algorithms perform well, training a topic model is still computationally expensive given the large amount of data it requires. In this paper, we propose a new method, called non-simultaneous sampling deactivation, for efficient approximation of the parameters of a topic model. Whereas traditional approximation algorithms sample or update every random variable for a single predefined burn-in period, our method is based on the observation that the random-variable nodes of a topic model converge after different numbers of iterations. During the iterative approximation process, the proposed method allows each random-variable node to be terminated, or deactivated, once it has converged. Compared to the traditional approach, in which every node is usually deactivated concurrently, the proposed method therefore improves inference efficiency in terms of time and memory. We do not propose a new approximation algorithm, but a new process applicable to existing approximation algorithms. Through experiments, we show the time and memory efficiency of the method and discuss the tradeoff between the efficiency of the approximation process and parameter consistency.
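
As a much-simplified, hedged analogue of the idea (not the paper's topic-model implementation), the sketch below runs a Gibbs sampler on a bivariate normal target and deactivates each coordinate separately once its windowed running mean stops moving, instead of ending all coordinates at one shared burn-in; the convergence rule and all constants are toy assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
rho = 0.6                                   # target: bivariate normal with correlation rho
x = np.array([5.0, -5.0])                   # deliberately bad starting point
active = [True, True]                       # per-node "activation" flags
hist = [[], []]
tol, window = 0.02, 200

for it in range(1, 10001):
    # Gibbs updates, but only for nodes that are still active.
    if active[0]:
        x[0] = rng.normal(rho * x[1], np.sqrt(1 - rho ** 2))
    if active[1]:
        x[1] = rng.normal(rho * x[0], np.sqrt(1 - rho ** 2))
    for k in range(2):
        hist[k].append(x[k])
    # Toy convergence check: freeze a node when its windowed mean stops moving.
    if it % window == 0 and it >= 2 * window:
        for k in range(2):
            if active[k]:
                recent = np.mean(hist[k][-window:])
                previous = np.mean(hist[k][-2 * window:-window])
                if abs(recent - previous) < tol:
                    active[k] = False
                    print(f"node {k} deactivated at iteration {it}")

print("sample means:", [np.mean(h[window:]) for h in hist])
```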

Survival Analysis of Gastric Cancer Patients with Incomplete Data

  • Moghimbeigi, Abbas; Tapak, Lily; Roshanaei, Ghodaratolla; Mahjub, Hossein
    • Journal of Gastric Cancer, v.14 no.4, pp.259-265, 2014
  • Purpose: Survival analysis of gastric cancer patients requires knowledge about factors that affect survival time. This paper analyzes the survival of patients with incompletely registered data by using imputation methods. Materials and Methods: Three missing-data imputation methods, namely regression, the expectation-maximization algorithm, and multiple imputation (MI) using Markov chain Monte Carlo methods, were applied to the data of cancer patients referred to the cancer institute at Imam Khomeini Hospital in Tehran from 2003 to 2008. The data included demographic variables, survival times, and the censoring indicator for 471 patients with gastric cancer. After using imputation methods to account for missing covariate data, the data were analyzed using a Cox regression model and the results were compared. Results: The mean patient survival time after diagnosis was 49.1 ± 4.4 months. In the complete-case analysis, which used information from only 100 of the 471 patients, very wide and uninformative confidence intervals were obtained for the chemotherapy and surgery hazard ratios (HRs). After imputation, however, the maximum confidence-interval widths for the chemotherapy and surgery HRs were 8.470 and 0.806, respectively; the minimum width corresponded to MI. Furthermore, the minimum Bayesian and Akaike information criterion values corresponded to MI (-821.236 and -827.866, respectively). Conclusions: Missing-value imputation increased estimate precision and accuracy. In addition, MI yielded better results than the expectation-maximization algorithm and simple regression imputation.
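
As a hedged, stripped-down illustration of MCMC-based (data-augmentation) multiple imputation, the sketch below alternates between drawing the parameters of a working normal regression from the completed data and re-drawing the missing values of one covariate, then stores several completed datasets; the variables, the normal working model, and the missingness pattern are invented for the example, and each completed dataset would subsequently be analyzed with a Cox model and the results pooled.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy dataset: covariate x2 is missing for some patients (illustrative, not the cancer registry).
n = 300
x1 = rng.normal(size=n)
x2 = 1.0 + 0.5 * x1 + rng.normal(0, 1, n)
miss = rng.uniform(size=n) < 0.3                      # ~30% of x2 missing
x2_obs = np.where(miss, np.nan, x2)

x2_cur = np.where(miss, np.nanmean(x2_obs), x2_obs)   # crude starting fill-in
X = np.column_stack([np.ones(n), x1])
completed = []                                        # the multiply-imputed datasets

for it in range(2000):
    # (1) Posterior draw of the working regression x2 ~ x1 from the completed data
    #     (conjugate normal / inverse-gamma draws under a flat prior).
    beta_hat, *_ = np.linalg.lstsq(X, x2_cur, rcond=None)
    resid = x2_cur - X @ beta_hat
    sigma2 = np.sum(resid ** 2) / rng.chisquare(n - 2)
    cov = sigma2 * np.linalg.inv(X.T @ X)
    beta = rng.multivariate_normal(beta_hat, cov)
    # (2) Re-impute the missing x2 values from the drawn model (data-augmentation step).
    x2_cur[miss] = rng.normal(X[miss] @ beta, np.sqrt(sigma2))
    # Keep a completed dataset at spaced iterations -> multiple imputations.
    if it >= 500 and it % 500 == 0:
        completed.append(np.column_stack([x1, x2_cur.copy()]))

print(f"{len(completed)} completed datasets ready for separate Cox model fits")
```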

Structural modal identification and MCMC-based model updating by a Bayesian approach

  • Zhang, F.L.; Yang, Y.P.; Ye, X.W.; Yang, J.H.; Han, B.K.
    • Smart Structures and Systems, v.24 no.5, pp.631-639, 2019
  • Finite element analysis is one of the important methods for studying structural performance. Due to simplification, discretization, and errors in structural parameters, numerical model errors always exist. Structural characteristics may also change because of material aging, structural damage, and similar effects, so the initial finite element model cannot accurately simulate the operational response of the structure. Based on Bayesian methods, the initial model can be updated to obtain a more accurate numerical model. This paper presents work on the field testing, modal identification, and model updating of a Chinese reinforced concrete pagoda. Based on an ambient vibration test, the acceleration response of the structure under operational conditions was collected. The first six translational modes of the structure were identified by the enhanced frequency domain decomposition method. The initial finite element model of the pagoda was established, and the elastic moduli of the columns, beams, and slabs were selected as the model parameters to be updated. Assuming that the error between the measured and calculated modes follows a Gaussian distribution, the posterior probability density function (PDF) of each parameter to be updated is obtained and its uncertainty is quantitatively evaluated using Bayesian statistical theory and the Metropolis-Hastings algorithm, from which the optimal values of the model parameters are obtained. The results show that the difference between the calculated frequencies of the finite element model and the measured ones is reduced, and the modal correlation of the mode shapes is improved. The updated numerical model can be used as a benchmark model for structural health monitoring (SHM) to evaluate the safety of the structure.
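
The actual likelihood evaluation in such studies requires running a finite element model; the hedged sketch below replaces it with a cheap stand-in function that maps an elastic-modulus scale factor to natural frequencies and then applies Metropolis-Hastings with a Gaussian error model between measured and computed frequencies, which is the general procedure the abstract describes. All numbers and the stand-in model are assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)

# Stand-in for the finite element model: frequencies scale with the square root of stiffness.
def fe_frequencies(theta):
    base = np.array([1.2, 3.5, 6.1])                 # nominal first three frequencies [Hz]
    return base * np.sqrt(theta)                     # theta = elastic-modulus scale factor

theta_true = 0.85
f_measured = fe_frequencies(theta_true) + rng.normal(0, 0.02, 3)
sigma = 0.02                                         # assumed measurement error std. dev.

def log_post(theta):
    if theta <= 0.2 or theta >= 2.0:                 # uniform prior on the scale factor
        return -np.inf
    resid = f_measured - fe_frequencies(theta)
    return -0.5 * np.sum((resid / sigma) ** 2)       # Gaussian error between modes

theta, lp = 1.0, log_post(1.0)
chain = []
for it in range(20000):
    prop = theta + rng.normal(0, 0.02)               # random-walk Metropolis-Hastings
    lp_prop = log_post(prop)
    if np.log(rng.uniform()) < lp_prop - lp:
        theta, lp = prop, lp_prop
    chain.append(theta)

post = np.array(chain[5000:])
print(f"updated modulus factor: {post.mean():.3f} +/- {post.std():.3f}")
```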

A Bayesian zero-inflated Poisson regression model with random effects with application to smoking behavior (랜덤효과를 포함한 영과잉 포아송 회귀모형에 대한 베이지안 추론: 흡연 자료에의 적용)

  • Kim, Yeon Kyoung; Hwang, Beom Seuk
    • The Korean Journal of Applied Statistics, v.31 no.2, pp.287-301, 2018
  • It is common to encounter count data with excess zeros in various research fields such as the social sciences, natural sciences, medical science, and engineering. Such count data have mainly been modeled by the zero-inflated Poisson model and its extensions. Zero-inflated count data are also often correlated or clustered, in which case random effects should be taken into account in the model. Frequentist approaches have commonly been used to fit such data; however, a Bayesian approach has the advantages of incorporating prior information, avoiding asymptotic approximations, and allowing practical estimation of functions of the parameters. We consider a Bayesian zero-inflated Poisson regression model with random effects for correlated zero-inflated count data. We conducted simulation studies to check the performance of the proposed model. We also applied the proposed model to smoking behavior data from the Regional Health Survey (2015) of the Korea Centers for Disease Control and Prevention.
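
A hedged sketch of the core computation is given below: a zero-inflated Poisson log-likelihood with cluster random intercepts and a Metropolis-within-Gibbs loop over the fixed effects, the zero-inflation probability, and the random effects. The simulated data are placeholders, and the random-effect standard deviation is fixed rather than sampled, so this is only an outline of the kind of model the paper fits, not its implementation.

```python
import numpy as np
from scipy.special import gammaln

rng = np.random.default_rng(5)

# Toy clustered data standing in for the smoking survey (illustrative only).
J, nj = 10, 30
cluster = np.repeat(np.arange(J), nj)
x = rng.normal(size=J * nj)
u_true = rng.normal(0, 0.5, J)
lam_true = np.exp(0.3 + 0.6 * x + u_true[cluster])
y = np.where(rng.uniform(size=J * nj) < 0.25, 0, rng.poisson(lam_true))   # 25% structural zeros

def zip_loglik(beta, logit_pi, u):
    pi = 1 / (1 + np.exp(-logit_pi))
    lam = np.exp(beta[0] + beta[1] * x + u[cluster])
    ll_pos = np.log(1 - pi) + y * np.log(lam) - lam - gammaln(y + 1)      # y > 0 part
    ll_zero = np.log(pi + (1 - pi) * np.exp(-lam))                        # y = 0 part
    return np.sum(np.where(y == 0, ll_zero, ll_pos))

beta, logit_pi, u = np.zeros(2), 0.0, np.zeros(J)
keep = []
for it in range(10000):
    # Update fixed effects (random-walk Metropolis, flat prior).
    prop = beta + rng.normal(0, 0.05, 2)
    if np.log(rng.uniform()) < zip_loglik(prop, logit_pi, u) - zip_loglik(beta, logit_pi, u):
        beta = prop
    # Update the zero-inflation probability on the logit scale.
    prop = logit_pi + rng.normal(0, 0.1)
    if np.log(rng.uniform()) < zip_loglik(beta, prop, u) - zip_loglik(beta, logit_pi, u):
        logit_pi = prop
    # Update cluster random intercepts jointly, with their N(0, 0.5^2) prior (SD fixed here).
    prop = u + rng.normal(0, 0.1, J)
    log_r = (zip_loglik(beta, logit_pi, prop) - 0.5 * np.sum(prop ** 2) / 0.25
             - zip_loglik(beta, logit_pi, u) + 0.5 * np.sum(u ** 2) / 0.25)
    if np.log(rng.uniform()) < log_r:
        u = prop
    keep.append((beta.copy(), logit_pi))

betas = np.array([b for b, _ in keep[2000:]])
print("posterior mean of fixed effects:", betas.mean(axis=0))
```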

The NHPP Bayesian Software Reliability Model Using Latent Variables (잠재변수를 이용한 NHPP 베이지안 소프트웨어 신뢰성 모형에 관한 연구)

  • Kim, Hee-Cheul; Shin, Hyun-Cheul
    • Convergence Security Journal, v.6 no.3, pp.117-126, 2006
  • Bayesian inference and a model selection method for software reliability growth models are studied. Software reliability growth models are used in the testing stages of software development to model the error content and the time intervals between software failures. In this paper, multiple integration is avoided by using Gibbs sampling, a kind of Markov chain Monte Carlo method, to compute the posterior distribution. Bayesian inference for general order statistics models in software reliability with diffuse prior information is studied together with a model selection method. For model determination and selection, goodness of fit (the error sum of squares) and trend tests are explored. The methodology developed in this paper is exemplified with a random software reliability data set generated from a Weibull distribution (shape 2, scale 5) using the Minitab (version 14) statistical package.
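
The paper's model is a general order statistics formulation; as a hedged illustration of how Gibbs-type sampling sidesteps multiple integration, the sketch below uses the simpler Goel-Okumoto NHPP, drawing the scale parameter from its conjugate gamma conditional and updating the rate parameter with a Metropolis step. The failure times, priors, and step sizes are invented.

```python
import numpy as np

rng = np.random.default_rng(6)

# Illustrative failure times over a test period of length T; not the paper's data.
T = 100.0
t = np.sort(rng.uniform(0, T, 25))             # 25 failure times
n = len(t)

def loglik(a, b):
    # Goel-Okumoto NHPP: intensity a*b*exp(-b*t), mean value function a*(1 - exp(-b*T)).
    return n * np.log(a * b) - b * t.sum() - a * (1 - np.exp(-b * T))

a, b = 30.0, 0.01
alpha0, beta0 = 1.0, 0.01                      # diffuse Gamma prior on a (shape, rate)
chain = []
for it in range(20000):
    # Gibbs step: a | b is conjugate Gamma(n + alpha0, rate = beta0 + 1 - exp(-b*T)).
    a = rng.gamma(n + alpha0, 1.0 / (beta0 + 1 - np.exp(-b * T)))
    # Metropolis step on log(b), with a flat prior on log(b) for simplicity.
    prop = b * np.exp(rng.normal(0, 0.1))
    if np.log(rng.uniform()) < loglik(a, prop) - loglik(a, b):
        b = prop
    chain.append((a, b))

post = np.array(chain[5000:])
print("posterior means (a, b):", post.mean(axis=0))
```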


A Sparse Data Preprocessing Using Support Vector Regression (Support Vector Regression을 이용한 희소 데이터의 전처리)

  • Jun, Sung-Hae; Park, Jung-Eun; Oh, Kyung-Whan
    • Journal of the Korean Institute of Intelligent Systems, v.14 no.6, pp.789-792, 2004
  • In various fields, such as web mining, bioinformatics, and statistical data analysis, missing values arise in many different ways. These values make the training data sparse. Most commonly, missing values are replaced by values predicted from the mean or mode. More advanced imputation methods, such as the conditional mean, tree-based methods, and the Markov chain Monte Carlo algorithm, can also be used. However, general imputation models have the property that their predictive accuracy decreases as the proportion of missing values in the training data increases, and the number of usable imputations is also limited as the missing ratio grows. To address this problem, we propose a preprocessing method for missing values based on statistical learning theory, namely Vapnik's support vector regression. The proposed method can be applied to sparse training data. We verified the performance of our model using data sets from the UCI Machine Learning Repository.
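
A minimal, hedged sketch of the proposed idea follows, using scikit-learn's SVR to fill in a column's missing entries from the complete rows; the dataset, kernel, and hyperparameters are placeholders rather than the settings used in the paper.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(7)

# Toy dataset: the target column has missing values to be imputed (illustrative only).
n = 400
X = rng.normal(size=(n, 2))
y_col = 2.0 * X[:, 0] - X[:, 1] + rng.normal(0, 0.3, n)
miss = rng.uniform(size=n) < 0.2                     # ~20% missing
y_obs = np.where(miss, np.nan, y_col)

# Train the SVR on complete rows only, then predict the missing entries.
svr = SVR(kernel="rbf", C=10.0, epsilon=0.1)
svr.fit(X[~miss], y_obs[~miss])
imputed = y_obs.copy()
imputed[miss] = svr.predict(X[miss])

rmse = np.sqrt(np.mean((imputed[miss] - y_col[miss]) ** 2))
print(f"imputation RMSE against the held-out true values: {rmse:.3f}")
```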

A Review on the Analysis of Life Data Based on Bayesian Method: 2000~2016 (베이지안 기법에 기반한 수명자료 분석에 관한 문헌 연구: 2000~2016)

  • Won, Dong-Yeon; Lim, Jun Hyoung; Sim, Hyun Su; Sung, Si-il; Lim, Heonsang; Kim, Yong Soo
    • Journal of Applied Reliability, v.17 no.3, pp.213-223, 2017
  • Purpose: The purpose of this study is to quantitatively organize the literature on life data analysis based on the Bayesian method and to present it in tables. Methods: The Bayesian method produces more accurate estimates than other traditional methods for small sample sizes, and it requires a specific algorithm and prior information. The criteria for classifying the literature were based on these three characteristics of the Bayesian method. Results: Many studies compare the Bayesian method with maximum likelihood estimation (MLE), and the sample sizes used were greater than 10 and not more than 25. A variety of probability distributions were found in addition to the Weibull distribution commonly used in life data analysis, and MCMC and Lindley's approximation were used about equally often. Finally, gamma, uniform, Jeffreys, and extended Jeffreys distributions were used about equally often as priors. Conclusion: To verify the advantage of the Bayesian method over other methods for smaller sample sizes, studies with fewer than 10 samples should be carried out. Comparative studies across various distributions are also required in order to provide the necessary guidelines.
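
As a hedged example of the small-sample Bayesian estimation this review surveys, the sketch below draws from the posterior of Weibull shape and scale parameters for ten illustrative failure times using random-walk Metropolis with gamma priors (one of the prior families the review mentions); the data and prior settings are invented.

```python
import numpy as np
from scipy.stats import gamma as gamma_dist

rng = np.random.default_rng(8)

# Small life-data sample (n = 10), the regime the review says favours Bayesian estimation.
t = rng.weibull(1.8, 10) * 100.0                   # illustrative failure times

def log_post(k, lam):
    if k <= 0 or lam <= 0:
        return -np.inf
    # Weibull log-likelihood with shape k and scale lam.
    loglik = (len(t) * (np.log(k) - k * np.log(lam))
              + (k - 1) * np.sum(np.log(t)) - np.sum((t / lam) ** k))
    # Weakly informative gamma priors on both parameters.
    logprior = (gamma_dist.logpdf(k, a=1.0, scale=10.0)
                + gamma_dist.logpdf(lam, a=1.0, scale=1000.0))
    return loglik + logprior

k, lam = 1.0, float(np.mean(t))
lp = log_post(k, lam)
chain = []
for it in range(30000):
    kp, lamp = k + rng.normal(0, 0.2), lam + rng.normal(0, 10.0)   # symmetric random walk
    lp_new = log_post(kp, lamp)
    if np.log(rng.uniform()) < lp_new - lp:
        k, lam, lp = kp, lamp, lp_new
    chain.append((k, lam))

post = np.array(chain[10000:])
print("posterior mean shape, scale:", post.mean(axis=0))
```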

Sequential Bayesian Updating Module of Input Parameter Distributions for More Reliable Probabilistic Safety Assessment of HLW Radioactive Repository (고준위 방사성 폐기물 처분장 확률론적 안전성평가 신뢰도 제고를 위한 입력 파라미터 연속 베이지안 업데이팅 모듈 개발)

  • Lee, Youn-Myoung; Cho, Dong-Keun
    • Journal of Nuclear Fuel Cycle and Waste Technology(JNFCWT), v.18 no.2, pp.179-194, 2020
  • A Bayesian approach was introduced to improve confidence in the prior distributions of input parameters for the probabilistic safety assessment of a radioactive waste repository. A GoldSim-based module was developed using the Markov chain Monte Carlo algorithm and implemented through GSTSPA (GoldSim Total System Performance Assessment), a GoldSim template for generic/site-specific safety assessment of the radioactive repository system. In this study, sequential Bayesian updating of prior distributions was comprehensively explained and used as a basis for conducting a reliable safety assessment of the repository. For several selected parameters associated with nuclide transport in the fractured rock medium, the prior distribution was updated to three sequential posterior distributions using assumed likelihood functions. The process was demonstrated through a probabilistic safety assessment of a conceptual repository for illustrative purposes. This study showed that even limited observed data can enhance confidence in the prior distributions of commonly available input parameter values, which are usually uncertain. This is particularly applicable to nuclide behavior in and around the repository system, which typically involves a long time span and a wide modeling domain.
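
The GoldSim/GSTSPA module itself is not reproduced here; as a hedged illustration of the sequential-updating idea, the sketch below uses a conjugate normal model in which the posterior obtained from each batch of observations becomes the prior for the next batch. The parameter, data, and variances are placeholders.

```python
import numpy as np

rng = np.random.default_rng(9)

# Sequential conjugate updating of a normal prior on a transport parameter (illustrative).
mu, tau2 = 0.0, 1.0                 # prior mean and variance for the parameter
sigma2 = 0.2 ** 2                   # assumed measurement variance
true_value = 0.7

for batch in range(1, 4):           # three successive batches of observed data
    data = rng.normal(true_value, np.sqrt(sigma2), size=5)
    n = data.size
    # Posterior of a normal mean with known variance; it becomes the next batch's prior.
    post_var = 1.0 / (1.0 / tau2 + n / sigma2)
    post_mean = post_var * (mu / tau2 + data.sum() / sigma2)
    mu, tau2 = post_mean, post_var
    print(f"after batch {batch}: mean = {mu:.3f}, sd = {np.sqrt(tau2):.3f}")
```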