• Title/Summary/Keyword: Hierarchical Bayesian inference


Bayesian inference in finite population sampling under measurement error model

  • Goo, You Mee;Kim, Dal Ho
    • Journal of the Korean Data and Information Science Society / v.23 no.6 / pp.1241-1247 / 2012
  • This paper considers empirical Bayes (EB) and hierarchical Bayes (HB) predictors of the finite population mean under a linear regression model with measurement errors. We discuss how to calculate the mean squared prediction errors of the EB predictors using jackknife methods, and the posterior standard deviations of the HB predictors using Markov chain Monte Carlo methods. A simulation study illustrates the results and compares the performance of the proposed procedures.

Sampling Based Approach for Combining Results from Binomial Experiments

  • Cho, Jang-Sik;Kim, Dal-Ho;Kang, Sang-Gil
    • Journal of the Korean Data and Information Science Society / v.12 no.1 / pp.1-9 / 2001
  • In this paper, we consider the problem of combining information from $I$ binomial experiments, each having a distinct probability of success $\theta_i$, $i = 1, 2, \ldots, I$. Instead of using a standard exchangeable prior for $\theta = (\theta_1, \theta_2, \ldots, \theta_I)$, we consider a partition of the experiments and take the $\theta_i$'s belonging to the same partition subset to be exchangeable and the $\theta_i$'s belonging to distinct subsets to be independent. We use a Gibbs sampler for Bayesian inference on $\theta$ conditional on a partition, and illustrate the methodology with a real data set.

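The partition idea in the abstract above (exchangeable success probabilities within a subset, independence across subsets, Gibbs sampling conditional on the partition) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' model: the Beta prior with a fixed concentration `K`, the uniform grid prior on the subset mean `mu`, the function names, and any input data are all hypothetical choices made for the example.

```python
import math
import random


def log_beta_pdf(x, a, b):
    """Log density of a Beta(a, b) distribution at x in (0, 1)."""
    return (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
            + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))


def gibbs_subset(successes, trials, K=10.0, n_iter=1500, burn_in=500, seed=0):
    """Gibbs sampler for one partition subset (assumed model, see lead-in).

    Assumed model: y_i ~ Binomial(n_i, theta_i),
    theta_i | mu ~ Beta(mu * K, (1 - mu) * K), mu uniform on a fixed grid.
    Returns the posterior mean of each theta_i.
    """
    rng = random.Random(seed)
    grid = [(j + 0.5) / 50 for j in range(50)]  # candidate values of mu
    mu = 0.5
    totals = [0.0] * len(successes)
    kept = 0
    for it in range(n_iter):
        # Step 1: theta_i | mu, y_i is a conjugate Beta update.
        theta = [rng.betavariate(mu * K + y, (1 - mu) * K + n - y)
                 for y, n in zip(successes, trials)]
        # Keep draws strictly inside (0, 1) so the logs below are finite.
        theta = [min(max(t, 1e-12), 1 - 1e-12) for t in theta]
        # Step 2: mu | theta is a discrete distribution over the grid.
        logw = [sum(log_beta_pdf(t, m * K, (1 - m) * K) for t in theta)
                for m in grid]
        top = max(logw)
        weights = [math.exp(lw - top) for lw in logw]
        mu = rng.choices(grid, weights=weights, k=1)[0]
        if it >= burn_in:
            kept += 1
            totals = [s + t for s, t in zip(totals, theta)]
    return [s / kept for s in totals]


def posterior_means(successes, trials, partition, **kwargs):
    """Run the sampler independently on each subset of the partition."""
    means = [None] * len(successes)
    for subset in partition:
        sub = gibbs_subset([successes[i] for i in subset],
                           [trials[i] for i in subset], **kwargs)
        for i, m in zip(subset, sub):
            means[i] = m
    return means
```

Because subsets are taken to be independent, each one gets its own sampler run, so estimates are shrunk toward a subset-level mean rather than a single overall mean. A call such as `posterior_means(successes, trials, [[0, 1], [2, 3]])` treats experiments 0 and 1 as one exchangeable group and experiments 2 and 3 as another.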

Simultaneous modeling of mean and variance in small area estimation

  • Kim, Myungjin;Kim, Dal Ho
    • Journal of the Korean Data and Information Science Society / v.27 no.5 / pp.1423-1431 / 2016
  • When the sample size in a domain is too small to provide adequate information, a small area model with random effects is usually used. Ignoring an inherent pattern in the data can also considerably affect inference. In this paper, we focus on modeling the increase in variation of the Current Population Survey (CPS) median income as the Internal Revenue Service (IRS) mean income increases. In a hierarchical Bayesian framework, most of the estimation is carried out with the Gibbs sampler, while the grid method is used to generate parameters whose distributions have a non-standard form. A numerical study indicates that the proposed model performs better than the CPS method on four comparison measures.

Genetic Diversity and Population Genetic Structure of Exochorda serratifolia in South Korea

  • Hong, Kyung Nak;Lee, Jei Wan;Kang, Jin Taek
    • Journal of Korean Society of Forest Science / v.102 no.1 / pp.122-128 / 2013
  • Genetic diversity and population genetic structure were estimated in nine natural populations of Exochorda serratifolia in South Korea using the ISSR marker system. The average number of polymorphic loci per primer was 5.8 (S.D. = 2.32), and the percentage of polymorphic loci per population was 78.7%, from a total of 35 loci across 6 ISSR primers. In AMOVA, 27.8% of the total genetic variation was due to genetic differences among populations and 72.2% to differences among individual trees within populations. Genetic differentiation estimated by Bayesian inference was 0.249 for $\theta^{II}$ and 0.227 for $G_{ST}$. The inbreeding coefficient over all populations was 0.412. There was a significant correlation between genetic distance and geographic distance among populations. In the Bayesian cluster analysis, the nine populations were assigned to three groups: the first group included five populations, and the second and third groups included two populations each. In hierarchical AMOVA, these three regions explained 10.0% of the total genetic variation, while the among-population and among-individual levels explained 19.7% and 70.3%, respectively. The geographic distribution of the populations in the three Bayesian clusters could be explained by mountain ranges such as Baekdudaegan, the main mountain chain in Korea. The mountains, acting as a physical barrier, might hamper gene flow in the pearlbush. Therefore, when protected areas are designated for the conservation of this species, these three regions should be taken into account, and at least one population per region should be chosen.

Identifying Copy Number Variants under Selection in Geographically Structured Populations Based on F-statistics

  • Song, Hae-Hiang;Hu, Hae-Jin;Seok, In-Hae;Chung, Yeun-Jun
    • Genomics & Informatics / v.10 no.2 / pp.81-87 / 2012
  • Large-scale copy number variants (CNVs) in the human genome provide the raw material for delineating population differences, as natural selection may have affected at least some of the CNVs discovered thus far. Although relatively large numbers of specific ethnic groups have recently begun to be examined for inter-ethnic differences in CNVs, particular instances of natural selection have not yet been identified or understood. The traditional $F_{ST}$ measure, obtained from differences in allele frequencies between populations, has been used to identify CNV loci subject to geographically varying selection. Here, we review advances in, and the application of, multinomial-Dirichlet likelihood methods of inference for identifying genome regions that have been subject to natural selection using $F_{ST}$ estimates. The material presented is not new; however, this review clarifies how these methods can be applied to CNV data, an application that remains largely unexplored. A hierarchical Bayesian method, implemented via Markov chain Monte Carlo, estimates locus-specific $F_{ST}$ and can identify outlying CNV loci with large values of $F_{ST}$. By applying this Bayesian method to publicly available CNV data, we identified CNV loci that show signals of natural selection, which may help elucidate the genetic basis of human disease and diversity.
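As a rough illustration of how an $F_{ST}$ value follows from differences in allele frequencies between populations, the sketch below computes the classical heterozygosity-based ratio $(H_T - H_S)/H_T$ for a single biallelic locus. It is not the hierarchical Bayesian multinomial-Dirichlet method reviewed in the paper; the function name `fst_biallelic`, the equal weighting of populations, and the biallelic simplification are assumptions made for the example.

```python
def fst_biallelic(freqs):
    """Heterozygosity-based F_ST for one biallelic locus.

    freqs: frequency of one allele in each population (populations are
    weighted equally, which is an assumption of this sketch).
    """
    p_bar = sum(freqs) / len(freqs)
    # Expected heterozygosity of the pooled population.
    h_total = 2 * p_bar * (1 - p_bar)
    # Mean expected heterozygosity within populations.
    h_within = sum(2 * p * (1 - p) for p in freqs) / len(freqs)
    if h_total == 0:
        return 0.0  # locus is fixed everywhere, so there is no differentiation
    return (h_total - h_within) / h_total
```

Identical frequencies in every population give 0, and populations fixed for different alleles give 1, so loci with unusually large values relative to the rest of the genome are the ones flagged as candidates for geographically varying selection.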

Nonstandard Machine Learning Algorithms for Microarray Data Mining

  • Zhang, Byoung-Tak
    • Proceedings of the Korean Society for Bioinformatics Conference / 2001.10a / pp.165-196 / 2001
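  • A DNA chip, or microarray, is a technology in which a large number of genes or gene fragments (typically thousands to tens of thousands) are fixed on a chip and the expression patterns of the genes are analyzed using DNA hybridization reactions. Such high-throughput technology can provide answers to a variety of problems in molecular biology that were previously out of reach, and its potential applications are vast, including molecular-level disease diagnosis, new drug development, and solutions to environmental pollution problems. For practical use of this technology, in addition to the hardware/wetware technology for fabricating DNA chips, bioinformatics technology for extracting as much useful and new knowledge as possible from the resulting data is essential. The problem of data mining gene expression patterns can be broadly divided into clustering, classification, and dependency analysis, and these techniques are based on statistics and on machine learning from artificial intelligence. Commonly used techniques include principal component analysis, hierarchical clustering, k-means, self-organizing maps, decision trees, multilayer perceptron neural networks, and association rules. In this seminar, beyond these basic machine learning techniques, we introduce the probabilistic graphical model (PGM) as a new learning technique under recent study and examine research applying it to DNA chip data analysis. The PGM is a machine learning model formed by combining artificial neural networks, graph theory, and probability theory, and is based on the memory and learning mechanisms of the human brain; one of its major differences from other machine learning models is that it is a generative model. That is, once a model has been built, it can generate new data, so the model can be validated and new facts can be inferred from it, which makes it suitable for exploratory analysis aimed at discovering new knowledge, as in biological data mining problems. In addition, unlike conventional neural network models, probabilistic graphical models perform probability-based soft inference rather than deterministic decision making, and are well suited to analyzing causal relationships or dependencies among relevant factors from the learned model. As specific examples of PGMs, we introduce the structure and the learning and inference algorithms of Bayesian networks, nonnegative matrix factorization (NMF), and generative topographic mapping (GTM), and examine the results of applying them to the cancer diagnosis problem and the gene-drug dependency analysis problem used in CAMDA-2000 and CAMDA-2001, competitions for evaluating DNA chip data analysis.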
