• Title/Abstract/Keywords: EM Algorithm

377 search results (processing time: 0.024 seconds)

Development of Multi-Ensemble GCMs Based Spatio-Temporal Downscaling Scheme for Short-term Prediction

  • 권현한;민영미
    • Korea Water Resources Association: Conference Proceedings / Korea Water Resources Association 2009 Conference Abstracts / pp.1142-1146 / 2009
  • A rainfall simulation and forecasting technique that generates daily rainfall sequences conditional on multi-model ensemble GCMs is developed and applied to data in Korea for the major rainy season. The GCM forecasts are provided by the APEC Climate Center. A Weather State Based Downscaling Model (WSDM) is used to map teleconnections from ocean-atmosphere data, or key state variables from numerical integrations of Ocean-Atmosphere General Circulation Models, to simulated daily sequences at multiple rain gauges. The method presented is general and is applied to wet-season (June-July-August, JJA) data in Korea. The sequences of weather states identified by the EM algorithm are shown to correspond to dominant synoptic-scale features of rainfall-generating mechanisms. Application of the methodology to seasonal rainfall forecasts using empirical teleconnections and GCM-derived climate forecasts is discussed.

Reject Inference of Incomplete Data Using a Normal Mixture Model

  • Song, Ju-Won
    • The Korean Journal of Applied Statistics / Vol. 24, No. 2 / pp.425-433 / 2011
  • Reject inference in credit scoring is a statistical approach to adjusting for the nonrandom sample bias caused by rejected applicants. Function estimation approaches assume that rejected applicants need not be included in the estimation when the missing data mechanism is missing at random. The density estimation approach using mixture models, on the other hand, indicates that reject inference should include rejected applicants in the model. When mixture models are chosen for reject inference, the data are often assumed to follow a normal distribution. If the data include missing values, applying the normal mixture model only to fully observed cases may introduce a further sample bias due to the missing values. We extend reject inference based on a multivariate normal mixture model to handle incomplete characteristic variables. A simulation study shows that including incomplete characteristic variables outperforms the function estimation approaches.

Partitioning likelihood method in the analysis of non-monotone missing data

  • Kim Jae-Kwang
    • The Korean Statistical Society: Conference Proceedings / Proceedings of the Korean Statistical Society 2004 Conference / pp.1-8 / 2004
  • We address the problem of parameter estimation in multivariate distributions under ignorable non-monotone missing data. The factoring likelihood method for monotone missing data, termed by Rubin (1974), is extended to the more general case of non-monotone missing data. The proposed method is algebraically equivalent to the Newton-Raphson method for the observed likelihood, but avoids the burden of computing the first and second partial derivatives of the observed likelihood. Instead, the maximum likelihood estimates and their information matrices for each partition of the data set are computed separately and combined naturally using the generalized least squares method. A numerical example is also presented to illustrate the method.

Implementation of Variational Bayes for Gaussian Mixture Models and Derivation of Factorial Variational Approximation

  • 이기성
    • Journal of the Korea Academia-Industrial cooperation Society / Vol. 9, No. 5 / pp.1249-1254 / 2008
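  • The most important task in a graphical model is computing the posterior distribution of the parameters, together with the hidden variables, given the observed data. This paper proposes an implementation of the variational Bayesian method for Gaussian mixture models and a derivation of the factorized variational approximating distribution. The method can be applied to data analysis tasks such as information retrieval and data visualization.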

Semiparametric mixture of experts with unspecified gate network

  • Jung, Dahai;Seo, Byungtae
    • Journal of the Korean Data and Information Science Society / Vol. 28, No. 3 / pp.685-695 / 2017
  • The traditional mixture of experts (ME) models the gate network with a parametric function. However, if the assumed parametric function does not properly reflect the true structure of the data, the predictive power of ME weakens. For example, the parametric ME often uses logistic or multinomial logistic models for the gate network, which can be very misleading if the true structure of the data differs substantially from those models. Although more flexible parametric models can be developed by extending the model at hand, such models are never free from misspecification problems. To alleviate this weakness of the parametric ME, we propose the semiparametric mixture of experts (SME), in which the gate network is estimated nonparametrically. We compare the performance of the SME with that of ME and neural networks through several simulation experiments and real data examples.
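
For readers unfamiliar with the parametric ME that this abstract takes as its baseline, the following is a minimal sketch of how a multinomial logistic (softmax) gate combines expert predictions. It is not the authors' code: the scalar input, the linear experts, and the names `softmax`, `me_predict`, `gate_params`, and `expert_params` are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of real-valued scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def me_predict(x, gate_params, expert_params):
    """Predicted mean of a parametric mixture of experts at scalar input x.

    gate_params:   list of (a_k, b_k); gate logit for expert k is a_k + b_k * x
                   (a multinomial logistic gate, assumed here for illustration).
    expert_params: list of (c_k, d_k); expert k predicts c_k + d_k * x
                   (linear experts, also an assumption).
    """
    gate = softmax([a + b * x for a, b in gate_params])
    expert_means = [c + d * x for c, d in expert_params]
    # The ME prediction is the gate-weighted average of the expert predictions.
    return sum(g * m for g, m in zip(gate, expert_means))


# Hypothetical parameters: the gate favours expert 0 for large x, expert 1 for small x.
gate_params = [(0.0, 2.0), (0.0, -2.0)]
expert_params = [(1.0, 0.5), (-1.0, 0.0)]
print(me_predict(3.0, gate_params, expert_params))
```

In this sketch the gate is fully determined by the coefficients `(a_k, b_k)`; the semiparametric approach described in the abstract would replace that fixed functional form with a nonparametric estimate of the gate.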

A Finite Mixture Model for Gene Expression and Methylation Profiles in a Bayesian Framework

  • Jeong, Jae-Sik
    • The Korean Journal of Applied Statistics / Vol. 24, No. 4 / pp.609-622 / 2011
  • The pattern of methylation draws significant attention from cancer researchers because DNA methylation and gene expression are believed to have a causal relationship. As interest in the role of methylation patterns in cancer studies (especially drug-resistant cancers) increases, many studies have investigated the association between gene expression and methylation. However, a model-based approach is still urgently needed. We developed a finite mixture model in the Bayesian framework to find a possible relationship between gene expression and methylation. For inference, we employ the Expectation-Maximization (EM) algorithm to handle latent (unobserved) variables and produce estimates of the model parameters. We then validated our model through a simulation study and applied the method to real data: wild-type and hydroxytamoxifen (OHT)-resistant MCF7 breast cancer cell lines.

Statistical Analysis of Bivariate Recurrent Event Data with Incomplete Observation Gaps

  • Kim, Yang-Jin
    • Communications for Statistical Applications and Methods / Vol. 20, No. 4 / pp.283-290 / 2013
  • Subjects can experience two types of recurrent events in a longitudinal study. In addition, there may be intermittent dropouts that result in repeated observation gaps during which no recurrent events are observed. These periods are therefore regarded as non-risk status. In this paper, we consider a special case where information on the observation gap is incomplete; that is, the termination time of the observation gap is not available while the starting time is known. For statistical inference, the incomplete termination time is treated as interval-censored data and estimated with two approaches. A shared frailty effect is also employed to model the association between the two recurrent events. An EM algorithm is applied to recover the unknown termination times as well as the frailty effect. We apply the suggested method to data on young drivers' convictions with several suspensions.

Noisy Speech Recognition Based on Noise-Adapted HMMs Using Speech Feature Compensation

  • Chung, Yong-Joo
    • Journal of the Institute of Convergence Signal Processing / Vol. 15, No. 2 / pp.37-41 / 2014
  • The vector Taylor series (VTS) based method usually employs clean speech Hidden Markov Models (HMMs) when compensating speech feature vectors or adapting the parameters of trained HMMs. It is well known that noisy speech HMMs trained by Multi-condition TRaining (MTR) and the Multi-Model-based Speech Recognition framework (MMSR) perform better than the clean speech HMM in noisy speech recognition. In this paper, we propose a method that uses noise-adapted HMMs in VTS-based speech feature compensation. We derive a novel mathematical relation between the training and test noisy speech feature vectors in the log-spectrum domain, and the VTS is used to estimate the statistics of the test noisy speech. An iterative EM algorithm is used to estimate the training noisy speech from the test noisy speech, along with the noise parameters. The proposed method was applied to the noise-adapted HMMs trained by MTR and MMSR and significantly reduced the relative word error rate in noisy speech recognition experiments on the Aurora 2 database.

Estimation of Mixture Numbers of GMM for Speaker Identification

  • 이윤정;이기용
    • Speech Sciences / Vol. 11, No. 2 / pp.237-245 / 2004
  • In general, a Gaussian mixture model (GMM) is used to estimate the speaker model for speaker identification. The parameter estimates of the GMM are obtained with the expectation-maximization (EM) algorithm for maximum likelihood (ML) estimation. However, if the number of mixtures in the GMM is not chosen well, the parameters are estimated poorly. Finding the number of components is therefore important for estimating the optimal parameters of a mixture model. In this paper, to estimate the optimal number of mixtures, we propose a method that starts from a sufficiently large number of mixtures and then reduces that number by examining the mutual information between mixtures of the GMM. As a result, the optimal number of mixtures can be estimated. The effectiveness of the proposed method is shown by an experiment using artificial data. We also performed speaker identification with the proposed method and compared it with other approaches.
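
As background to the EM-based maximum likelihood estimation this abstract refers to, here is a minimal sketch of EM for a one-dimensional GMM with a fixed number of components. It is an illustration only, not the authors' method: the quantile-based initialization, the variance floor, and the names `normal_pdf` and `em_gmm_1d` are assumptions, and the mutual-information-based reduction of the component count is not shown.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_gmm_1d(data, k, n_iter=100):
    """Fit a k-component 1-D Gaussian mixture by EM.

    Returns (weights, means, variances). Initialization is an assumption of
    this sketch: means at evenly spaced quantiles, shared overall variance.
    """
    n = len(data)
    ordered = sorted(data)
    means = [ordered[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    centre = sum(data) / n
    overall_var = sum((x - centre) ** 2 for x in data) / n
    variances = [overall_var] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibility of each component for each observation.
        resp = []
        for x in data:
            p = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300  # guard against underflow to zero
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weights, means, and variances from responsibilities.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, 1e-6)  # variance floor (assumption)
    return weights, means, variances


# Synthetic two-cluster data, generated only to exercise the sketch.
rng = random.Random(0)
sample = [rng.gauss(-2.0, 0.5) for _ in range(300)] + [rng.gauss(3.0, 0.5) for _ in range(300)]
print(em_gmm_1d(sample, k=2))
```

Here `k` is supplied by the caller. If it is set poorly, the fitted parameters degrade, which is the problem the abstract's component-selection method targets.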

Bayesian approach for categorical Table with Nonignorable Nonresponse

  • Choi, Bo-Seung;Park, You-Sung
    • The Korean Statistical Society: Conference Proceedings / Proceedings of the Korean Statistical Society 2005 Autumn Conference / pp.59-65 / 2005
  • We propose five Bayesian methods to estimate the cell expectations in an incomplete multi-way categorical table with a nonignorable nonresponse mechanism. We study three Bayesian methods that were previously applied to one-way categorical tables, extend them to multi-way tables, and, in addition, develop two new Bayesian methods for multi-way categorical tables. The five methods are distinguished by their priors on the cell probabilities: two have priors determined only by information from respondents; one has a constant prior; and the remaining two have priors reflecting the difference in response mechanisms between respondents and non-respondents. We also compare the five Bayesian methods using categorical data from a prospective study of pregnant women.
