• Title/Summary/Keyword: EM algorithm

Variable Selection in Normal Mixture Model Based Clustering under Heteroscedasticity (이분산 상황 하에서 정규혼합모형 기반 군집분석의 변수선택)

  • Kim, Seung-Gu
    • The Korean Journal of Applied Statistics
    • /
    • v.24 no.6
    • /
    • pp.1213-1224
    • /
    • 2011
  • In high-dimensional settings where the number of variables far exceeds the number of observations, noninformative variables must be removed before the observations can be clustered. Most model-based approaches to variable selection assume homoscedasticity, and their models are mainly estimated by a penalized likelihood method. This paper proposes a different approach that removes the noninformative variables effectively and simultaneously clusters the observations based on a modified normal mixture model. The validity of the model is established, and an EM algorithm is derived to estimate the parameters. Simulation studies and an experiment on a real microarray dataset show the effectiveness of the proposed method.

Identification of Cluster with Composite Mean and Variance (합성된 평균과 분산을 가진 군집 식별)

  • Kim, Seung-Gu
    • Communications for Statistical Applications and Methods
    • /
    • v.18 no.3
    • /
    • pp.391-401
    • /
    • 2011
  • Consider a cluster, called a 'son cluster', whose mean and variance are composed of the means and variances of two other clusters, called the 'father cluster' and the 'mother cluster'. This paper provides a method for identifying each of the three clusters by modeling the son cluster's relationship with the father and mother clusters. Under the normal mixture model, the parameters are estimated via the EM algorithm. Estimation problems are overcome using an ECM approximation. Numerical examples show that the method can effectively identify the three clusters, called a 'family of clusters'.

Separating Signals and Noises Using Mixture Model and Multiple Testing (혼합모델 및 다중 가설 검정을 이용한 신호와 잡음의 분류)

  • Park, Hae-Sang;Yoo, Si-Won;Jun, Chi-Hyuck
    • The Korean Journal of Applied Statistics
    • /
    • v.22 no.4
    • /
    • pp.759-770
    • /
    • 2009
  • A problem of separating signals from noise is considered, when they are randomly mixed in the observations. The noise is assumed to follow a Gaussian distribution and the signal a Gamma distribution, so the underlying distribution of an observation is a mixture of Gaussian and Gamma distributions. The parameters of the mixture model are estimated by the EM algorithm. The signals and noise are then classified by a fixed-threshold approach based on multiple testing, using the positive false discovery rate and Bayes error. The proposed method is applied to real optical emission spectroscopy data for the quantitative analysis of inclusions. A simulation is carried out to compare its performance with the existing method that uses the 3-sigma rule.

Detecting Influential Observations in Multivariate Statistical Analysis of Incomplete Data by PCA (주성분분석에 의한 결손 자료의 영향값 검출에 대한 연구)

  • 김현정;문승호;신재경
    • The Korean Journal of Applied Statistics
    • /
    • v.13 no.2
    • /
    • pp.383-392
    • /
    • 2000
  • Since the late 1970s, methods of influence or sensitivity analysis for detecting influential observations have been studied not only in regression and related methods but also in various multivariate methods. When the results of a multivariate analysis depend heavily on a small number of observations, conclusions should be drawn with great care. Similar phenomena may also occur with incomplete data. This research studies such influential observations in the multivariate statistical analysis of incomplete data. The case of principal component analysis is studied with a numerical example.

Adaptive Threshold Detection Using Expectation-Maximization Algorithm for Multi-Level Holographic Data Storage (멀티레벨 홀로그래픽 저장장치를 위한 적응 EM 알고리즘)

  • Kim, Jinyoung;Lee, Jaejin
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.37A no.10
    • /
    • pp.809-814
    • /
    • 2012
  • We propose an adaptive threshold detection algorithm for multi-level holographic data storage based on the expectation-maximization (EM) method. In this paper, the signal intensities that pass through the four-level holographic channel are modeled as a four-component Gaussian mixture with unknown DC offsets, and the threshold levels are estimated based on the maximum likelihood criterion. We compare the bit error rate (BER) performance of the proposed algorithm with that of the non-adaptive threshold detection algorithm for various levels of DC offset and misalignment. The proposed algorithm shows consistently acceptable performance when the DC offset variance is fixed or the misalignments are lower than 20%. When the DC offset varies with each page, the BER of the proposed method is acceptable when the misalignments are lower than 10% and the DC offset variance is 0.001.
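
The idea in this abstract, fitting a four-component Gaussian mixture by EM and deriving thresholds from it, can be illustrated with a minimal sketch. This is not the authors' detector: the function names, the synthetic data (nominal levels 0 to 3 shifted by a common DC offset), the initial variance, and the midpoint rule for thresholds are all assumptions made for illustration.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of N(mean, var) at x."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_gaussian_mixture(data, init_means, init_var, n_iter=50):
    """Plain EM for a 1-D Gaussian mixture with one component per level."""
    k = len(init_means)
    means = list(init_means)
    variances = [init_var] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each sample.
        resp = []
        for x in data:
            p = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p)
            resp.append([pj / total for pj in p])
        # M-step: responsibility-weighted updates of weight, mean and variance.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, 1e-6)  # floor avoids a collapsing component
    return weights, means, variances


def midpoint_thresholds(means):
    """Assumed rule: thresholds halfway between adjacent estimated level means."""
    ordered = sorted(means)
    return [(a + b) / 2.0 for a, b in zip(ordered, ordered[1:])]


def simulate_levels(n_per_level, dc_offset, noise_sd, seed):
    """Hypothetical test data: nominal levels 0..3 plus a common DC offset."""
    rng = random.Random(seed)
    return [level + dc_offset + rng.gauss(0.0, noise_sd)
            for level in range(4) for _ in range(n_per_level)]


if __name__ == "__main__":
    data = simulate_levels(200, dc_offset=0.2, noise_sd=0.1, seed=0)
    _, means, _ = em_gaussian_mixture(data, [0.0, 1.0, 2.0, 3.0], init_var=0.05)
    print(midpoint_thresholds(means))
```

Because the component means are re-estimated from the received data, the thresholds follow the DC offset instead of staying at the nominal midpoints. The midpoint rule coincides with the maximum likelihood boundary only when adjacent components have equal weights and variances.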

Introduction of EMS AGC Group Control (EMS AGC 그룹제어 알고리즘 소개)

  • Choi, Young-Min;Lee, Gun-Woong
    • Proceedings of the KIEE Conference
    • /
    • 2008.11a
    • /
    • pp.133-135
    • /
    • 2008
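  • In recent years the power system has become increasingly complex, and its scale has also been growing rapidly. The Korea Power Exchange, the organization responsible for the stable and economical operation of the power system, introduced its current energy management system (EMS) in 2001 and focuses on securing the stability and economy of the power system based on accurate real-time assessment of the system. AGC (Automatic Generation Control), a representative function of the EMS, adjusts the output of the AGC-controlled generators connected to the power system in the most economical and stable manner in order to meet power demand as it changes in real time. The difference between power demand and generator output appears in the frequency, and to reduce the deviation between the current frequency and the nominal frequency (60 Hz), control amounts are allocated to individual generators using each generator's per-minute ramp rate. This paper first introduces the algorithm implemented in the EMS currently in operation and then introduces the group control method, an improvement on the existing algorithm. The group control method designates several generators as a specific group in order to minimize control signals, and it is expected to reduce generator fatigue caused by excessive control signals and to prevent fluctuations in the power system.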

Decision of Gaussian Function Threshold for Image Segmentation (영상분할을 위한 혼합 가우시안 함수 임계 값 결정)

  • Jung, Yong-Gyu;Choi, Gyoo-Seok;Heo, Go-Eun
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.9 no.5
    • /
    • pp.163-168
    • /
    • 2009
  • Most image segmentation methods represent the feature vector observed at each pixel with an assumed probability model. These models can be used by statistical estimation or likelihood-based clustering algorithms for the feature vectors. EM algorithms have computational problems in finding the maximum likelihood of unknown parameters from incomplete data and the maximum of the posterior probability distribution: the performance depends on the starting positions, and the likelihood function converges to local maxima. To address these problems, we combine the Gaussian function with the histogram over all level values of the image and propose a suitable image segmentation method. The proposed algorithm is confirmed to classify most edges clearly, and it is implemented as an MFC program.

Estimation from Incomplete Data in Multivariate Distributions under Stochastic Ordering (확률적 순서를 갖는 다변량분포에서 불완전자료에 의한 추정)

  • Kwang Mo Jeoung
    • The Korean Journal of Applied Statistics
    • /
    • v.7 no.2
    • /
    • pp.145-157
    • /
    • 1994
  • For multivariate distributions satisfying stochastic ordering, we suggest maximum likelihood estimation with incomplete data via an EM algorithm. In this paper we restrict our attention to contingency tables with partially cross-classified observations. An existing isotonic regression program may be used to implement the EM algorithm, and we illustrate the estimation process through an example.

The Reanalysis of the Donation Data Using the Zero-Inflated Poisson Regression (0이 팽창된 포아송 회귀모형을 이용한 기부회수 자료의 재분석)

  • Kim, In-Young;Park, Tae-Kyu;Kim, Byung-Soo
    • The Korean Journal of Applied Statistics
    • /
    • v.22 no.4
    • /
    • pp.819-827
    • /
    • 2009
  • Kim et al. (2006) analyzed the donation data surveyed by Volunteer 21 in South Korea in 2002 using a Poisson regression based on a mixture of two Poissons and detected the variables that significantly affect the number of donations. However, noting the large deviation between the predicted and the actual frequencies of zero, we develop in this note a Poisson regression model based on a distribution in which a zero-inflated Poisson is added to the mixture of two Poissons. The population distribution is thus a mixture of three Poissons in which one component is concentrated at zero. We used the EM algorithm to estimate the regression parameters and detected the same variables as Kim et al. as significantly affecting the response. In addition, we could estimate the proportion of the fixed zero group to be 0.201, which is a distinguishing feature of this model. We also noted that, of the two significant variables, income and volunteer experience (yes, no), the second could be used as a strategic variable for promoting donation.
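
As a rough illustration of how EM handles a component concentrated at zero, the sketch below fits a much simpler model than the one in the abstract: a single zero-inflated Poisson with no covariates and only one Poisson component. The function names, the simulated data, and the starting values are assumptions for illustration, not taken from the paper.

```python
import math
import random


def em_zero_inflated_poisson(counts, n_iter=500):
    """EM for P(Y=0) = pi + (1-pi)exp(-lam), P(Y=y) = (1-pi)Poisson(y; lam), y > 0.

    Assumes the data contain at least one positive count.
    """
    n = len(counts)
    n_zero = sum(1 for y in counts if y == 0)
    total = sum(counts)
    pi = 0.5 * n_zero / n   # start: half of the observed zeros are structural
    lam = total / n         # start: the overall sample mean
    for _ in range(n_iter):
        # E-step: probability that an observed zero belongs to the fixed zero group.
        z = pi / (pi + (1.0 - pi) * math.exp(-lam))
        # M-step: update the zero-group proportion and the Poisson mean.
        pi = n_zero * z / n
        lam = total / (n - n_zero * z)
    return pi, lam


def sample_poisson(rng, lam):
    """Poisson draw by Knuth's multiplication method (fine for small lam)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate_zip(n, pi, lam, seed):
    """Hypothetical data: a structural zero with probability pi, else Poisson(lam)."""
    rng = random.Random(seed)
    return [0 if rng.random() < pi else sample_poisson(rng, lam) for _ in range(n)]


if __name__ == "__main__":
    counts = simulate_zip(5000, pi=0.2, lam=3.0, seed=1)
    print(em_zero_inflated_poisson(counts))
```

The estimated `pi` plays the role of the proportion of the fixed zero group. In a regression version, the Poisson mean would depend on covariates and the M-step would become a weighted Poisson regression fit.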

Online Learning of Bayesian Network Parameters for Incomplete Data of Real World (현실 세계의 불완전한 데이타를 위한 베이지안 네트워크 파라메터의 온라인 학습)

  • Lim, Sung-Soo;Cho, Sung-Bae
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.33 no.12
    • /
    • pp.885-893
    • /
    • 2006
  • The Bayesian network (BN) has emerged in recent years as a powerful technique for handling uncertainty in complex domains. Parameter learning of BNs, which finds the most suitable network from a given data set, has been investigated to reduce the time and effort of designing BNs. Off-line learning needs much time and effort to gather enough data, and because of the uncertainties in the real world it is hard to obtain complete data. In this paper, we propose an online learning method for Bayesian network parameters from incomplete data. It provides higher flexibility through learning from incomplete data and higher adaptability to the environment through online learning. Comparison with the Voting EM algorithm proposed by Cohen et al. confirms that the proposed method has the same performance on complete data sets and higher performance on incomplete data sets.